[Python-ideas] Re: A standard library Multiset implementation?
> I am often driven to use, for example, itertools
> set(permutations(multiset, n))

Try the more-itertools package:

https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.distinct_permutations

    from more_itertools import distinct_permutations
    from collections import Counter

    c = Counter('abracadabra')
    print(list(distinct_permutations(c.elements(), 3)))

> there is little if any solid evidence that they do what they claim to do

The code in more-itertools is trustworthy. Also the docs have links to the source, so you can just read the code and let it earn your trust.

Raymond

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/GQQDVWYY5JPGEBL7DY33O4IZGD6ADLDV/
Code of Conduct: http://python.org/psf/codeofconduct/
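For completeness, if pulling in more-itertools isn't an option, a rough stdlib-only sketch (my own naive version, fine only for small inputs since it materializes every permutation before deduplicating -- exactly the wasted work that more_itertools.distinct_permutations avoids) might look like:

```python
from itertools import permutations

def distinct_permutations_naive(iterable, r=None):
    # Generate all permutations, then deduplicate by collecting
    # them into a set.  Sorted for a stable, readable result.
    return sorted(set(permutations(iterable, r)))

print(distinct_permutations_naive('aab', 2))
# → [('a', 'a'), ('a', 'b'), ('b', 'a')]
```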
[Python-ideas] Re: Additional LRU cache introspection facilities
> I propose a method:
> ...
> returns a dictionary {arg: value} representing the cache.
> It wouldn't be the cache itself, just a shallow copy
> of the cache data

I recommend against going down this path. It exposes (and potentially locks in) implementation details such as how we distinguish positional arguments, keyword arguments, and type information (something that has changed more than once). Also, a shallow copy still leaves plenty of room for meddling with the contents of the keys, potentially breaking the integrity of the cache.

Another concern is that we've worked hard to remove potential deadlocks from the lru_cache. Hanging on a lock while copying the whole cache complicates our efforts and risks breaking it as users exploit the new feature in unpredictable ways.

FWIW, OrderedDict provides methods that make it easy to roll your own variants of the lru_cache(). It would be better to do that than to complexify the base implementation in ways that I think we would regret.

Raymond

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/OKQO6GFE4JTEAJR4S454KMMKN6C6CNUZ/
Code of Conduct: http://python.org/psf/codeofconduct/
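As a rough illustration of the roll-your-own approach mentioned above (all names here are mine, not part of the stdlib), an OrderedDict-based cache that supports an inspectable snapshot might look like:

```python
from collections import OrderedDict
from functools import wraps

def inspectable_lru_cache(maxsize=128):
    """Simplified LRU cache sketch; positional, hashable args only."""
    def decorator(func):
        cache = OrderedDict()

        @wraps(func)
        def wrapper(*args):
            if args in cache:
                cache.move_to_end(args)       # mark as most recently used
                return cache[args]
            result = func(*args)
            cache[args] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)     # evict least recently used
            return result

        # Shallow-copy snapshot for inspection -- the feature being
        # requested, but opt-in rather than baked into functools.
        wrapper.cache_snapshot = lambda: dict(cache)
        return wrapper
    return decorator

@inspectable_lru_cache(maxsize=2)
def square(x):
    return x * x

square(2); square(3); square(4)
print(square.cache_snapshot())   # → {(3,): 9, (4,): 16}
```

Because the variant lives in user code, its key format and locking policy can change freely without constraining the stdlib implementation.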
[Python-ideas] Re: Inline Try-Except Clause
> Have a look at PEP 463, which looks into this in some detail.

I wish this PEP had gained more traction. Sooner or later, everyone wants an expression form of a try/except. When it comes to expressing "in the event of this exception, I want this default", exception expressions read much more nicely than an equivalent try/except block.

Also, new syntax would keep the rest of the language clean so that we don't end up adding dozens of get() methods, or expanding function signatures with default arguments, as with the min() and max() functions for example.

It would be great if this PEP were to be resurrected.

Raymond

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/S4NAVCQQAXYHXYL5DYOYMTACZ6G6A4SW/
Code of Conduct: http://python.org/psf/codeofconduct/
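In the meantime, the usual workaround is a small helper that turns a try/except into an expression (the name `trap` is my own invention, not a stdlib function):

```python
def trap(func, default, *exceptions):
    """Call func(); on any of the given exceptions, return default."""
    try:
        return func()
    except exceptions or Exception:
        return default

data = {'a': 1}
value = trap(lambda: data['b'], 0, KeyError)
print(value)   # → 0
```

The lambda wrapper is exactly the syntactic noise that PEP 463's `data['b'] except KeyError: 0` would have eliminated.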
[Python-ideas] Re: Propouse add context to json module.
Based on experience with the decimal module, I think this would open a can of worms. To match what decimal does, we would need a Context() object with methods for dump, dumps, load, loads. There would need to be a thread-local or contextvar instance accessed by getcontext and setcontext, and perhaps a decorator as well. We would need a few pre-made instances for common cases.

Also, the decimal module was context aware from the outset. For JSON, we have a large body of pre-existing client code that was created and tested without the concept of a context. Should existing code use the new context and possibly break assumed invariants? If the existing code had explicit parameters (such as indent=4), would the context override the parameter, take a backseat to the parameter, or raise an exception?

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/KAD3U5HPTC76PCVXBAB4PMKVPDV4P4JC/
Code of Conduct: http://python.org/psf/codeofconduct/
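A lighter-weight way to get context-like defaults today, without any changes to the json module, is to pre-bind parameters with functools.partial:

```python
import json
from functools import partial

# Bind the settings once and reuse everywhere -- an explicit
# alternative to a thread-local context object, with no question
# of how a context would interact with explicit parameters.
dumps = partial(json.dumps, indent=4, sort_keys=True)

print(dumps({'b': 2, 'a': 1}))
```

Because the binding is an ordinary name, different modules can carry different settings without any hidden global state.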
[Python-ideas] Re: Default behavior for random.sample when no k
Ram Rachum wrote:

> I notice that the random.sample function doesn't have a default behavior
> set when you don't specify k. This is fortunate, because we could make
> that behavior just automatically take the length of the first argument. So
> we could do this:
> shuffled_numbers = random.sample(range(10, 10 ** 5))
> What do you think?

This is bad API design. The most likely user mistake is to omit the *k* argument. We want that to be an error. It is common to sample from large populations, so we don't want the default to do anything terrible -- for example, you're in a Jupyter notebook and type "sample(range(10_000_000))" and forget to enter the sample size.

Also, having *k* default to the population size would be surprisingly inconsistent given that choices() has a default k=1. API design principle: don't have unexpectedly different defaults in related functions.

Lastly, in-line shuffling is not the primary use case. If there were a default argument, it should cater to the principal use case. API design principle: don't do anything weird or unexpected by default.

IMO you're trying too hard to jam a round peg into a square hole. There isn't a substantive problem being solved -- being explicit by writing "sample(p, len(p))" instead of "sample(p)" isn't an undue burden.

Please also consider that we thought about all of this when sample() was first created. The current API is intentional. As you noted, this suggestion was also already rejected on the bug tracker. So, this thread seems like an attempt to second-guess that outcome as well as the original design decision.
If you're going to do something like that, save it for something important :-) Raymond ___ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/K4RQTFYD43OHQTSCWC32R2KYFQGXHR36/ Code of Conduct: http://python.org/psf/codeofconduct/
[Python-ideas] Re: Default behavior for random.sample when no k
Steven D'Aprano wrote:

> This is easily solved with a three-line helper:
> def shuffled(iterable): ...
> I have implemented this probably a half a dozen times, and I expect
> others have too.

FWIW, we've already documented a clean way to do it:

https://docs.python.org/3/library/random.html#random.shuffle

"To shuffle an immutable sequence and return a new shuffled list, use sample(x, k=len(x)) instead."

    >>> data = 'random module'
    >>> ''.join(sample(data, len(data)))
    'uaemdor odmln'

Given that we already have shuffle() and sample(), I really don't think we need a third way to do it. How about we save API extensions for ideas that add genuinely new, useful capabilities?

Raymond

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VVKZU6ABPBYZORXMURCIHBZZRNRREMIS/
Code of Conduct: http://python.org/psf/codeofconduct/
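The documented idiom wraps up naturally as the three-line helper under discussion; a sketch:

```python
import random

def shuffled(iterable):
    """Return a new shuffled list, leaving the input untouched."""
    pool = list(iterable)
    return random.sample(pool, k=len(pool))

data = [1, 2, 3, 4, 5]
result = shuffled(data)
print(sorted(result))   # → [1, 2, 3, 4, 5]; `data` itself is unchanged
```

The list() call is what lets it accept immutable sequences and arbitrary iterables, which shuffle() alone cannot.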
[Python-ideas] Re: Augment abc.Set API (support named set methods for dictionary view objects)
> On Jun 1, 2020, at 3:32 AM, a...@yert.pink a...@yert.pink wrote:
>
> I propose that the `Set` ABC API should be augmented to contain all of the
> named methods. This would provide consistency in the collections, and enhance
> the duck typing capabilities of the `Set` abc.

Two thoughts. First, I believe Guido intentionally omitted the named set methods from the ABC -- perhaps the reasons are documented in the ABC PEP.

Second, most APIs are easily expanded by adding new methods, but ABCs define a minimum for other classes to implement. So if we added new methods, it would likely break code that was only meeting the existing minimum.

Raymond

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YX7KJ6XRMF4O4MQYRFP7X5KBD52VTGTC/
Code of Conduct: http://python.org/psf/codeofconduct/
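The current required minimum is genuinely small, which is what makes the ABC easy to implement. A sketch of a custom class (my own example) meeting just that minimum and inheriting the operator support for free:

```python
from collections.abc import Set

class ListBackedSet(Set):
    """A set implementing only the three required abstract methods."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

s = ListBackedSet([1, 2, 3])
t = ListBackedSet([2, 3, 4])
print(sorted(s & t))   # → [2, 3]      operators come from the mixin methods
print(sorted(s | t))   # → [1, 2, 3, 4]
```

If the ABC later required named methods like union() and intersection(), every class like this one -- correct today -- would suddenly fall short of the contract.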
[Python-ideas] Re: Equality between some of the indexed collections
> On May 3, 2020, at 6:19 PM, Steven D'Aprano wrote:
>
>>> `frozenset` and `set` make a counterexample:
>>>
>>> >>> frozenset({1}) == {1}
>>> True
>>
>> Nice catch! That's really interesting. Is there reasoning behind
>> `frozenset({1}) == {1}` but `[1] != (1,)`, or is it just an accident of
>> history?
>
> Conceptually, sets are sets, whether they are mutable or frozen.

Right. This isn't an accident. It is by design.

Also, some numeric types are specifically designed for cross-type comparison:

    >>> int(3) == float(3) == complex(3, 0)
    True

And in Python 2, by design, str and unicode were comparable:

    >>> u'abc' == 'abc'
    True

But the general rule is that objects aren't cross-type comparable by default. We have to specifically enable that behavior when we think it universally makes sense.

The modern trend is to avoid cross-type comparability -- enums and data classes, for example:

    >>> Furniture = Enum('Furniture', ('table', 'chair', 'couch'))
    >>> HTML = Enum('HTML', ('dl', 'ol', 'ul', 'table'))
    >>> Furniture.table == HTML.table
    False

    >>> A = make_dataclass('A', 'x')
    >>> B = make_dataclass('B', 'x')
    >>> A(10) == B(10)
    False

Bytes and str are not comparable in Python 3:

    >>> b'abc' == 'abc'
    False

>> Isn't a tuple essentially just a frozenlist? I know the intended
>> semantics of tuples and lists tend to be different, but I'm not sure that's
>> relevant.

In terms of API, it might look that way. But in terms of use cases, they are less alike: lists-are-for-looping, tuples-are-for-non-homogeneous-fields. Lists are like database tables; tuples are like records in the database. Lists are like C arrays; tuples are like structs.

On the balance, I think more harm than good would result from making sequence equality not depend on type.
Also, when needed, it isn't difficult to be explicit that you're converting to a common type to focus on contents:

    >>> s = bytes([10, 20, 30])
    >>> t = (10, 20, 30)
    >>> list(s) == list(t)
    True

When you think about it, it makes sense that a user gets to choose whether equality is determined by contents or by contents and type. For some drinkers, a can of beer is equal to a bottle of beer; for some drinkers, they aren't equal at all ;-)

Lastly, when it comes to containers, each gets to make its own rules about what is equal. Dicts compare on contents regardless of order, but OrderedDict requires that the order matches.

Raymond

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7WOB36JSIX3ZSG7KFNQ4F563ZKSW32G5/
Code of Conduct: http://python.org/psf/codeofconduct/
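The rules discussed above are easy to check interactively; a quick sanity script:

```python
# Same abstract type compares equal across mutability:
assert frozenset({1}) == {1}
assert int(3) == float(3) == complex(3, 0)

# Different sequence types never compare equal:
assert [1] != (1,)
assert b'abc' != 'abc'      # Python 3 only

# When contents are what matters, convert to a common type first:
s = bytes([10, 20, 30])
t = (10, 20, 30)
assert list(s) == list(t)

print('all equality checks passed')
```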
[Python-ideas] Re: Adding a "once" function to functools
> On Apr 29, 2020, at 11:15 AM, Tom Forbes wrote:
>
> What exactly would the issue be with this:
>
> ```
> import functools
> from threading import Lock
>
> def once(func):
>     sentinel = object()
>     cache = sentinel
>     lock = Lock()
>
>     @functools.wraps(func)
>     def _wrapper():
>         nonlocal cache, lock, sentinel
>         if cache is sentinel:
>             with lock:
>                 if cache is sentinel:
>                     cache = func()
>         return cache
>
>     return _wrapper
> ```

This recipe is the best variant so far and gives us something concrete to talk about :-)

Benefits: Guarantees the wrapped function is not called more than once.

Restrictions: Only works with zero-argument functions.

Risks: Any reentrancy or recursion will result in deadlock.

Limitations: No instrumentation. No ability to reset or clear. Won't work across multiple processes.

It would be nice to look at some compelling use cases. Off hand, I can't think of a time when I would have used this decorator.

Also, I have a nagging worry that holding a non-reentrant lock across an arbitrary user-defined function call is a recipe for deadlocks. That's why during code reviews we typically check every single use of Lock() to see if it should have been an RLock(), especially in big systems where GC, __del__, or weakref callbacks can trigger running any code at just about any time.

Raymond

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/B32VKG5IPHKEL4Y7MP7WMQZXZYYWVT64/
Code of Conduct: http://python.org/psf/codeofconduct/
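One way to sidestep the deadlock risk noted above is to never hold the lock during the user function call at all. This sketch (my own variant, not a vetted recipe) trades the call-at-most-once guarantee for deadlock freedom: two threads racing on the first call may both invoke func(), and the first result stored wins:

```python
import functools
from threading import Lock

def once_no_deadlock(func):
    """Once-like decorator that never calls func() while holding a lock.

    Trade-off versus the quoted recipe: func() may run more than once
    under a first-call race, but reentrancy cannot deadlock because
    the lock only guards the cheap store, not the user code.
    """
    sentinel = object()
    cache = sentinel
    lock = Lock()

    @functools.wraps(func)
    def wrapper():
        nonlocal cache
        if cache is sentinel:
            result = func()           # called outside the lock
            with lock:
                if cache is sentinel:
                    cache = result    # first writer wins
        return cache

    return wrapper

calls = []

@once_no_deadlock
def setup():
    calls.append(1)
    return 'ready'

print(setup(), setup(), len(calls))   # → ready ready 1
```

Whether that trade-off is acceptable depends entirely on whether func() is idempotent, which again argues for looking at concrete use cases first.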
[Python-ideas] Re: deque: Allow efficient operations
> On Apr 29, 2020, at 12:02 PM, Christopher Barker wrote:
>
>> On Apr 29, 2020, at 08:33, Christopher Barker wrote:
>> I've wondered about Linked Lists for a while, but while there are many
>> versions on PyPi, I can't find one that seems to be mature and maintained.
>> Which seems to indicate that there isn't much demand for them.
>
> Isn't much demand for a *generic* linked list. It would probably be a good
> recipe though -- so users could have a starting point for their custom
> version.

In case you're interested, the pure Python OrderedDict code uses a doubly linked list augmented by a dictionary to quickly find individual links. It may be worth taking a look.¹

The implementation was mostly obvious. The only trick was to use weakrefs for the backlinks to avoid creating a reference cycle -- the original version just let GC do the clean-up, but users wanted to avoid cycles entirely.

Raymond

¹ https://github.com/python/cpython/blob/3.8/Lib/collections/__init__.py#L78

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5ZSDX4FBSZEG3W6CGY6DNDOTLDOK7AQJ/
Code of Conduct: http://python.org/psf/codeofconduct/
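A minimal sketch of that technique, modeled loosely on the pure-Python OrderedDict internals linked above (class and method names are mine): forward links are hard references, backlinks are weakref proxies so that the chain never forms a reference cycle, and a side dict gives O(1) access to any link:

```python
import weakref

class Link:
    __slots__ = ('prev', 'next', 'key', '__weakref__')

class LinkedList:
    """Doubly linked list; prev pointers are weak proxies to avoid cycles."""

    def __init__(self):
        self.root = root = Link()          # sentinel node
        root.prev = weakref.proxy(root)
        root.next = root
        self.map = {}                      # key -> Link, for O(1) lookup

    def append(self, key):
        last = self.root.prev
        link = Link()
        link.prev, link.next, link.key = last, self.root, key
        last.next = link                   # hard forward reference
        self.root.prev = weakref.proxy(link)
        self.map[key] = link

    def remove(self, key):
        link = self.map.pop(key)
        link.prev.next = link.next         # unlink; GC needs no cycle pass
        link.next.prev = link.prev

    def __iter__(self):
        link = self.root.next
        while link is not self.root:
            yield link.key
            link = link.next

ll = LinkedList()
for key in 'abc':
    ll.append(key)
print(list(ll))     # → ['a', 'b', 'c']
ll.remove('b')
print(list(ll))     # → ['a', 'c']
```

The `map` dictionary is what turns "find the node for key k" from O(n) pointer chasing into a single hash lookup, the same trick OrderedDict uses for O(1) move_to_end() and deletion.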
[Python-ideas] Re: Adding a "once" function to functools
> On Apr 26, 2020, at 7:03 AM, Tom Forbes wrote:
>
> I would like to suggest adding a simple "once" method to functools. As the
> name suggests, this would be a decorator that would call the decorated
> function, cache the result and return it with subsequent calls.

It seems like you would get just about everything you want with one line:

    once = lru_cache(maxsize=None)

which would be used like this:

    @once
    def welcome():
        len('hello')

> Using lru_cache like this works but it's not as efficient as it could be - in
> every case you're adding lru_cache overhead despite not requiring it.

You're likely imagining more overhead than there actually is. Used as shown above, the lru_cache() is astonishingly small and efficient. Access time is slightly cheaper than writing d[()] where d={(): some_constant}. The infinite_lru_cache_wrapper() just makes a single dict lookup and returns the value.¹ The lru_cache_make_key() function just increments the reference count of the empty args tuple and returns it.² And because it is a C object, calling it will be faster than calling a Python function that just returns a constant, "lambda: some_constant". This is very, very fast.

Raymond

¹ https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L870
² https://github.com/python/cpython/blob/master/Modules/_functoolsmodule.c#L809

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VCWTMH6Z6ADAH5YKRQ6CU4ZIHLLBN4KQ/
Code of Conduct: http://python.org/psf/codeofconduct/
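A quick check of the one-liner in action, with a counter (my addition) to confirm the wrapped function runs only once:

```python
from functools import lru_cache

once = lru_cache(maxsize=None)

call_count = 0

@once
def welcome():
    global call_count
    call_count += 1          # tracks how many times the body actually runs
    return len('hello')

print(welcome(), welcome(), welcome(), call_count)   # → 5 5 5 1
```

On Python 3.9+ the same thing is spelled `functools.cache`, which is literally defined as `lru_cache(maxsize=None)`.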
Re: [Python-ideas] Why operators are useful
> On Mar 15, 2019, at 6:49 PM, Chris Angelico wrote:
>
> On Sat, Mar 16, 2019 at 12:40 PM Raymond Hettinger wrote:
>> Also, it seems like the efficiency concerns were dismissed with hand-waving.
>> But usually, copying and updating aren't the desired behavior. When teaching
>> Python, I like to talk about how the design of the language nudges you
>> towards fast, clear, correct code. The principle is that things that are
>> good for you are put within easy reach. Things that require more thought are
>> placed a little further away. That is the usual justification for copy()
>> and deepcopy() having to be imported rather than being builtins. Copying is
>> an obvious thing to do; it is also not usually good for you; so, we have you
>> do one extra step to get to it.
>
> I'm not sure I understand this argument. Are you saying that d1+d2 is
> bad code because it will copy the dictionary, and therefore it
> shouldn't be done? Because the exact same considerations apply to the
> addition of two lists, which already exists in the language. Is it bad
> to add lists together instead of using extend()?

Yes, that exactly.

Consider a table in a database. Usually what people want/need/ought-to-do is an SQL UPDATE rather than a copy-and-update, which would double the memory requirement and be potentially many times slower.

The same applies to Python lists. Unless you actually have a requirement for three distinct lists (c = a + b), it is almost always better to extend in place.
Adding lists rather than extending them is a recipe for poor performance (especially if it occurs in a loop):

Raymond

Performant version

    s = socket.socket()
    try:
        s.connect((host, port))
        s.send(request)
        blocks = []
        while True:
            block = s.recv(4096)
            if not block:
                break
            blocks += [block]    # Normally done with append()
        page = b''.join(blocks)
        print(page.replace(b'\r\n', b'\n').decode())
    finally:
        s.close()

Catastrophic version

    s = socket.socket()
    try:
        s.connect((host, port))
        s.send(request)
        blocks = []
        while True:
            block = s.recv(4096)
            if not block:
                break
            blocks = blocks + [block]    # Not good for you.
        page = b''.join(blocks)
        print(page.replace(b'\r\n', b'\n').decode())
    finally:
        s.close()

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Why operators are useful
> On Mar 15, 2019, at 12:28 PM, Rhodri James wrote:
>
> I suspect this is a bit personal; I had sufficiently evil lecturers in my
> university Algebra course that I still don't automatically take the
> commutativity of "+" over a particular group as a given :-) Nothing is
> obvious unless you already know it.

We don't design Python for ourselves. We design it for everyday users. Telling them that they can assume nothing is an anti-pattern. People do rely quite a bit on their intuitions. They also rely on implicit patterns already present in the language (i.e. in no other place is + idempotent, in no other place is + a destructive rather than concatenative or accumulative operator).

As for commutativity, + would be obviously commutative for numeric types and obviously noncommutative for sequence concatenation, but for dicts the non-commutativity isn't obvious at all. And since the "|" operator is already used for mapping views, the + operator for merging would be unexpected.

What is missing from the discussion is that we flat out don't need an operator for this. Use of explicit method names, update() or merge(), is already clear and already brief. Also, if we're honest with ourselves, most of us would use this less than once a year. So why make a pervasive change for this? Today, at least one PEP was rejected that had a stronger case than this proposal.

We should consider asking why other major languages haven't gone down this path. The most likely reasons are 1) insufficient need, 2) the "+" operator doesn't make sense, and 3) there are already clean ways to do it.

Also, it seems like the efficiency concerns were dismissed with hand-waving. But usually, copying and updating aren't the desired behavior. When teaching Python, I like to talk about how the design of the language nudges you towards fast, clear, correct code. The principle is that things that are good for you are put within easy reach. Things that require more thought are placed a little further away.
That is the usual justification for copy() and deepcopy() having to be imported rather than being builtins. Copying is an obvious thing to do; it is also not usually good for you; so, we have you do one extra step to get to it. Raymond ___ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Why operators are useful
> On Mar 15, 2019, at 10:51 AM, Guido van Rossum wrote:
>
> The general idea here is that once you've learned this simple notation,
> equations written using them are easier to *manipulate* than equations
> written using functional notation -- it is as if our brains grasp the
> operators using different brain machinery, and this is more efficient.

There is no question that sometimes operators can be easier to manipulate and reason about than equivalent methods. The use of "+" and "*" is a major win for numeric and sequence types. There is also no question that sometimes method names are better than operators (otherwise, we wouldn't use method names at all). APL is an extreme example of a rich set of operators being both powerful and opaque. So, we have to ask whether we're stretching too far from "operators are good" to "we need this operator". Here are some considerations:

Frequency of usage: Math provides ∑ and ∏ because they are common. It doesn't provide a special operator for sqrt(c**2 - b**2) because the latter is less fundamental and less common. To me, f=d.copy() followed by f.update(e) arises so rarely that an operator isn't warranted. The existing code is already concise, clear, and rare.

Familiarity: We know about + because we use it a lot in addition and concatenation contexts. However, a symbol like ⊗ is more opaque unless we're using it every day for a particular purpose. To me, the "+" operator implies "add/extend" semantics rather than "replace" semantics. Successive applications of "+" are never idempotent unless one operand is an identity element. So for me, "+" isn't familiar for dict merges. Loosely put, it isn't "plus-like". I think this is why so many other languages decided not to use "+" for dict merges even when that would have been a trivially easy implementation choice.

Obviousness: When working with "+" on numeric types, it is obvious it should be commutative.
When using "+" with sequence types, it is obvious that concatenation is non-commutative. When using "+" for mapping types, it is not obvious that it isn't commutative. Likewise, it isn't obvious that "+" is a destructive operation for mappings (consider that adding to a log file never destroys existing log entries, while updating a dict will overwrite existing values).

Harmony: The operators on dict views use "|" but regular dicts would use "+". That doesn't seem harmonious.

Impact: When a class in the standard library adds a method or operator, the reverberations are felt only locally. In contrast, the dict API is fundamental. Changing it will reverberate for years. It will be felt in the ABCs, typeshed, and every mapping-like object. IMO such an impactful change should only be made if it adds significant new functionality rather than providing a slightly shorter spelling of something we already have.

Raymond

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Dict joining using + and +=
> On Mar 5, 2019, at 2:13 PM, Greg Ewing wrote:
>
> Rhodri James wrote:
>> I have to go and look in the documentation because I expect the union
>> operator to be '+'.
>
> Anyone raised on Pascal is likely to find + and * more
> natural. Pascal doesn't have bitwise operators, so it
> re-uses + and * for set operations. I like the economy
> of this arrangement -- it's not as if there's any
> other obvious meaning that + and * could have for sets.

The language SETL (the language of sets) also uses + and * for set operations.¹

For us though, the decision to use | and & is set in stone. The time for debating the decision was 19 years ago.²

Raymond

¹ https://www.linuxjournal.com/article/6805
² https://www.python.org/dev/peps/pep-0218/

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Dict joining using + and +=
> On Mar 4, 2019, at 11:24 AM, Guido van Rossum wrote:
>
> * Regarding how often this is needed, we know that this is proposed and
> discussed at length every few years, so I think this will fill a real need.

I'm not sure that conclusion follows from the premise :-) Some ideas get proposed routinely because they are obvious things to propose, not because people actually need them. One hint is that the proposals always have generic variable names, "d = d1 + d2", and another is that they are almost never accompanied by actual use cases or real code that would be made better. I haven't seen anyone in this thread say they would use this more than once a year or that their existing code was unclear or inefficient in any way. The lack of dict addition support in other languages (Java, for example) is another indicator that there isn't a real need -- afaict there is nothing about Python that would cause us to have a unique requirement that other languages don't have.

FWIW, there are some downsides to the proposal -- it diminishes some of the unifying ideas about Python that I typically present on the first day of class:

* One notion is that the APIs nudge users toward good code. The "copy.copy()" function has to be imported -- that minor nuisance is a subtle hint that copying isn't good for you. Likewise for dicts, writing "e=d.copy(); e.update(f)" is a minor nuisance that either serves to dissuade people from unnecessary copying or at least will make very clear what is happening. The original motivating use case for ChainMap() was to make a copy-free replacement for excessively slow dict additions in ConfigParser. Giving a plus-operator to mappings is an invitation to writing code that doesn't scale well.

* Another unifying notion is that the star-operator represents repeated addition across multiple data types. It is a nice demo to show that "a * 5 == a + a + a + a + a" where "a" is an int, float, complex, str, bytes, tuple, or list.
Giving __add__() to dicts breaks this pattern.

* When teaching dunder methods, the usual advice regarding operators is to use them only when their meaning is unequivocal; otherwise, have a preference for named methods where the method name clarifies what is being done -- don't use train+car to mean train.shunt_to_middle(car). For dicts that would mean not having the plus-operator implement something that isn't inherently additive (it applies replace/overwrite logic instead), that isn't commutative, and that isn't linear when applied in succession (d1+d2+d3).

* In the advanced class where C extensions are covered, the organization of the slots is shown as a guide to which methods make sense together: tp_as_number, tp_as_sequence, and tp_as_mapping. For dicts to gain the requisite methods, they will have to become numbers (in the sense of filling out the tp_as_number slots). That will slow down the abstract methods that search the slot groups, skipping over groups marked as NULL. It also exposes method groups that don't typically appear together, blurring their distinction.

* Lastly, there is a vague piece of zen-style advice, "if many things in the language have to change to implement idea X, it stops being worth it". In this case, it means that every dict-like API and the related abstract methods and typing equivalents would need to grow support for addition in mappings (would it even make sense to add shelve objects or os.environ objects together?)

That's my two cents worth. I'm ducking out now (nothing more to offer on the subject). Guido's participation in the thread has given it an air of inevitability, so this post will likely not make a difference.

Raymond

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] Dict joining using + and +=
> On Mar 1, 2019, at 11:31 AM, Guido van Rossum wrote:
>
> There's a compromise solution for this possible. We already do this for
> Sequence and MutableSequence: Sequence does *not* define __add__, but
> MutableSequence *does* define __iadd__, and the default implementation just
> calls self.update(other). I propose the same for Mapping (do nothing) and
> MutableMapping: make the default __iadd__ implementation call
> self.update(other).

Usually, it's easy to add methods to classes without creating disruption, but ABCs are more problematic. If MutableMapping grows an __iadd__() method, what would that mean for existing classes that register as MutableMapping but don't already implement __iadd__? When "isinstance(m, MutableMapping)" returns True, is it a promise that the API is fully implemented? Is this something that mypy could or should complain about?

> Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the latter
> is highly non-obvious except if you've already encountered that pattern before

I concur. The latter is also an eyesore and almost certain to be a stumbling block when reading code.

That said, I'm not sure we actually need a short-cut for "d=e.copy(); d.update(f)". Code like this comes up for me perhaps once a year. Having a plus operator on dicts would likely save me five seconds per year.

If the existing code were in the form of "d=e.copy(); d.update(f); d.update(g); d.update(h)", converting it to "d = e + f + g + h" would be a tempting but algorithmically poor thing to do (because the behavior is quadratic). Most likely, the right thing to do would be "d = ChainMap(e, f, g, h)" for a zero-copy solution or "d = dict(ChainMap(e, f, g, h))" to flatten the result without incurring quadratic costs. Both of those are short and clear.

Lastly, I'm still bugged by use of the + operator for replace-logic instead of additive-logic.
With numbers and lists and Counters, the plus operator creates a new object where all the contents of each operand contribute to the result. With dicts, some of the contents of the left operand get thrown away. This doesn't seem like addition to me (IIRC that is also why sets have "|" instead of "+").

Raymond

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
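As a follow-up illustration of the ChainMap approach discussed in the message above, merging several dicts this way avoids the quadratic copying of repeated "+". Note that ChainMap gives priority to the earliest mapping, so mimicking successive update() calls (where the last write wins) means listing the maps in reverse:

```python
from collections import ChainMap

e = {'color': 'red', 'size': 1}
f = {'size': 2}
g = {'shape': 'round'}

# Zero-copy view; the earliest mapping wins on duplicate keys.
combined = ChainMap(e, f, g)
print(combined['size'])      # → 1  (from e, the first mapping)

# Flatten once, copying each source a single time.  Reversed order
# reproduces "d = e.copy(); d.update(f); d.update(g)" semantics.
flat = dict(ChainMap(g, f, e))
print(flat == {'color': 'red', 'size': 2, 'shape': 'round'})   # → True
```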
Re: [Python-ideas] PEP 8 update on line length
> On Feb 22, 2019, at 1:10 PM, Greg Ewing wrote:
>
>> "Typesetters hundreds of years ago used less than 80 chars per line, so
>> that's what we should do for Python code now" is a pretty weak argument.
>
> But that's not the entire argument -- the point is that typesetters
> had the goal of making lines of text readable, which is similar (if not
> quite the same) as the goal of making lines of program code readable.
> It's a lot closer than, for example, the goal of fitting in an
> accountant's spreadsheet.

The issue with reference to typesetter rules is that they were targeted at blocks of prose rather than heavily nested hanging indents with non-trivial string literals or dotted attribute notation. Typesetters were also dealing with fixed page widths and needed to leave gutter space for binding. The "rules" aren't comparable at all.

> I would say it the other way around. Once you've reduced the complexity
> of a line to something a human can handle, *most* of the time 80 chars
> is enough.

That would make sense if we started at column 0; however, if you have to prefix your thoughts with something like

    class TestRemote(unittest.TestCase):

        def test_heartbeat(self):
            ...
            self.assertIsInstance(...

then the meat of the part "a human can handle" starts at column 30. Then if you need good variable names and/or have module.function prefixes, there is sometimes little left to work with.

Raymond

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
Re: [Python-ideas] PEP 8 update on line length
> On Feb 21, 2019, at 5:06 PM, Chris Barker via Python-ideas wrote:
>
> class Frabawidget:
>     ...
>     @wozzle.setter
>     def (self, woozle):
>         if not (self.min_woozle < woozle < self.max_woozle):
>             raise ValueError(f"Expected woozle to be between {self.min_woozle} and {self.max_woozle}")
>         self._wozzle = normalize(woozle)
>
> That's 103 chars long -- and very readable. But, is this that much worse?
>
> class Frabawidget:
>     ...
>     @wozzle.setter
>     def (self, woozle):
>         if not (self.min_woozle < woozle < self.max_woozle):
>             raise ValueError(f"Expected woozle to be between"
>                              "{self.min_woozle} and {self.max_woozle}")
>         self._wozzle = normalize(woozle)
>
> (it IS harder to write, that's for sure)

Yes, it's worse. You introduced two bugs. First, the space between the two fragments was lost. Second, the "f" on the second f-string was dropped. I see these kinds of line-wrapping errors frequently. The bugs are CAUSED by the line length rule.

Also, in the case of multi-line templates, there is no way to wrap them without getting very far from WYSIWYG:

    def run_diagnostics(location, system, test_engineer):
        ...
        if failures:
            print(dedent(f'''\
                There were {num_failures} anomalies detected
                in the {location} {system} at {event_time:%I:%M:%S}.
                These anomalies were classified as {level}.
                Further action is {'' if escalate else 'not'} recommended.
                '''))
        else:
            print(dedent(f'''\
                A total of {num_test_cases} diagnostics were run
                in the {location} {system} as of {event_time:%I:%M:%S}.
                No anomalies were detected and further action is not required.
                Test signed by {test_engineer.title()}.
                ...

Raymond
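The two wrapping bugs described above are easy to reproduce; this minimal sketch (with made-up placeholder values) shows the broken and fixed versions side by side:

```python
min_w, max_w = 1, 10

# Buggy wrap: implicit concatenation loses the space between fragments,
# and the second fragment is a plain string so its braces are never substituted
msg_bad = (f"Expected woozle to be between"
           "{min_w} and {max_w}")

# Correct wrap: keep the trailing space and mark both fragments as f-strings
msg_good = (f"Expected woozle to be between "
            f"{min_w} and {max_w}")
```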
Re: [Python-ideas] PEP 505: None-aware operators
> On Jul 18, 2018, at 10:43 AM, Steve Dower wrote:
>
> Possibly this is exactly the wrong time to propose the next big syntax
> change, since we currently have nobody to declare on it, but since we're
> likely to argue for a while anyway it probably can't hurt (and maybe this
> will become the test PEP for whoever takes the reins?).

It probably is the wrong time and probably can hurt (by introducing divisiveness when we most need to be focusing on coming together).

This PEP also shares some traits with PEP 572 in that it solves a somewhat minor problem with new syntax and grammar changes that affect the look and feel of the language in a way that at least some of us (me, for example) find to be repulsive. This PEP is one step further away from Python reading like executable pseudo-code. That trait is currently a major draw to the language and I don't think it should get tossed away just to mitigate a minor irritant.

We should also consider a moratorium on language changes for a while. There is more going on than just a transition to a post-BDFL world. The other implementations of Python are having a hard time keeping up with our recent, ferocious rate of change. Even among the core developers, most people are not fully up to date on all the new features that have already been added (how many of you are competent with typing, data classes, generalized unpacking, concurrent futures, async, the scoping rules for exceptions and comprehensions, the hundreds of niggling changes in the past few releases, __init_subclass__, __set_name__, details of import logic, issues with SSL certificates, new collections ABCs, etc.?) We've been putting major changes in faster than anyone can keep up with them. We really need to take a breath.

Raymond
Re: [Python-ideas] Fwd: collections.Counter should implement fromkeys
On Jun 29, 2018, at 5:32 PM, Abe Dillon wrote:
>
> Sure, but in Hettinger's own words "whenever you have a constructor war,
> everyone should get their wish". People that want a counting constructor have that,
> people that want the ability to initialize values don't have that.

Sorry Abe, but you're twisting my words and pushing very hard for a proposal that doesn't make sense and isn't necessary.

* Counts initialized to zero: This isn't necessary. The whole point of counters is that counts default to zero without pre-initialization.

* Counts initialized to one: This is already done by the regular constructor. Use "Counter(keys)" if the keys are known to be unique and "Counter(set(keys))" to ignore duplicates.

    >>> Counter('abc')
    Counter({'a': 1, 'b': 1, 'c': 1})
    >>> Counter(set('abbacac'))
    Counter({'a': 1, 'b': 1, 'c': 1})

* Counts initialized to some other value: That would be an unusual thing to do but would be easy with the current API.

    >>> Counter(dict.fromkeys('abc', 21))
    Counter({'a': 21, 'b': 21, 'c': 21})

* Note, the reason that fromkeys() is disabled is that it has nonsensical or surprising interpretations:

    >>> Counter.fromkeys('aaabbc', 2)   # What should this do that doesn't surprise at least some users?

* That reason is already shown in the source code:

    @classmethod
    def fromkeys(cls, iterable, v=None):
        # There is no equivalent method for counters because setting v=1
        # means that no element can have a count greater than one.
        raise NotImplementedError(
            'Counter.fromkeys() is undefined.  Use Counter(iterable) instead.')

> Obviously, Python breaks SOLID principles successfully all over the place for pragmatic reasons.
> I don't think this is one of those cases.

No amount of citing generic design principles will justify adding an API that doesn't make sense. Besides, any possible use cases already have reasonable solutions using the existing API. That is likely why no one has ever requested this behavior before.
Based on what I've read in this thread, I see nothing that would change the long-standing decision not to have a fromkeys() method for collections.Counter. The original reasoning still holds.

Raymond
Re: [Python-ideas] Have a "j" format option for lists
On May 9, 2018, at 7:39 AM, Facundo Batista wrote:
>
> This way, I could do:
>
>     authors = ["John", "Mary", "Estela"]
>     "Authors: {:, j}".format(authors)
>     'Authors: John, Mary, Estela'
> ...
>
> What do you think?

That is an inspired idea. I like it :-)

Raymond
Re: [Python-ideas] Add "default" kw argument to operator.itemgetter and operator.attrgetter
> On May 6, 2018, at 6:00 AM, Steven D'Aprano wrote:
>
> On Thu, May 03, 2018 at 04:32:09PM +1000, Steven D'Aprano wrote:
>
>> Maybe I'm slow today, but I'm having trouble seeing how to write this as
>> a lambda.
>
> Yes, I was definitely having a "cannot brain, I have the dumb" day,
> because it is not that hard to write using lambda. See discussion here:
>
> https://mail.python.org/pipermail/python-list/2018-May/732795.html
>
> If anything, the problem is a plethora of choices, where it isn't clear
> which if any is the best way, or the One Obvious Way

At one time, lambda was the one obvious way. Later, partial, itemgetter, attrgetter, and methodcaller were added to express common patterns for key-functions and map(). If needed, the zoo of lambda alternatives could be further extended to add an rpartial() function that partials from the right. That would have helped with Miki's example. Instead of:

    get = attrgetter('foo', None)
    return get(args) or get(config) or get(env)

He could've written:

    get = rpartial(getattr, 'foo', None)
    return get(args) or get(config) or get(env)

If itemgetter and attrgetter only did a single lookup, a default might make sense. However, that doesn't fit well with multiple and/or chained lookups where a number of options are possible. (See https://bugs.python.org/issue14384#msg316222 for examples and alternatives.)

Raymond
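Note that rpartial() is not in the standard library; a minimal sketch of what such a helper might look like (Namespace here is just an illustrative stand-in for an args/config/env object):

```python
def rpartial(func, *frozen):
    # Hypothetical helper: like functools.partial, but freezes trailing arguments
    def inner(*args):
        return func(*args, *frozen)
    return inner

class Namespace:
    pass

args = Namespace()
args.foo = 'from-args'

get = rpartial(getattr, 'foo', None)   # calls getattr(obj, 'foo', None)
```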
[Python-ideas] [offlist] Re: Add "default" kw argument to operator.itemgetter and operator.attrgetter
> On May 2, 2018, at 11:32 PM, Steven D'Aprano wrote:
>
> Intended by whom?

By me. I proposed itemgetter() in the first place. The rationale I gave convinced Guido and python-dev to accept it. I then wrote the code, docs, and tests and have maintained it for over a decade. So, I have a pretty good idea of what it was intended for.

> I think you are being too dismissive of actual use-cases requested by
> actual users.

Wow, I don't know what to do with this. Over the years, I've added a lot of things requested by users. I really don't like the tone you've struck and what you've implied about me as a developer. It feels somewhat pushy and aggressive. Why not just give a +1 to things that are a good idea and a -1 for things we're better off without -- there's no need for ad hominem comments about the person making the post rather than its content -- that feels somewhat disrespectful.

> Default values might not have been the primary use
> considered when the API was first invented, but the fact that people
> keep asking for this feature should tell us that at least some people
> have intended uses that are remaining unmet.

When I've seen the request in the past, it was always "it might be nice if ..." but there were no legitimate use cases presented, just toy examples. Also, I'm concerned about increasing the complexity of the itemgetter() API to serve an occasional exotic use case rather than keeping it easy to learn and remember for the common cases.

Raymond
Re: [Python-ideas] Add "default" kw argument to operator.itemgetter and operator.attrgetter
> On May 2, 2018, at 1:08 AM, Vincent Maillol wrote:
>
> Our PEP idea would be to propose to add a global default value for
> the itemgetter and attrgetter methods.

My preference is to not grow that API further. It has already crept well beyond its intended uses. At some point, we're really better off just using a lambda.

Raymond
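For comparison, the lambda spelling of the requested behavior needs no API change at all (the attribute name and default here are illustrative):

```python
# A one-line lambda already covers "attrgetter with a default"
get_foo = lambda obj: getattr(obj, 'foo', 'missing')

class Config:
    foo = 42
```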
Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__
> On Apr 16, 2018, at 5:43 PM, Tim Peters wrote:
>
> BTW, if `Counter * scalar` is added, we should think more about
> oddball cases. While everyone knows what _they_ mean by "scalar",
> Python doesn't.

I've started working on an implementation and several choices arise:

1) Reject scalar with a TypeError if scalar is a Counter
2) Reject scalar with a TypeError if scalar is a Mapping
3) Reject scalar with a TypeError if scalar is a Collection
4) Reject scalar with a TypeError if scalar is Sized (has a __len__ method)

I lean toward rejecting all things Sized because _everyone_ knows that scalars aren't sized ;-)

Raymond
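Option 4 can be sketched as a standalone function (an illustration, not the actual patch being worked on):

```python
from collections import Counter

def scale(counter, scalar):
    # Option 4: reject anything Sized -- Counters, Mappings, and
    # Collections all define __len__, so this also covers options 1-3
    if hasattr(scalar, '__len__'):
        raise TypeError('cannot multiply a Counter by a sized object')
    return Counter({k: v * scalar for k, v in counter.items()})
```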
Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__
> On Apr 15, 2018, at 10:07 PM, Tim Peters wrote:
>
> Adding Counter * integer doesn't bother me a bit, but the definition
> of what that should compute isn't obvious.

Any thoughts on Counter * float? A key use case for what is being proposed is:

    c *= 1 / c.total

Raymond
Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__
> On Apr 15, 2018, at 9:04 PM, Peter Norvig wrote:
>
> it would be a bit weird and disorienting for the arithmetic operators to have
> two different signatures:
>
>     c += Counter
>     c -= Counter
>     c *= scalar
>     c /= scalar
>
> Is it weird and disorienting to have:
>
>     s += str
>     s *= int

Yes, there is a precedent that does seem to have worked out well in practice :-) It isn't exactly parallel because strings aren't containers of numbers, they don't have & and |, and there isn't a reason to want a / operation, but it does suggest that signature variation might not be problematic.

BTW, do you just want __mul__ and __rmul__? If those went in, presumably there will be a request to support __imul__ because otherwise c *= 3 would still work but would be inefficient (that was the rationale for adding in-place variants for all the current arithmetic operators). Likewise, presumably someone would legitimately want __div__ to support the normalization use case. Perhaps less likely, there would also be a request for __floordiv__ to allow exactly scaled results to stay in the domain of integers. Which, if any, of these makes sense to you?

Also, any thoughts on the cleanest way to express the computation of a chi-squared statistic (for example, to compare observed first digit frequencies to the frequencies predicted by Benford's Law)? This isn't an arbitrary question (it came up when a professor first proposed a variant of this idea a few years ago).

Raymond
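To make the chi-squared question concrete, here is one way it might be expressed today with plain Counters (a sketch; the expected Benford counts are computed from log10(1 + 1/d)):

```python
import math
from collections import Counter

def chi_squared(observed, expected):
    # Pearson's statistic: sum of (obs - exp)**2 / exp over the expected categories
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)

n = 999
benford = Counter({str(d): n * math.log10(1 + 1 / d) for d in range(1, 10)})
observed = Counter({str(d): 111 for d in range(1, 10)})   # uniform first-digit counts
stat = chi_squared(observed, benford)
```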
Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__
> On Apr 15, 2018, at 7:18 PM, Wes Turner wrote:
>
> And I'm done sharing non-pure-python solutions for this problem, I promise

Keep them coming :-) Thanks for the research. It helps to remind ourselves that almost none of our problems are new :-)

Raymond
Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__
> On Apr 15, 2018, at 5:44 PM, Peter Norvig wrote:
>
> If you think of a Counter as a multiset, then it should support __or__, not
> __add__, right?

FWIW, Counter is explicitly documented to support the four multiset-style mathematical operations discussed in Knuth TAOCP Volume II section 4.6.3 exercise 19:

    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d          # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})
    >>> c - d          # saturating subtraction (keeping only positive counts)
    Counter({'a': 2})
    >>> c & d          # intersection: min(c[x], d[x])
    Counter({'a': 1, 'b': 1})
    >>> c | d          # union: max(c[x], d[x])
    Counter({'a': 3, 'b': 2})

The wikipedia article on Multisets lists a further operation, inclusion, that is not currently supported: https://en.wikipedia.org/wiki/Multiset#Basic_properties_and_operations

> I do think it would have been fine if Counter did not support "+" at all
> (and/or if Counter was limited to integer values). But given where we are
> now, it feels like we should preserve `c + c == 2 * c`.

The + operation has legitimate use cases (it is perfectly reasonable to want to combine the results of two separate counts). And, as you pointed out, it is what we already have and cannot change :-)

So, the API design issue that confronts us is that it would be a bit weird and disorienting for the arithmetic operators to have two different signatures:

    c += Counter
    c -= Counter
    c *= scalar
    c /= scalar

Also, we should respect the comments given by others on the tracker issue. In particular, there is a preference to not have an in-place operation and only allow a new counter instance to be created. That will help people avoid data structure modality problems:
    c[category] += 1     # Makes sense during the frequency counting or accumulation phase
    c /= c.total         # Convert to a probability mass function
    c[category] += 1     # This code looks correct but no longer makes any sense

> As to the "doesn't really add any new capabilities" argument, that's true,
> but it is also true for Counter as a whole: it doesn't add much over
> defaultdict(int), but it is certainly convenient to have a standard way to do
> what it does.

IIRC, the defaultdict(int) in your first version triggered a bug because the model inadvertently changed during the analysis phase rather than being frozen after the training phase. The Counter doesn't suffer from the same issue (modifying the dict on a failed lookup). Also, the Counter class does have a few value-added features: Counter(iterable), c.most_common(), c.elements(), etc. But yes, at its heart the counter is mostly just a specialized dictionary.

The thought I was trying to express is that suggestions to build out the Counter API are a little less compelling when we already have a way to do it that is flexible, fast, clear, and standard (i.e. dict comprehensions).

> I agree with your intuition that low level is better. `total` would be
> useful. If you have total and mul, then as you and others have pointed out,
> normalize is just c *= 1/c.total.

I fully support adding some functionality for scaling to support probability distributions, bayesian update steps, chi-square tests, and whatnot. The people who need convincing are the other respondents on the tracker. They had a strong mental model for the Counter class that is somewhat at odds with this proposal.

Raymond
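The dict-comprehension route mentioned above keeps the counting and analysis phases in separate objects, which sidesteps the modality problem entirely:

```python
from collections import Counter

c = Counter('abracadabra')                    # counting phase
total = sum(c.values())
pmf = {k: v / total for k, v in c.items()}    # analysis phase: a separate plain dict
# The Counter itself is untouched, so further "c[category] += 1" still makes sense
```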
Re: [Python-ideas] collections.Counter should implement __mul__, __rmul__
> On Apr 15, 2018, at 2:05 PM, Peter Norvig wrote:
>
> For most types that implement __add__, `x + x` is equal to `2 * x`.
> ...
> That is true for all numbers, list, tuple, str, timedelta, etc. -- but not
> for collections.Counter. I can add two Counters, but I can't multiply one by
> a scalar. That seems like an oversight.

If you view the Counter as a sparse associative array of numeric values, it does seem like an oversight. If you view the Counter as a Multiset or Bag, it doesn't make sense at all ;-)

From an implementation point of view, Counter is just a kind of dict that has a __missing__() method that returns zero. That makes it trivially easy to subclass Counter to add new functionality, or to just use dictionary comprehensions for bulk updates.

> It would be worthwhile to implement multiplication because, among other
> reasons, Counters are a nice representation for discrete probability
> distributions, for which multiplication is an even more fundamental operation
> than addition.

There is an open issue on this topic. See: https://bugs.python.org/issue25478

One stumbling point is that a number of commenters are fiercely opposed to non-integer uses of Counter. Also, some of the use cases (such as those found in Allen Downey's "Think Stats" and "Think Bayes" books) also need division and rescaling to a total (i.e. normalizing the total to 1.0) for a probability mass function.

If the idea were to go forward, it still isn't clear whether the correct API should be low level (__mul__ and __div__ and a "total" property) or higher level (such as a normalize() or rescale() method that produces a new Counter instance). The low level approach has the advantage that it is simple to understand and that it feels like a logical extension of the __add__ and __sub__ methods. The downside is that it doesn't really add any new capabilities (being just short-cuts for a simple dict comprehension or call to c.values()).
And, it starts to feature-creep the Counter class further away from its core mission of counting and ventures into the realm of generic sparse arrays with numeric values. There is also a learnability/intelligibility issue in that __add__ and __sub__ correspond to "elementwise" operations while __mul__ and __div__ would be "scalar broadcast" operations.

Peter, I'm really glad you chimed in. My advocacy lacked sufficient weight to move this idea forward.

Raymond
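The "trivially easy to subclass" point made earlier in this message can be illustrated with a short sketch (MulCounter is a made-up name, not a stdlib class):

```python
from collections import Counter

class MulCounter(Counter):
    # Hypothetical subclass adding scalar broadcast multiplication
    def __mul__(self, scalar):
        return MulCounter({k: v * scalar for k, v in self.items()})
    __rmul__ = __mul__
```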
Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]
> On Apr 8, 2018, at 6:43 PM, Tim Peterswrote: > >> My other common case for accumulate() is building cumulative >> probability distributions from probability mass functions (see the >> code for random.choice() for example, or typical code for a K-S test). > > So, a question: why wasn't itertools.accumulate() written to accept > iterables of _only_ numeric types? Akin to `sum()`. I gather from > one of Nick's messages that it was so restricted in 3.2. Then why was > it generalized to allow any 2-argument function? Prior to 3.2, accumulate() was in the recipes section as pure Python code. It had no particular restriction to numeric types. I received a number of requests for accumulate() to be promoted to a real itertool (fast, tested, documented C code with a stable API). I agreed and accumulate() was added to itertools in 3.2. It worked with anything supporting __add__, including str, bytes, lists, and tuples. More specifically, accumulate_next() called PyNumber_Add() without any particular type restriction. Subsequently, I got requests to generalize accumulate() to support any arity-2 function (with operator.mul offered as the motivating example). Given that there were user requests and there were ample precedents in other languages, I acquiesced despite having some reservations (if used with a lambda, the function call overhead might make accumulate() slower than a plain Python for-loop without the function call). So, that generalized API extension went into 3.3 and has remained unchanged ever since. Afterwards, I was greeted with the sound of crickets. Either it was nearly perfect or no one cared or both ;-) It remains one of the least used itertools. > Given that it was, `sum()` is no longer particularly relevant: the > closest thing by far is now `functools.reduce()`, which does support > an optional `initial` argument. 
Which it really should, because it's > impossible for the implementation to guess a suitable starting value > for an arbitrary user-supplied dyadic function. > > My example using accumulate() to generate list prefixes got snipped, > but same thing there: it's impossible for that snippet to work unless > an empty list is supplied as the starting value. And it's impossible > for the accumulate() implementation to guess that. Honestly, I couldn't immediately tell what this code was doing: list(accumulate([8, 4, "k"], lambda x, y: x + [y], first_result=[])) This may be a case where a person would be better-off without accumulate() at all. > In short, for _general_ use `accumulate()` needs `initial` for exactly > the same reasons `reduce()` needed it. The reduce() function had been much derided, so I've had it mentally filed in the anti-pattern category. But yes, there may be wisdom there. > BTW, the type signatures on the scanl (requires an initial value) and > scanl1 (does not support an initial value) implementations I pasted > from Haskell's Standard Prelude give a deeper reason: without an > initial value, a list of values of type A can only produce another > list of values of type A via scanl1. The dyadic function passed must > map As to As. But with an initial value supplied of type B, scanl can > transform a list of values of type A to a list of values of type B. > While that may not have been obvious in the list prefix example I > gave, that was at work: a list of As was transformed into a list _of_ > lists of As. That's impossible for scanl1 to do, but easy for scanl. Thanks for pointing that out. I hadn't considered that someone might want to transform one type into another using accumulate(). That is pretty far from my mental model of what accumulate() was intended for. Also, I'm still not sure whether we would want code like that buried in an accumulate() call rather than as a regular for-loop where I can see the logic and trace through it with pdb. 
As for scanl, I'm not sure what this code means without seeing some Python equivalent:

    scanl :: (a -> b -> a) -> a -> [b] -> [a]
    scanl f q xs = q : (case xs of
                          []   -> []
                          x:xs -> scanl f (f q x) xs)

    scanl1 :: (a -> a -> a) -> [a] -> [a]
    scanl1 f (x:xs) = scanl f x xs
    scanl1 _ []     = []

> Or, in short, someone coming from a typed functional language
> background sees all sorts of things that rarely (if ever) come up in
> number-crunching languages. Their sensibilities should count too -
> although not much ;-) They should get _some_ extra consideration in
> this context, though, because `itertools` is one of the first things
> they dig into when they give Python a try.

I concur.

>> and it would have been distracting to even have had the option.
>
> Distracting for how long? One second or two? ;-)

Possibly forever. In my experience, if a person initially frames a problem wrong (or perhaps in a hard to solve way), it can
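For anyone in the same boat, the Haskell definitions quoted above transcribe into Python roughly as follows (a sketch using generators, not stdlib code):

```python
import operator

def scanl(f, q, xs):
    # scanl yields q, f(q, x0), f(f(q, x0), x1), ...
    yield q
    for x in xs:
        q = f(q, x)
        yield q

def scanl1(f, xs):
    # scanl1 is scanl seeded with the first element of xs (empty input yields nothing)
    it = iter(xs)
    for first in it:
        yield from scanl(f, first, it)
        return

list(scanl(operator.add, 0, [1, 2, 3]))   # [0, 1, 3, 6]
list(scanl1(operator.add, [1, 2, 3]))     # [1, 3, 6]
```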
Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]
> On Apr 8, 2018, at 12:22 PM, Tim Peters wrote:
>
> [Guido]
>> Well if you can get Raymond to agree on that too I suppose you can go ahead.
>> Personally I'm -0 but I don't really write this kind of algorithmic code
>> enough to know what's useful.
>
> Actually, you do - but you don't _think_ of problems in these terms.
> Neither do I. For those who do: consider any program that has state
> and responds to inputs. When you get a new input, the new state is a
> function of the existing state and the input.

The Bayesian world view isn't much different except they would prefer "prior" instead of "initial" or "start" ;-)

    my_changing_beliefs = accumulate(stream_of_new_evidence, bayes_rule, prior=what_i_used_to_think)

Though the two analogies are cute, I'm not sure they tell us much. In running programs or bayesian analysis, we care more about the result than the accumulation of intermediate results.

My own experience with actually using accumulations in algorithmic code falls neatly into two groups. Many years ago, I used APL extensively in accounting work and my recollection is that part of the convenience of "\+" was that the sequence length didn't change (so that the various data arrays could interoperate with one another). My other common case for accumulate() is building cumulative probability distributions from probability mass functions (see the code for random.choice() for example, or typical code for a K-S test).

For neither of those use case categories did I ever want an initial value, and it would have been distracting to even have had the option. For example, when doing a discounted cash flow analysis, I was taught to model the various flows as a single sequence of up and down arrows rather than thinking of the initial balance as a distinct concept¹

Because of this background, I was surprised to have the question ever come up at all (other than the symmetry argument that sum() has "start" so accumulate() must as well).
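The cumulative-distribution use case mentioned above reads roughly like this (a sketch of the technique; random.choices() in the stdlib works along similar lines):

```python
import random
from bisect import bisect
from itertools import accumulate

population = ['red', 'green', 'blue']
weights = [10, 5, 1]
cum_weights = list(accumulate(weights))   # [10, 15, 16] -- same length as weights

def weighted_choice(population, cum_weights):
    x = random.random() * cum_weights[-1]        # uniform over [0, total)
    return population[bisect(cum_weights, x)]    # locate the bucket containing x
```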
When writing itertools.accumulate(), I started by looking to see what other languages had done. Since accumulate() is primarily a numerical tool, I expected that the experience of numeric-centric languages would have something to teach us. My reasoning was that if the need hadn't arisen for APL, R, Numpy, Matlab², or Mathematica, perhaps it really was just noise.

My views may be dated though. Looking at the wheel sieve and collatz glide record finder, I see something new, a desire to work with lazy, potentially infinite accumulations (something that iterators do well but almost never arises in the world of fixed-length sequences or cumulative probability distributions). So I had been warming up to the idea, but got concerned that Nick could have had such a profoundly different idea about what the code should do. That cooled my interest a bit, especially when thinking about two key questions, "Will it create more problems than it solves?" and "Will anyone actually use it?".

Raymond

¹ http://www.chegg.com/homework-help/questions-and-answers/solve-present-worth-cash-flow-shown-using-three-interest-factors-10-interest-compounded-an-q878034
² https://www.mathworks.com/help/matlab/ref/accumarray.html
Re: [Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]
> On Apr 6, 2018, at 9:06 PM, Tim Peters wrote:
>
>> What is this code trying to accomplish?
>
> It's quite obviously trying to bias the reader against the proposal by
> presenting a senseless example ;-)

FWIW, the example was not from me. It was provided by the OP on the tracker. I changed the start point from 10 to 6 so it at least made some sense as the continuation of a factorial sequence: 6 24 120

> By sheer coincidence, I happened to write another yesterday. This is
> from a program looking for the smallest integers that yield new
> records for Collatz sequence lengths.

Nice. That brings the number of real-world examples up to a total of three (collatz, wheel sieve, and signal processing). Prior to today, that total was only one (which was found after much digging).

> Later:
>
>     def coll(SHIFT=24):
>         ...
>         from itertools import accumulate, chain, cycle
>         ...
>         LIMIT = 1 << SHIFT
>         ...
>         abc, first, deltas = buildtab(SHIFT, LIMIT)
>         ...
>         for num in accumulate(chain([first], cycle(deltas))):
>             assert num % 3 != 2
>
> As in Will's code, it would be more readable as:
>
>     for num in accumulate(cycle(deltas), start=first):

That does read better. I am curious how you would have written it as a plain for-loop before accumulate() was added (part of the argument against reduce() was that a plain for-loop would be clearer 99% of the time).

> That said, if the need came up often, as you noted it's dead easy to
> write a helper function to encapsulate the "head scratcher" part, and
> with no significant loss of efficiency.
>
> So I'd be -0 overall, _except_ that "chain together a singleton list
> and a cycle" is so obscure on the face of it that I'm not sure most
> programmers who wanted the functionality of `start=` would ever think
> of it. I'm not sure that I would have, except that I studied Ness's
> wheel sieve code a long time ago and the idea stuck. So that makes me
> +0.4.

Agreed that the "chain([x], it)" step is obscure.
That's a bit of a bummer -- one of the goals for the itertools module was to be a generic toolkit for chopping-up, modifying, and splicing iterator streams (sort of a CRISPR for iterators). The docs probably need another recipe to show this pattern:

    def prepend(value, iterator):
        "prepend(1, [2, 3, 4]) -> 1 2 3 4"
        return chain([value], iterator)

Thanks for taking a look at the proposal. I was -0 when it came up once before. Once I saw a use case pop up on this list, I thought it might be worth discussing again.

Raymond
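The recipe composes with accumulate() to give the effect of a start value (Python 3.8 later added an `initial` keyword to accumulate() that does this directly):

```python
from itertools import accumulate, chain

def prepend(value, iterator):
    "prepend(1, [2, 3, 4]) -> 1 2 3 4"
    return chain([value], iterator)

first, deltas = 100, [2, 3, 4]
totals = list(accumulate(prepend(first, deltas)))   # [100, 102, 105, 109]
# Equivalent in 3.8+: list(accumulate(deltas, initial=first))
```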