[Python-Dev] PEP 411: Provisional packages in the Python standard library
Eli Bendersky wrote (in http://mail.python.org/pipermail/python-dev/2012-February/116393.html ):

> A package will be marked provisional by including the following paragraph as a note at the top of its documentation page:

I really would like some marker available from within Python itself. Use cases:

(1) During development, the documentation I normally read first is whatever results from import module; help(module), or possibly dir(module).

(2) At BigCorp, there were scheduled times to move as much as possible to the current (or current-1) version. Regardless of policy, full regression test suites don't generally exist. If Python were viewed as part of the infrastructure (rather than as part of a specific application), or if I were responsible for maintaining an internal application built on Python, that would be the time to upgrade Python -- and I would want an easy way to figure out which applications and libraries I should concentrate on for testing.

> * Encapsulation of the import state (PEP 368)

Wrong PEP number. I'm guessing that you meant 406.

-jJ

--
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
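A runtime marker could be as simple as a module attribute. A minimal sketch of the idea -- note that the `__provisional__` name and the helper are hypothetical; PEP 411 only specifies a documentation note, not any runtime flag:

```python
import types

def is_provisional(module):
    """Return True if the module flags itself as provisional.

    Hypothetical sketch: assumes provisional modules would set a
    __provisional__ attribute (not something PEP 411 actually defines).
    """
    return bool(getattr(module, "__provisional__", False))

# A stand-in module, since no real module sets such a flag today.
demo = types.ModuleType("demo")
demo.__provisional__ = True
```

A flag like this could then be surfaced by help(module), covering use case (1) above.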
[Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
PEP author Victor asked (in http://mail.python.org/pipermail/python-dev/2012-February/116499.html):

> Maybe I missed the answer, but how do you handle timestamp with an unspecified starting point like os.times() or time.clock()? Should we leave these functions unchanged?

If *all* you know is that it is monotonic, then you can't -- but then you don't really have resolution either, as the clock may well speed up or slow down. If you do have resolution, and the only problem is that you don't know what the epoch was, then you can figure that out well enough by (once per type per process) comparing it to something that does have an epoch, like time.gmtime().

-jJ
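The "compare it once per process to a clock that has an epoch" idea can be sketched like this, using time.monotonic (which did not exist when this was written) as the example epoch-less clock; the offset is only as accurate as the moment the two clocks are sampled, and the sketch assumes the monotonic clock neither speeds up nor slows down:

```python
import time

# Sample both clocks once, as close together as possible; the
# difference estimates the epoch of the epoch-less clock.
_offset = time.time() - time.monotonic()

def monotonic_as_epoch_seconds():
    """Best-effort conversion of a monotonic reading to seconds
    since the Unix epoch, using the offset captured at import time."""
    return time.monotonic() + _offset
```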
[Python-Dev] PEP for new dictionary implementation
PEP author Mark Shannon wrote (in http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):

> ... allows ... (the ``__dict__`` attribute of an object) to share keys with other attribute dictionaries of instances of the same class.

Is "the same class" a deliberate restriction, or just a convenience of implementation? I have often created subclasses (or even families of subclasses) where instances (as opposed to the type) aren't likely to have additional attributes. These would benefit from key-sharing across classes, but I grant that it is a minority use case that isn't worth optimizing if it complicates the implementation.

> By separating the keys (and hashes) from the values it is possible to share the keys between multiple dictionaries and improve memory use.

Have you timed not storing the hash (in the dict) at all, at least for (unicode) str-only dicts? Going to the string for its own cached hash breaks locality a bit more, but saves 1/3 of the memory for combined tables, and may make a big difference for classes that have relatively few instances.

> Reduction in memory use is directly related to the number of dictionaries with shared keys in existence at any time. These dictionaries are typically half the size of the current dictionary implementation.

How do you measure that? The limit for huge N across huge numbers of dicts should be 1/3 (because both hashes and keys are shared); I assume that gets swamped by object overhead in typical small dicts.

> If a table is split the values in the keys table are ignored, instead the values are held in a separate array.

If they're just dead weight, then why not use them to hold indices into the array, so that values arrays only have to be as long as the number of keys, rather than rounding them up to a large-enough power-of-two? (On average, this should save half the slots.)

> A combined-table dictionary never becomes a split-table dictionary.

I thought it did (at least temporarily) as part of resizing; are you saying that it will be re-split by the time another thread is allowed to see it, so that it is never observed as combined?

Given that this optimization is limited to class instances, I think there should be some explanation of why you didn't just automatically add slots for each variable assigned (by hard-coded name) within a method; the keys would still be stored on the type, and array storage could still be used for the values; the __dict__ slot could initially be a NULL pointer, and instance dicts could be added exactly when they were needed, covering only the oddball keys.

I would reword (or at least reformat) the Cons section; at the moment, it looks like there are four separate objections, and seems to be a bit dismissive towards backwards compatibility. Perhaps something like:

    While this PEP does not change any documented APIs or invariants, it does break some de facto invariants.

    C extension modules may be relying on the current physical layout of a dictionary. That said, extensions which rely on internals may already need to be recompiled with each feature release; there are already changes planned for both Unicode (for efficiency) and dicts (for security) that would require authors of these extensions to at least review their code.

    Because iteration (and repr) order can depend on the order in which keys are inserted, it will be possible to construct instances that iterate in a different order than they would under the current implementation. Note, however, that this will happen very rarely in code which does not deliberately trigger the differences, and that test cases which rely on a particular iteration order will already need to be corrected in order to take advantage of the security enhancements being discussed under hash randomization, or for use with Jython and PyPy.
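The "use the dead slots to hold indices" suggestion can be modeled in pure Python. This is a toy model only -- the class names are mine, and the real PEP 412 implementation works at the C level -- but it shows how a shared key table mapping keys to indices lets each instance's values array be only as long as the number of keys:

```python
class SharedKeys:
    """Toy model of a PEP 412-style shared key table, with the
    suggested twist: each key maps to an index into a per-instance
    values array, so values arrays need only len(keys) slots."""
    def __init__(self):
        self.index = {}  # key -> slot in each instance's values list

    def slot(self, key):
        # Assign the next free slot on first sight of a key.
        return self.index.setdefault(key, len(self.index))

class Instance:
    """An object whose attribute storage shares keys with its siblings."""
    def __init__(self, shared):
        self._keys = shared
        self._values = []

    def set(self, key, value):
        i = self._keys.slot(key)
        while len(self._values) <= i:
            self._values.append(None)
        self._values[i] = value

    def get(self, key):
        return self._values[self._keys.index[key]]

# Two instances share one key table but hold separate values.
shared = SharedKeys()
a, b = Instance(shared), Instance(shared)
a.set("x", 1)
b.set("x", 2)
```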
-jJ
[Python-Dev] Store timestamps as decimal.Decimal objects
In http://mail.python.org/pipermail/python-dev/2012-February/116073.html Nick Coghlan wrote:

> Besides, float128 is a bad example - such a type could just be returned directly where we return float64 now. (The only reason we can't do that with Decimal is because we deliberately don't allow implicit conversion of float values to Decimal values in binary operations).

If we could really replace float with another type, then there is no reason that type couldn't be a nearly trivial Decimal subclass which simply flips the default value of the (never used by any caller) allow_float parameter to the internal function _convert_other. Since decimal inherits straight from object, this subtype could even be made to inherit from float as well, and to store the lower-precision value there. It could even produce the decimal version lazily, so as to minimize slowdown on cases that do not need the greater precision.

Of course, that still doesn't answer questions on whether the higher precision is a good idea ...

-jJ
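A rough sketch of such a subclass, written portably by converting float operands explicitly rather than reaching into decimal's internal _convert_other (the class name and approach are mine, and only addition is shown):

```python
from decimal import Decimal

class FloatFriendlyDecimal(Decimal):
    """Hypothetical sketch: a Decimal that accepts float operands
    in binary operations by converting them exactly first."""

    @staticmethod
    def _coerce(other):
        if isinstance(other, float):
            return Decimal(other)  # exact binary-to-decimal conversion
        return other

    def __add__(self, other):
        result = Decimal.__add__(self, self._coerce(other))
        if result is NotImplemented:
            return NotImplemented
        return FloatFriendlyDecimal(result)

    __radd__ = __add__  # addition commutes
```

Plain Decimal deliberately refuses float operands with a TypeError; this subclass allows them, which is exactly the design question the thread is debating.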
[Python-Dev] plugging the hash attack
In http://mail.python.org/pipermail/python-dev/2012-January/116003.html Benjamin Peterson wrote:

> 2. It will be off by default in stable releases ... This will prevent code breakage ...

2012/1/27 Steven D'Aprano <steve at pearwood.info>:

> ... it will become on by default in some future release?

On Fri, Jan 27, 2012, Benjamin Peterson <benjamin at python.org> wrote:

> Yes, 3.3. The solution in 3.3 could even be one of the more sophisticated proposals we have today.

Brett Cannon (Mon Jan 30) wrote:

> I think that would be good. And I would even argue we remove support for turning it off to force people to no longer lean on dict ordering as a crutch (in 3.3 obviously).

Turning it on by default is fine. Removing the ability to turn it off is bad. If regression tests fail with Python 3, the easiest thing to do is just not to migrate to Python 3. Some decisions (certainly around unittest, but I think even around hash codes) were settled precisely because tests shouldn't break unless the functionality has really changed. Python 3 isn't yet so dominant as to change that tradeoff.

I would go so far as to add an extra step in the porting recommendations: before porting to Python 3.x, run your test suite several times with hash randomization turned on; any failures at this point are relying on formally undefined behavior and should be fixed, but can *probably* be fixed just by wrapping the results in sorted(). (I would offer a patch to the porting-to-py3 recommendation, except that I couldn't find any not associated specifically with 3.0.)

-jJ
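"Wrapping the results in sorted()" typically means turning an order-dependent comparison into an order-independent one; a minimal illustration:

```python
d = {"one": 1, "two": 2, "three": 3}

# Fragile: compares against one specific iteration order, which
# hash randomization deliberately varies between interpreter runs:
#     assert list(d) == ["one", "two", "three"]

# Robust: sort before comparing, so the result is independent of
# hash randomization.
assert sorted(d) == ["one", "three", "two"]
```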
[Python-Dev] Counting collisions for the win
In http://mail.python.org/pipermail/python-dev/2012-January/115715.html Frank Sievertsen wrote:

> On 20.01.2012 at 13:08, Victor Stinner wrote:
>>> I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago
>> A Python dictionary only uses the lower bits of a hash value. If your dictionary has less than 2**32 items, the dictionary order is exactly the same on 32 and 64 bit systems: hash32(str) & mask == hash64(str) & mask for mask = 2**32-1.
> No, that's not true. Whenever a collision happens, other bits are mixed in very fast.
>
> Frank

Bits are mixed in quickly from a denial-of-service standpoint, but Victor is correct from a "Why don't the tests already fail?" standpoint.

A dict with 2**12 slots, holding over 2700 entries, will be far larger than most test cases -- particularly those with visible output. In a dict that size, 32-bit and 64-bit machines will still probe the same first, second, third, fourth, fifth, and sixth slots. Even in the rare cases when there are at least 6 collisions, the next slots may well be either the same, or close enough that it doesn't show up in a changed iteration order.

-jJ
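Victor's point about the initial probe can be checked directly: the first slot is hash & (size-1), which depends only on the low bits of the hash. This is illustrative only -- CPython's perturb sequence, which Frank alludes to, mixes the high bits in on later probes after a collision:

```python
def first_slot(h, size):
    """Initial probe index for a dict table; size must be a
    power of two, as in CPython's implementation."""
    return h & (size - 1)

h64 = 0x123456789ABCDEF0   # a hypothetical 64-bit hash value
h32 = h64 & 0xFFFFFFFF     # the same hash truncated to 32 bits

# For any table with fewer than 2**32 slots, the first probe agrees
# between 32-bit and 64-bit builds.
assert first_slot(h64, 2**12) == first_slot(h32, 2**12)
```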
[Python-Dev] PEP 414 - Unicode Literals for Python 3
In http://mail.python.org/pipermail/python-dev/2012-February/116953.html Terry J. Reedy wrote:

> I presume that most 2.6 code has problems other than u'' when attempting to run under 3.x.

Why? If you're talking about generic code that has seen minimal changes since 2.0, sure. But I think this request is specifically for projects that are thinking about Python 3, but are trying to use a single source base regardless of version.

Using an automatic translation step means that Python (or at least Python 3) would no longer be the actual source code. I've worked with enough generated source code in other languages that it is worth some pain to avoid even a slippery slope. By the time you drop 2.5, the subset language is already pretty good; if I have to write something version-specific, I prefer to treat that as a sign that I am using the wrong approach.

-jJ
[Python-Dev] Add a frozendict builtin type
In http://mail.python.org/pipermail/python-dev/2012-February/116955.html Victor Stinner proposed:

> The blacklist implementation has a major issue: it is still possible to call write methods of the dict class (e.g. dict.set(my_frozendict, key, value)).

It is also possible to use ctypes and violate even more invariants. For most purposes, this falls under "consenting adults".

> The whitelist implementation has an issue: frozendict and dict are not compatible, dict is not a subclass of frozendict (and frozendict is not a subclass of dict).

And because of Liskov substitutability, they shouldn't be; they should be sibling children of a basedict that doesn't have the mutating methods, but also doesn't *promise* not to mutate.

> * frozendict values must be immutable, as dict keys

Why? That may be useful, but an immutable dict whose values might mutate is also useful; by forcing that choice, it starts to feel too specialized for a builtin.

> * Add an hash field to the PyDictObject structure

That is another indication that it should really be a sibling class; most of the uses I have had for immutable dicts still didn't need hashing. It might be worth adding anyhow, but only to immutable dicts -- not to every instance dict or keywords parameter.

> * frozendict.__hash__ computes hash(frozenset(self.items())) and caches the result in its private hash attribute

Why? hash(frozenset(self.keys())) would still meet the hash contract, but it would be approximately twice as fast, and I can think of only one case where it wouldn't work just as well. (That case is wanting to store a dict of alternative configuration dicts (with no defaulting of values), but ALSO wanting to use the configurations themselves (as opposed to their names) as the dict keys.)
-jJ
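The hashing question can be made concrete with a minimal pure-Python frozendict -- a sketch only, not the PEP 416 C implementation, with the hash cached in an ordinary attribute:

```python
class frozendict(dict):
    """Sketch of a hashable, immutable dict (pure Python)."""

    def _blocked(self, *args, **kwargs):
        raise TypeError("frozendict is immutable")

    # Disable every mutating dict method.
    __setitem__ = __delitem__ = _blocked
    clear = pop = popitem = setdefault = update = _blocked

    def __hash__(self):
        # The PEP hashes frozenset(self.items()); the alternative,
        # hash(frozenset(self.keys())), would also satisfy the hash
        # contract (equal dicts have equal key sets) at lower cost,
        # but would collide for dicts differing only in values.
        if not hasattr(self, "_hash"):
            self._hash = hash(frozenset(self.items()))
        return self._hash
```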
[Python-Dev] PEP 414 - Unicode Literals for Python 3
In http://mail.python.org/pipermail/python-dev/2012-February/117070.html Vinay Sajip wrote:

> It's moot, but as I see it: the purpose of PEP 414 is to facilitate a single codebase across 2.x and 3.x. However, it only does this if your 3.x interest is 3.3+

For many people -- particularly those who haven't ported yet -- 3.x will mean 3.3+. There are some who will support 3.2 because it is an LTS release on some distribution, just as there were some who supported Python 1.5 (but not 1.6) long into the 2.x cycle, but I expect them to be the minority. I certainly don't expect 3.2 to remain a primary development target, the way that 2.7 is.

IIRC, the only ways to use 3.2 even today are:
(a) Make an explicit choice to use something other than the default
(b) Download directly and choose 3.x without OS support
(c) Use Arch Linux

These are the sort of people who can be expected to upgrade. Now also remember that we're talking specifically about projects that have *not* been ported to 3.x (== no existing users to support), and that won't be ported until 3.2 is already in maintenance mode.

> If you also want to or need to support 3.0 - 3.2, it makes your workflow more painful,

Compared to dropping 3.2, yes. Compared to supporting 3.2 today? I don't see how.

> because you can't run tests on 2.x or 3.3 and then run them on 3.2 without an intermediate source conversion step - just like the 2to3 step that people find painful when it's part of maintenance workflow, and which in part prompted the PEP in the first place.

So the only differences compared to today are that:
(a) Fewer branches are after the auto-conversion.
(b) No current branches are after the auto-conversion.
(c) The auto-conversion is much more limited in scope.
-jJ
[Python-Dev] PEP 416: Add a frozendict builtin type
In http://mail.python.org/pipermail/python-dev/2012-February/117113.html Victor Stinner posted:

> An immutable mapping can be implemented using frozendict::
>
>     class immutabledict(frozendict):
>         def __new__(cls, *args, **kw):
>             # ensure that all values are immutable
>             for key, value in itertools.chain(args, kw.items()):
>                 if not isinstance(value, (int, float, complex, str, bytes)):
>                     hash(value)
>             # frozendict ensures that all keys are immutable
>             return frozendict.__new__(cls, *args, **kw)

What is the purpose of this? Is it just a hashable frozendict? If it is for security (as some other messages suggest), then I don't think it really helps.

    class Proxy:
        def __eq__(self, other):
            return self.value == other
        def __hash__(self):
            return hash(self.value)

An instance of Proxy is hashable, and the hash is not object.__hash__, but it is still mutable. You're welcome to call that buggy, but a secure sandbox will have to deal with much worse.

-jJ
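To make the objection concrete, here is the Proxy trick in runnable form; the hash is "real", yet the value can change after the object has been used as a key, which is exactly why hashability does not guarantee immutability:

```python
class Proxy:
    """Hashable but mutable: hash and equality delegate to a
    mutable attribute."""
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other
    def __hash__(self):
        return hash(self.value)

p = Proxy("config-A")
table = {p: "settings"}
assert table["config-A"] == "settings"  # found via the proxied value

p.value = "config-B"                    # mutate after insertion
assert "config-A" not in table          # the stored key no longer matches
```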
[Python-Dev] [RELEASED] Python 3.3.0 alpha 1
In http://mail.python.org/pipermail/python-dev/2012-March/117348.html Georg Brandl posted:

> Python 3.3 includes a range of improvements of the 3.x series, as well as easier porting between 2.x and 3.x. Major new features in the 3.3 release series are:

As much as it is nice to just celebrate improvements, I think readers (particularly on the download page http://www.python.org/download/releases/3.3.0/ ) would be better served if there were an additional point about porting and the hash changes. http://docs.python.org/dev/whatsnew/3.3.html#porting-to-python-3-3 also failed to mention this, and even the changelog didn't seem to warn people about failing tests or tell them how to work around it. Perhaps something like:

    Hash Randomization (issue 13703) is now on by default. Unfortunately, this does break some tests; it can be temporarily turned off by setting the environment variable PYTHONHASHSEED to 0 before launching python.

-jJ
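The suggested workaround can be exercised from a shell: with the seed pinned, the hash of a given string is reproducible across interpreter runs (shown with python3; on 3.3+ randomization is otherwise on by default):

```shell
# Two separate interpreter runs with the same fixed seed produce
# the same hash for the same string.
a=$(PYTHONHASHSEED=0 python3 -c 'print(hash("release"))')
b=$(PYTHONHASHSEED=0 python3 -c 'print(hash("release"))')
[ "$a" = "$b" ] && echo "reproducible"
```

Unsetting PYTHONHASHSEED (or setting it to "random") restores the randomized behavior.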
[Python-Dev] problem with recursive yield from delegation
http://mail.python.org/pipermail/python-dev/2012-March/117396.html Stefan Behnel posted:

> I found a problem in the current yield from implementation ...

[paraphrasing]
- g1 yields from g2
- g2 yields from g1
- XXX python follows the existing delegation without checking re-entrancy
- g2 (2nd call) checks re-entrancy, and raises an exception
- g1 (2nd call) gets to handle the exception, and doesn't
- g2 (1st call) gets to handle the exception, and does

How is this a problem? Re-entering a generator is a bug. Python caught it and raised an appropriate exception.

It would be nice if Python caught the generator cycle as soon as it was created, just as it would be nice if reference cycles were collected as soon as they became garbage. But Python doesn't promise to catch cycles immediately, and the checks required to do so would slow down all code, so in practice the checks are delayed.

-jJ
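The cycle Stefan describes can be reproduced in a few lines (requires 3.3+ for yield from); the ValueError comes from the re-entrancy check and surfaces wherever the delegation chain happens to be:

```python
def make_cycle():
    """Build two generators that delegate to each other."""
    def a():
        yield from gb   # delegate to the other generator...
    def b():
        yield from ga   # ...which delegates straight back
    ga = a()
    gb = b()
    return ga

g = make_cycle()
try:
    next(g)             # re-enters the first generator via the second
    outcome = "no error"
except ValueError as exc:
    outcome = str(exc)  # the re-entrancy check fires
```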
[Python-Dev] Adding a builtins parameter to eval(), exec() and __import__().
http://mail.python.org/pipermail/python-dev/2012-March/117395.html Brett Cannon posted:

[in reply to Mark Shannon's suggestion of adding a builtins parameter to match locals and globals]

> It's a mess right now to try to grab the __import__() implementation and this would actually help clarify import semantics by saying that __import__() for any chained imports comes from __import__()s locals, globals, or builtins arguments (in that order) or from the builtins module itself (i.e. tstate->builtins).

How does that differ from today? If you're saying that the locals and (module-level) globals aren't always checked in order, then that is a semantic change. Probably a good change, but still a change -- and it can be made independently of Mark's suggestion.

Also note that I would assume this was for sandboxing, and that missing names should *not* fall back to the real globals, although I would understand if bootstrapping required the import statement to get special treatment.

(Note that I like Mark's proposed change; I just don't see how it cleans up import.)

-jJ
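Today's behavior -- builtins, including __import__, coming from the __builtins__ key of the globals mapping passed to eval()/exec() -- can be seen directly; Mark's proposal would promote this dict entry to a real parameter:

```python
# A restricted builtins namespace: len is allowed, nothing else.
sandbox_globals = {"__builtins__": {"len": len}}

# Allowed names resolve through the supplied builtins mapping.
assert eval("len([1, 2, 3])", sandbox_globals) == 3

# Names absent from it raise NameError.
blocked = False
try:
    eval("open('/etc/passwd')", sandbox_globals)
except NameError:
    blocked = True

# import statements look up __import__ in the same mapping, so
# removing it blocks chained imports too.
import_blocked = False
try:
    exec("import os", sandbox_globals)
except ImportError:
    import_blocked = True
```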
[Python-Dev] Python install layout and the PATH on win32
In http://mail.python.org/pipermail/python-dev/2012-March/117586.html van.lindberg at gmail.com posted:

> 1) The layout for the python root directory for all platforms should be as follows:
>
>     stdlib = {base/userbase}/lib/python{py_version_short}
>     platstdlib = {base/userbase}/lib/python{py_version_short}
>     purelib = {base/userbase}/lib/python{py_version_short}/site-packages
>     platlib = {base/userbase}/lib/python{py_version_short}/site-packages
>     include = {base/userbase}/include/python{py_version_short}
>     scripts = {base/userbase}/bin
>     data = {base/userbase}

Why? Pure python vs compiled C doesn't need to be separated at the directory level, except for cleanliness. Some (generally unix) systems prefer to split the libraries into several additional pieces depending on CPU architecture. The structure listed above doesn't have a location for docs. Some packages (such as tcl) may be better off in their own area. What is data? Is this an extra split compared to today, or does it refer to things like LICENSE.txt, README.txt, and NEWS.txt?

And even once I figure out where files have moved, and assume that the split is perfect -- what does this buy me over the current situation? I was under the impression that programs like distutils already handled finding the appropriate directories for a program; if you're rewriting that logic, you're just asking for bugs on a strange platform that you don't use. If you're looking for things interactively, then platform conventions are probably more important than consistency across platforms. If you disagree, you are welcome to reorganize your personal linux installation so that it matches windows, and see whether it causes you any problems.

> ... We *already* have this. The only difference in this proposal is that we go from py_version_nodot to py_version_short, i.e. from c:\python33\lib\python33 to c:\python33\lib\python3.3

I have not seen that redundancy before on windows. I'm pretty sure that it is a relic of your Linux provider wanting to support multiple python versions using shared filesystems. The Windows standard is to use a local disk, and to bundle it all up into its own directory, similar to the way that java apps sometimes ship with their own JVM.

Also note that using the dot in a directory name is incautious. I haven't personally had trouble in several years, but doing so is odd enough that some should be expected. Python already causes some grief by not installing in Program Files, but that is at least justified by the spaces-in-filenames problem; what is the advantage of 3.3?

I'm using windows, and I just followed the defaults at installation. It is possible that the installer continued to do something based on an earlier installation, but I don't think this machine has ever had a customized installation of any python version.

C:\python32\*
    Everything is under here; I assume {base/userbase} would be set to C:\python32. As is customary for windows, the base directory contains the license/readme/news and all executables that the user is expected to use directly. (python.exe, pythonw.exe. It also contains w9xpopen.exe that users do not use, but that too is fairly common.) There is no data directory. Subdirectories are:

C:\python32\DLLs
    In addition to regular DLL files, it contains .pyd files and icons. It looks like modules from the stdlib that happen to be written in C. Most users will never bother to look here.

C:\python32\Doc
    A .chm file; full html would be fine too, but removing it would be a bad idea.

C:\python32\include
    These are the header files, though most users will never have any use for them, as there isn't generally a compiler.

C:\python32\Lib
    The standard library -- or at least the portion implemented in python. Note that site-packages is a subdirectory here. It doesn't happen to have an __init__.py, but to an ordinary user it looks just like any other stdlib package, such as xml or multiprocessing. I personally happen to keep things in subdirectories of site-packages, but I can't say what is standard. Moving site-packages out of the Lib directory might make sense, but probably isn't worth the backward compatibility hit.

C:\python32\libs
    .lib files. I'm not entirely sure what these (as opposed to the DLLs) are for; lib files aren't that common on windows. My machine does not appear to have any that aren't associated with cross-platform tools or unix emulation.

C:\python32\tcl
    Note that this is in addition to associated files under DLLs and libs. I would prefer to see them in one place, but moving it in with non-tcl stuff would not be an improvement. Most users will never look (or care); those that do usually appreciate knowing that, for example, the dde subdirectory is for tcl.

C:\python32\Tools
    This has three subdirectories (i18n,
[Python-Dev] Python install layout and the PATH on win32
In http://mail.python.org/pipermail/python-dev/2012-March/117617.html van.lindberg at gmail.com posted:

> As noted earlier in the thread, I also change my proposal to maintain the existing differences between system installs and user installs.

[Wanted lower case, which should be irrelevant; sysconfig.get_python_inc already assumes lower case despite the configuration file.]

[Wanted bin instead of Scripts, even though they aren't binaries.]

If there are to be any changes, I *am* tempted to at least harmonize the two install types, but to use the less redundant system form. If the user is deliberately trying to hide that it is version 33 (or even that it is python), then so be it; defaulting to redundant information is not an improvement.

Set the base/userbase at install time, with defaults of:

    base = %SystemDrive%\{py_version_nodot}
    userbase = %USERPROFILE%\Application Data\{py_version_nodot}
    usedbase = base for system installs; userbase for per-user installs

Then let the rest default to subdirectories; sysconfig.get_config_vars on windows explicitly doesn't provide as many variables as unix, just INCLUDEPY (which should default to {usedbase}/include) and LIBDEST and BINLIBDEST (both of which should default to {usedbase}/lib).

And no, I'm not forgetting data or scripts. As best I can tell, sysconfig doesn't actually expose them, and there is no Scripts directory on my machine (except inside Tools). Perhaps some installers create it when they install their own extensions?

-jJ
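The variables and scheme keys discussed in this thread are easy to inspect on any installation; note that the scheme keys reported by sysconfig.get_paths() are exactly the names (stdlib, purelib, scripts, data, ...) from the proposed layout:

```python
import sysconfig

# The platform-specific configuration variables named in the thread.
for name in ("INCLUDEPY", "LIBDEST", "BINLIBDEST"):
    print(name, "=", sysconfig.get_config_var(name))

# The full install-scheme paths, for comparison.
for key, path in sorted(sysconfig.get_paths().items()):
    print(key, "=", path)
```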
[Python-Dev] Docs of weak stdlib modules should encourage exploration of 3rd-party alternatives
In http://mail.python.org/pipermail/python-dev/2012-March/117570.html Steven D'Aprano posted:

> Need is awfully strong. I don't believe it is the responsibility of the standard library to be judge and reviewer of third party packages that it doesn't control.

It is, however, user-friendly to indicate when the stdlib selections are particularly likely to be for reasons other than "A bunch of experts believe this is the best way to do this." CPython's documentation is (de facto) the documentation for python in general, and pointing people towards other resources (particularly PyPI itself) is quite reasonable.

Many modules are in the stdlib in part because they are an *acceptable* way of doing something, and the best ways are either changing too quickly or are so complicated that it doesn't make sense to burden the *standard* library for specialist needs. In those cases, I do think the documentation should say so. Specific examples:

http://docs.python.org/library/numeric.html quite reasonably has subsections only for what ships with Python. But I think the introductory paragraph could stand to have an extra sentence explaining why and when people should look beyond the standard library, such as:

    Applications centered around mathematics may benefit from specialist 3rd party libraries, such as numpy http://pypi.python.org/pypi/numpy/ , gmpy http://pypi.python.org/pypi/gmpy , and scipy http://pypi.python.org/pypi/scipy .

I would add a similar sentence to the web section, or the internet protocols section if web is still not broken out separately. http://docs.python.org/dev/library/internet.html

    Note that some web conventions are still evolving too quickly for convenient encapsulation in a stable library. Many applications will therefore prefer functional replacements from third parties, such as requests or httplib2, or frameworks such as Django and Zope. www-related products can be found by browsing PyPI for top internet subtopic www/http. http://pypi.python.org/pypi?:action=browse&c=319&c=326

[I think that searching by classifier -- which first requires browse, and can't be reached from the list of classifiers -- could be improved.]

> Should we recommend wxPython over Pyjamas or PyGUI or PyGtk?

Actually, I think the existing http://docs.python.org/library/othergui.html does a pretty good job; I would not object to adding mentions of other tools as well, but a wiki reference is probably sufficient.

-jJ
[Python-Dev] Issue #10278 -- why not just an attribute?
In http://mail.python.org/pipermail/python-dev/2012-March/117762.html Georg Brandl posted:

> + If available, a monotonic clock is used. By default, if *strict* is False,
> + the function falls back to another clock if the monotonic clock failed or is
> + not available. If *strict* is True, raise an :exc:`OSError` on error or
> + :exc:`NotImplementedError` if no monotonic clock is available.

This is not clear to me. Why wouldn't it raise OSError on error even with strict=False? Please clarify which exception is raised in which case.

Passing strict as an argument seems like overkill since it will always be meaningless on some (most?) platforms. Why not just use a function attribute? Those few users who do care can check the value of time.steady.monotonic before calling time.steady(); exceptions raised will always be whatever the clock actually raises.

-jJ
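The function-attribute alternative might look like this -- a sketch of the suggestion, not anything that was adopted; on builds without time.monotonic (the situation the attribute exists to report), the fallback branch is taken instead:

```python
import time

def _make_steady():
    try:
        clock = time.monotonic   # AttributeError if unavailable
        is_monotonic = True
    except AttributeError:
        clock = time.time        # fallback: wall clock, may go backwards
        is_monotonic = False

    def steady():
        return clock()

    # Callers who care can inspect this before trusting the clock,
    # instead of passing a strict= argument.
    steady.monotonic = is_monotonic
    return steady

steady = _make_steady()
```

Exceptions raised by steady() are then simply whatever the underlying clock raises, with no special-casing needed.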
[Python-Dev] Rename time.steady(strict=True) to time.monotonic()?
In http://mail.python.org/pipermail/python-dev/2012-March/118024.html Steven D'Aprano wrote: "What makes this 'steady', given that it can be adjusted and it can go backwards?" It is best-effort for steady, but putting "best" in the name would be an attractive nuisance. Is steady() merely a convenience function to avoid the user having to write something like this?

    try:
        mytimer = time.monotonic
    except AttributeError:
        mytimer = time.time

That would still be worth doing. But I think the main point is that the clock *should* be monotonic, and *should* be as precise as possible. Given that it returns seconds elapsed (since an undefined start), perhaps it should be time.seconds() or even time.counter(). -jJ
[Python-Dev] time.clock_info() field names
In http://mail.python.org/pipermail/python-dev/2012-April/119134.html Benjamin Peterson wrote: "I see PEP 418 gives time.clock_info() two boolean fields named is_monotonic and is_adjusted. I think the is_ is unnecessary and a bit ugly, and they could just be renamed monotonic and adjusted." I agree with monotonic, but I think it should be adjustable. To me, adjusted and is_adjusted both imply that an adjustment has already been made; adjustable only implies that it is possible. I do remember concerns (including Stephen J. Turnbull's CAL_0O19nmi0+zB+tV8poZDAffNdTnohxo9y5dbw+E2q=9rx...@mail.gmail.com ) that adjustable should imply at least a list of past adjustments, and preferably a way to make an adjustment. I just think that stating it is adjustable (without saying how, or whether and when it already happened) is less wrong than claiming it is already adjusted just in case it might have been. -jJ
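For reference, the names that eventually shipped in PEP 418 match this suggestion: time.get_clock_info() exposes plain boolean fields named monotonic and adjustable (no is_ prefix, and "adjustable" rather than "adjusted"):

```python
import time

# Metadata for the monotonic clock, as shipped in Python 3.3+
info = time.get_clock_info("monotonic")
# A namespace with .implementation, .resolution, and the two booleans
# discussed in this thread: .monotonic and .adjustable
print(info.monotonic, info.adjustable)
```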
[Python-Dev] PEP 362: 4th edition
Summary: *Every* Parameter attribute is optional, even name. (Think of builtins, even if they aren't automatically supported yet.) So go ahead and define some others that are sometimes useful. Instead of defining a BoundArguments class, just return a copy of the Signature, with value attributes added to the Parameters. Use subclasses to distinguish the parameter kind. (Replacing most of the is_ methods from the 3rd version.) [is_]implemented is important information, but the API isn't quite right; even with tweaks, maybe we should wait a version before freezing it on the base class. But I would be happy to have Larry create a Signature for the os.* functions, whether that means a subclass or just an extra instance attribute. I favor passing a class to Signature.format, because so many of the formatting arguments would normally change in parallel. But my tolerance for nested structures may be unusually high. I make some more specific suggestions below. In http://mail.python.org/pipermail/python-dev/2012-June/120305.html Yury Selivanov wrote: A Signature object has the following public attributes and methods: * return_annotation : object The annotation for the return type of the function if specified. If the function has no annotation for its return type, this attribute is not set. This means users must already be prepared to use hasattr with the Signature as well as the Parameters -- in which case, I don't see any harm in a few extra optional properties. I would personally prefer to see the name (and qualname) and docstring, but it would make perfect sense to implement these by keeping a weakref to the original callable, and just delegating there unless/until the properties are explicitly changed. I suspect others will have a use for additional delegated attributes, such as the self of boundmethods. I do agree that __eq__ and __hash__ should depend at most on the parameters (including their order) and the annotation. 
* parameters : OrderedDict An ordered mapping of parameters' names to the corresponding Parameter objects (keyword-only arguments are in the same order as listed in ``code.co_varnames``). For a specification, that feels a little too tied to the specific implementation. How about: "Parameters will be ordered as they are in the function declaration." or even just: "Positional parameters will be ordered as they are in the function declaration." because:

    def f(*, a=4, b=5): pass

and:

    def f(*, b=5, a=4): pass

should probably have equal signatures. Wild thought: Instead of just *having* an OrderedDict of Parameters, should a Signature *be* that OrderedDict (with other attributes)? That is, should signature(testfn)["foo"] get the "foo" parameter? * bind(\*args, \*\*kwargs) -> BoundArguments Creates a mapping from positional and keyword arguments to parameters. Raises a ``BindError`` (subclass of ``TypeError``) if the passed arguments do not match the signature. * bind_partial(\*args, \*\*kwargs) -> BoundArguments Works the same way as ``bind()``, but allows the omission of some required arguments (mimics ``functools.partial`` behavior.) Are those descriptions actually correct? I would expect the mapping to be from parameters (or parameter names) to values extracted from *args and **kwargs. And I'm not sure the current patch does even that, since it seems to instead return a non-Mapping object (but with a mapping attribute) that could be used to re-create *args, **kwargs in canonical form. (Though that canonicalization is valuable for calls; it might even be worth an as_call method.) I think it should be explicit that this mapping does not include parameters which would be filled by default arguments. In fact, if you stick with this interface, I would like a 3rd method that does fill out everything. But I think it would be simpler to just add an optional attribute to each Parameter instance, and let bind fill that in on the copies, so that the return value is also a Signature.
(No need for the BoundArguments class.) Then the user can decide whether or not to plug in the defaults for missing values. * format(...) -> str Formats the Signature object to a string. Optional arguments allow for custom render functions for parameter names, annotations and default values, along with custom separators. I think it should state explicitly that by default, the return value will be a string that could be used to declare an equivalent function, if "Signature" were replaced with "def funcname". There are enough customization parameters that would often be changed together (e.g., to produce HTML output) that it might make sense to use overridable class defaults -- or even to make format a class itself. I also think it would make sense to delegate formatting the individual parameters to the parameter objects.
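For comparison, the API that eventually landed in the inspect module kept a separate BoundArguments class, but did adopt the idea that defaults are not filled in automatically; a minimal sketch:

```python
import inspect

def connect(host, port=8080, *, timeout=None):
    ...

sig = inspect.signature(connect)
# Parameters are ordered as declared in the function definition
print(list(sig.parameters))      # ['host', 'port', 'timeout']

bound = sig.bind("example.com")  # raises TypeError if args don't match
print(bound.arguments)           # {'host': 'example.com'} -- no defaults yet
bound.apply_defaults()           # explicit opt-in to filling defaults (3.5+)
print(bound.arguments)           # now includes port=8080 and timeout=None
```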
[Python-Dev] backported Enum
(On June 19, 2013) Barry Warsaw wrote about porting mailman from flufl.enum to the stdlib enum: Switching from call syntax to getitem syntax for looking up an enum member by name, e.g.

    -delivery_mode = DeliveryMode(data['delivery_mode'])
    +delivery_mode = DeliveryMode[data['delivery_mode']]

Switching from getitem syntax to call syntax for looking up an enum member by value, e.g.

    -return self._enum[value]
    +return self._enum(value)

Interesting that these two were exactly opposite from flufl.enum. Is there a reason why these were reversed? I can sort of convince myself that it makes sense because dicts work better with strings than with ints, but ... it seems like such a minor win that I'm not sure it is worth the backwards incompatibility. (Of course, I also don't know how much use stdlib enum has already gotten with the current syntax.) Switching from int() to .value to get the integer value of an enum member, e.g.

    -return (member.list_id, member.address.email, int(member.role))
    +return (member.list_id, member.address.email, member.role.value)

Is just this a style preference? Using a .value attribute certainly makes sense, but I don't see it mentioned in the PEP as even optional, let alone recommended. If you care that the value be specifically an int (as opposed to any object), then an int constructor may be better. [Some additional changes that mean there will be *some* changes, which does reduce the pressure for backwards compatibility.] ... An unexpected difference is that failing name lookups raise a KeyError instead of a ValueError. I could understand either, as well as AttributeError, since the instance that would represent that value isn't a class attribute. Looking at Enum creation, I think ValueError would be better than TypeError for complaints about duplicate names. Was TypeError chosen because it should only happen during setup?
I would also not be shocked if some people expect failed value lookups to raise an IndexError, though I expect they would adapt if they get something else that makes sense. Would it be wrong to create an EnumError that subclasses (ValueError, KeyError, AttributeError) and to raise that subclass from everything but _StealthProperty and _get_mixins? -jJ
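The lookup and exception behavior under discussion, as it shipped in the stdlib enum (class name borrowed from the mailman example above for concreteness):

```python
from enum import Enum

class DeliveryMode(Enum):
    regular = 1
    digest = 2

DeliveryMode["regular"]    # lookup by *name* (getitem syntax)
DeliveryMode(2)            # lookup by *value* (call syntax)
DeliveryMode.digest.value  # 2 -- the .value attribute replacing int()

# Failed lookups raise different exceptions, as the post notes:
# DeliveryMode["nope"] raises KeyError; DeliveryMode(99) raises ValueError
```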
[Python-Dev] PEP 454 (tracemalloc) disable == clear?
(Tue Oct 29 12:37:52 CET 2013) Victor Stinner wrote: "For consistency, you cannot keep traces when tracing is disabled. The free() must be enabled to remove allocated memory blocks, or the next malloc() may get the same address, which would raise an assertion error (you cannot have two memory blocks at the same address)." That seems like a quirk of the implementation, particularly since the actual address is not returned to the user. Nor do I see any way of knowing when that allocation is freed. Well, unless I missed it... I don't see how to get anything beyond the return value of get_traces, which is a (time-ordered?) list of allocation sizes with the then-current call stack. It doesn't mention any attribute for indicating that some entries are de-allocations, let alone the actual address of each allocation. "For the reason explained above, it's not possible to disable the whole module temporarily. Internally, tracemalloc uses a thread-local variable (called the 'reentrant' flag) to temporarily disable tracing allocations in the current thread. It only disables tracing new allocations; deallocations are still processed." Even assuming the restriction is needed, this just seems to mean that disabling (or filtering) should not affect de-allocation events, for fear of corrupting tracemalloc's internal structures. In that case, I would expect disabling (and filtering) to stop capturing new allocation events for me, but I would still expect tracemalloc to do proper internal maintenance. It would at least explain why you need both disable *and* reset; reset would empty those internal structures, so that tracemalloc could shortcut that maintenance. I would NOT assume that I needed to call reset when changing the filters, nor would I assume that changing them threw out existing traces. -jJ
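For reference, the API that eventually shipped resolves this roughly as the post suggests: start()/stop() control tracing, clear_traces() is a separate operation, and snapshots taken before stop() remain usable afterwards. A minimal sketch:

```python
import tracemalloc

tracemalloc.start()
blocks = [bytes(10_000) for _ in range(10)]  # allocate something traceable
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()  # disabling tracing does not invalidate the snapshot

# The snapshot is independent data; it can still be analyzed after stop()
stats = snapshot.statistics("lineno")
print(stats[0])  # largest allocation site, with size and count
```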
[Python-Dev] Which direction is UnTransform? / Unicode is different
(Fri Nov 15 16:57:00 CET 2013) Stephen J. Turnbull wrote: Serhiy Storchaka wrote: "If the transform() method will be added, I prefer to have only one transformation method and specify a direction by the transformation name (bzip2/unbzip2)." "Me too. Until I consider special cases like compress, or lower, and realize that there are enough special cases to become a major wart if generic transforms ever became popular. People think about these transformations as en- or de-coding, not transforming, most of the time. Even for a transformation that is an involution (eg, rot13), people have a very clear idea of what's encoded and what's not, and they are going to prefer the names encode and decode for these (generic) operations in many cases." I think this is one of the major stumbling blocks with unicode. I originally disagreed strongly with what Stephen wrote -- but then I realized that all my counterexamples involved unicode text. I can tell whether something is tarred or untarred, zipped or unzipped. But an 8-bit (even Latin-1, let alone ASCII) bytestring really doesn't seem encoded, and it doesn't make sense to decode a perfectly readable (ASCII) string into a sequence of code units. Nor does it help that http://www.unicode.org/glossary/#code_unit defines "code unit" as "The minimal bit combination that can represent a unit of encoded text for processing or interchange. The Unicode Standard uses 8-bit code units in the UTF-8 encoding form, 16-bit code units in the UTF-16 encoding form, and 32-bit code units in the UTF-32 encoding form. (See definition D77 in Section 3.9, Unicode Encoding Forms.)" I have to read that very carefully to avoid mentally translating it into "Code Units are *en*coded, and there are lots of different complicated encodings that I wouldn't use unless I were doing special processing or interchange." If I'm not using the network, or if my interchange format already looks like readable ASCII, then unicode sure sounds like a complication.
I *will* get confused over which direction is encoding and which is decoding. (Removing .decode() from the (unicode) str type in Python 3 does help a lot, if I have a Python 3 interpreter running to check against.) I'm not sure exactly what implications the above has, but it certainly supports separating the Text Processing from the generic codecs, both in the documentation and in any potential new methods. Instead of relying on introspection of .decodes_to and .encodes_to, it would be useful to have charsetcodecs and transformcodecs as entirely different modules, with their own separate registries. I will even note that the existing help(codecs) seems more appropriate for charsetcodecs than it does for the current conjoined module. -jJ
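The direction question in practice: in Python 3 only str has .encode() and only bytes has .decode(), which pins down which way is which:

```python
text = "café"                  # str: a sequence of code points
data = text.encode("utf-8")    # *en*code: str -> bytes
assert data == b"caf\xc3\xa9"
assert data.decode("utf-8") == text  # *de*code: bytes -> str

# The asymmetry the post welcomes: these attributes simply don't exist
assert not hasattr(text, "decode")  # removed from str in Python 3
assert not hasattr(data, "encode")  # bytes never grew an encode()
```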
[Python-Dev] Python3 complexity - 2 use cases
Steven D'Aprano wrote: "I think that heuristics to guess the encoding have their role to play, if the caller understands the risks." Ben Finney wrote: "In my opinion, content-type guessing heuristics certainly don't belong in the standard library." It would be great if there were never any need to guess. But in the real world, there is -- and often the user won't know any more than python does. So when it is time to guess, a source of good guesses is an important battery to include. The HTML5 specifications go through some fairly extreme contortions to document what browsers actually do, as opposed to what previous standards have mandated. They don't currently specify how to guess (though I think a draft once tried, since the major browsers all do it, and at the time did it similarly), but the specs do explicitly support such a step, and do provide an implementation note encouraging user-agents to do at least minimal auto-detection. http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#determining-the-character-encoding My own opinion is therefore that Python SHOULD provide better support for both of the following use cases: (1) Treat this file like it came from the web -- including autodetection and even overriding explicit charset declarations for certain charsets. We should explicitly treat autodetection like time zone data -- there is no promise that the right answer (or at least the best guess) won't change, even within a release. I offer no opinion on whether chardet in particular is still too volatile, but the docs should warn that the API is driven by possibly changing external data. (2) Treat this file as ASCII+, where anything non-ASCII will (at most) be written back out unchanged; it doesn't even need to be converted to text.
At this time, I don't know whether the right answer is making it easy to default to surrogate-escape for all error-handling, adding more bytes methods, encouraging use of python's latin-1 variant, offering a dedicated (new?) codec, or some new suggestion. I do know that this use case is important, and that python 3 currently looks clumsy compared to python 2. -jJ
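The surrogate-escape option mentioned above is round-trip safe for the ASCII+ use case; a minimal sketch:

```python
raw = b"key=value \xff trailing junk\n"  # mostly ASCII, one stray byte

# Decoding: the stray 0xff byte becomes a lone surrogate instead of an error
text = raw.decode("ascii", errors="surrogateescape")

# The ASCII parts are normal text we can process...
assert text.startswith("key=value")

# ...and encoding back with the same handler restores the bytes exactly
assert text.encode("ascii", errors="surrogateescape") == raw
```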
[Python-Dev] PEP 460 -- adding explicit assumptions
As best I can tell, some people (apparently including Guido and PEP author Antoine) are taking some assumptions almost for granted, while other people (including me, before Nick's messages) were not assuming them at all. Since these assumptions (or, possibly, rejections of them?) are likely to decide the outcome, the assumptions should be explicit in the PEP. (1) The bytes-related classes do include methods that are only useful when the already-contained data is encoded ASCII. They do not (and will not) include any operations that *require* an encoding assumption. This implies that no non-bytes data can be added without an explicit encoding. (1a) Not even by assuming ASCII with strict error handling. (1b) Not even for numbers, where ASCII/strict really is sufficient. Note that this doesn't rule out a solution where objects (or maybe just numbers and ASCII-kind text) provide their own encoding to bytes -- but that has to be done by the objects themselves, not by the bytes container or by the interpreter. (2) Most python programmers are still in the future. So an API that confuses people who are still learning about Unicode and the text model is bad -- even if it would work fine for those who do already understand it. -jJ
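Assumption (1) as it plays out in current Python: even numbers must be encoded explicitly before they can join a bytes object:

```python
# Mixing str (or int) into bytes without an encoding is a TypeError:
try:
    b"id=" + "42"
except TypeError:
    pass  # "can't concat str to bytes"

# The explicit spelling the assumption requires:
payload = b"id=" + str(42).encode("ascii")
assert payload == b"id=42"
```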
[Python-Dev] Automatic encoding detection [was: Re: Python3 complexity - 2 use cases]
"So when it is time to guess [at the character encoding of a file], a source of good guesses is an important battery to include." "The barrier for entry to the standard library is higher than mere usefulness." Agreed. But "most programs will need it, and people will either include (the same) 3rd-party library themselves, or write their own workaround, or have buggy code" *is* sufficient. The points of contention are: (1) How many programs have to deal with documents written outside their control -- and probably originating on another system. I'm not ready to say "most programs" in general, but I think that barrier is met for both web clients (for which we already supply several batteries) and quick-and-dirty utilities. (2) How serious are the bugs / How annoying are the workarounds? As someone who mostly sticks to English, and who tends to manually ignore stray bytes when dealing with a semi-binary file format, the bugs aren't that serious for me personally. So I may well choose to write buggy programs, and the bug may well never get triggered on my own machine. But having a batch process crash one run in ten (where it didn't crash at all under Python 2) is a bad thing. There are environments where (once I knew about it) I would add chardet (if I could get approval for the 3rd-party component). (3) How clearcut is the *right* answer? As I said, at one point (several years ago), the w3c and whatwg started to standardize the right answer. They backed that out, because vendors wanted the option to improve their detection in the future without violating standards. There are certainly situations where local knowledge can do better than a global solution like chardet, but ... the right answer is clear most of the time. Just ignoring the problem is still a 99% answer, because most text in ASCII-mostly environments really is close enough. But that is harder (and the One Obvious Way is less reliable) under Python 3 than it was under Python 2.
An alias for open that defaulted to surrogate-escape (or returned the new ASCIIstr bytes hybrid) would probably be sufficient to get back (almost) to Python 2 levels of ease and reliability. But it would tend to encourage ASCII/English-only assumptions. You could fix most of the remaining problems by scripting a web browser, except that scripting the browser in a cross-platform manner is slow and problematic, even with webbrowser.py. Whatever a recent Firefox does is (almost by definition) good enough, and is available ... but maybe not in a convenient form, which is one reason that chardet was created as a port thereof. Also note that firefox assumes you will update more often than Python does. Whatever chardet said at the time the Python release was cut is almost certainly good enough too. The browser makers go to great lengths to match each other even in bizarre corner cases. (Which is one reason there aren't more competing solutions.) But that doesn't mean it is *impossible* to construct a test case where they disagree -- or even one where a recent improvement in the algorithms led to regressions for one particular document. That said, such regressions should be limited to documents that were not properly labeled in the first place, and should be rare even there. Think of the changes as obscure bugfixes, akin to a program starting to handle NaN properly, in a place where it should not ever see one. -jJ
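For illustration only (this is not chardet, which does statistical analysis over byte frequencies), even a tiny detector covers the common cases of BOM sniffing plus a UTF-8-or-fallback guess:

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Toy heuristic, NOT a substitute for chardet: check for a BOM,
    then try strict UTF-8, then fall back to latin-1 (which never fails)."""
    for bom, name in ((codecs.BOM_UTF8, "utf-8-sig"),
                      (codecs.BOM_UTF16_LE, "utf-16"),
                      (codecs.BOM_UTF16_BE, "utf-16")):
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"

sniff_encoding("café".encode("utf-8"))  # 'utf-8'
sniff_encoding(b"caf\xe9")              # 'latin-1' (invalid as UTF-8)
```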
Re: [Python-Dev] PEP 460 reboot
Nick Coghlan wrote: "Arbitrary binary data and ASCII compatible binary data are *different things* and the only argument in favour of modelling them with a single type is because Python 2 did it that way." Greg Ewing replied: "I would say that ASCII compatible binary data is a *subset* of arbitrary binary data. As such, a type designed for arbitrary binary data is a perfectly good way of representing ASCII compatible binary data." But not when you care about the ASCII-compatible part; then you should use a subclass. Obviously, it is too late for separating bytes from AsciiStructuredBytes. PBP *may* even mean that just using the subclass for everything (and just ignoring the ASCII-specific methods when they aren't appropriate) was always the right implementation choice. But in terms of explaining the text model, that separation is important enough that (1) We should be reluctant to strengthen the "it's really just ASCII" messages. (2) It *may* be worth creating a virtual split in the documentation. I'm willing to work on (2) if there is general consensus that it would be a good idea. As a rough sketch, I would change places like http://docs.python.org/3/library/stdtypes.html#typebytes from: Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways. to something more like: Bytes objects are immutable sequences of single bytes. A Bytes object could represent anything, and is appropriate as the underlying storage for a sound sample or image file. Virtual subclass ASCIIStructuredBytes: One particularly common use of bytes is to represent the contents of a file, or of a network message. In these cases, the bytes will often represent Text *in a specific encoding* and that encoding will usually be a superset of ASCII.
Rather than create and support an ASCIIStructuredBytes subclass, Python simply added support for these use cases straight to Bytes objects, and assumes that this support simply won't be used when it does not make sense. For example, bytes literals *could* be used to construct a sound sample, but the literals will be far easier to read when they are used to represent (encoded) ASCII text, such as "OPEN". -jJ
Re: [Python-Dev] PEP 460 reboot
Greg Ewing replied: "... ASCII compatible binary data is a *subset* of arbitrary binary data." I wrote: "But in terms of explaining the text model, that separation is important enough that (2) It *may* be worth creating a virtual split in the documentation. (rough sketch below)" Ethan likes the idea, but points out that the term "Virtual" is confusing here. Alas, I'm not sure what the correct term is. In addition to "Go for it!" / "Don't waste your time", I'm looking for advice on: (A) What word should I use instead of "Virtual"? "Imaginary"? "Pretend"? (B) Would it be good/bad/at least make the docs easier to create an actual class (or alias)? (C) Same question for a pair of classes provided only in the documentation, like example code. (D) What about an abstract class, or several? e.g., replacing the "XXX TODO" of collections.abc.ByteString with separate abstract classes for ByteSequence, String, ByteString, and ASCIIByteString? (ByteString already includes any bytes or bytearray instance, so backward compatibility means the String suffix isn't sufficient for an opt-in-by-instances class.) I'm willing to work on (2) if there is general consensus that it would be a good idea. As a rough sketch, I would change places like http://docs.python.org/3/library/stdtypes.html#typebytes from: Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways. to something more like: Bytes objects are immutable sequences of single bytes. A Bytes object could represent anything, and is appropriate as the underlying storage for a sound sample or image file. Virtual subclass ASCIIStructuredBytes: One particularly common use of bytes is to represent the contents of a file, or of a network message.
In these cases, the bytes will often represent Text *in a specific encoding* and that encoding will usually be a superset of ASCII. Rather than create and support an ASCIIStructuredBytes subclass, Python simply added support for these use cases straight to Bytes objects, and assumes that this support simply won't be used when it does not make sense. For example, bytes literals *could* be used to construct a sound sample, but the literals will be far easier to read when they are used to represent (encoded) ASCII text, such as "OPEN". -jJ
Re: [Python-Dev] PEP 461: Adding % formatting to bytes and bytearray -- Final, Take 2
Victor Stinner wrote: "Will ascii() ever emit a backslash representation? Try ascii(chr(0x1f))." In which version? I get: ValueError: chr() arg not in range(0x110000) "How do you plan to use this output? Write it into a socket or a file? When I debug, I use print or logging, which both expect text strings. So I think that b'%a' is useless." Sad. Use Case 1: There is not yet a working implementation of the file or wire format. Either I am still writing it, or the file I need to parse is coming from a partner who configured rather than wrote the original program. I write (or request that they write) something recognizable to the actual stream, as a landmark. Case 1a: I want a repr of the same object that is supposedly being represented in the official format, so I can see whether the problem is bad data or bad serialization. Use Case 2: Fallback for some sort of serialization format; I expect not to ever use the fallback in production, but better something ugly than a failure, let alone a crash. Use Case 3: Shortcut for serialization of objects whose repr is good enough. (My first instinct would probably be to implement the __bytes__ special method, but if I thought that was supposed to expose the real data, as opposed to a serialized copy, then I would go for %a.) -jJ
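As shipped in PEP 461 (Python 3.5+), b'%a' applies ascii() to the argument and encodes the result, which fits the landmark-in-a-binary-stream use cases above:

```python
# %a gives an ascii()-style repr, safe to embed in a byte stream
line = b"landmark: %a" % "caf\xe9"
assert line == b"landmark: 'caf\\xe9'"

# It works for arbitrary objects too -- whatever ascii() produces
b"fallback: %a" % [1, 2]  # b"fallback: [1, 2]"
```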
Re: [Python-Dev] PEP 463: Exception-catching expressions
Yury Selivanov wrote: "I think the Motivation section is pretty weak." I have normally wished for this when I was (semi-interactively) exploring a weakly structured dataset. Often, I start with a string, split it into something hopefully like records, and then start applying filters and transforms. I would prefer to write a comprehension instead of a for loop. Alas, without pre-editing, I can be fairly confident that the data is dirty. Sometimes I can solve it with a filter (assuming that I remember and don't mind the out-of-order evaluation):

    # The "if value" happens first,
    # so the 1/value turns out to be safe.
    [1/value for value in working_list if value]

Note that this means dropping the bad data, so that items in this list will have different indices than those in the parent working_list. I would rather have written:

    [1/value except (TypeError, ZeroDivisionError): None]

which would keep the matching indices, and clearly indicate where I now had missing/invalid data. Sometimes I solve it with a clumsy workaround:

    sum((e.weight if hasattr(e, 'weight') else 1.0) for e in working_list)

But the hasattr implies that I am doing some sort of classification based on whether or not the element has a weight. The true intent was to recognize that while every element does have a weight, the representation that I'm starting from didn't always bother to store it -- so I am repairing that before processing.

    sum(e.weight except AttributeError: 1)

Often I give up, and create a junky helper function, or several. But to avoid polluting the namespace, I may leave it outside the class, or give it a truly bad name:

    def __only_n2(worklist):
        results = []
        for line in worklist:
            line = line.strip()
            if not line:
                # or maybe just edit the input file...
                continue
            split1 = line.split(", ")
            if 7 != len(split1):
                continue
            if "n2" == split1[3]:
                results.append(split1)
        return results

    worklist_n2 = __only_n2(worklist7)

In real life code, even after hand-editing the input data to fix a few cases, I recently ended up with:

    class VoteMark:
        ...
        @classmethod
        def from_property(cls, voteline):
            # print(voteline)
            count, _junk, prefs = voteline.partition(": ")
            return cls(count, prefs)
        ...

    # module level scope
    def make_votes(vs=votestring):
        return [VoteMark.from_property(e) for e in vs.splitlines()]

    vs = make_votes()

You can correctly point out that I was being sloppy, and that I *should* have gone back to clean it up. But I wouldn't have had to clean up either the code or the data (well, not as much), if I had been able to just keep the step-at-a-time transformations I was building up during development:

    vs = [(VoteMark(*e.strip().split(": ")) except (TypeError, ValueError): None)
          for e in votestring.splitlines()]

Yes, the first line is still doing too much, and might be worth a helper function during cleanup. But it is already better than an alternate constructor that exists only to simplify a single (outside the class) function that is only called once. Which in turn is better than the first draft that was so ugly that I actually did fix it during that same work session. "Inconvenience of dict[] raising KeyError was solved by introducing the dict.get() method. And I think that dct.get('a', 'b') is 1000 times better than dct['a'] except KeyError: 'b'" I don't. dct.get('a', default='b') would be considerably better, but it would still imply that missing values are normal. So even after argclinic is fully integrated, there will still be times when I prefer to make it explicit that I consider this an abnormal case. (And, as others have pointed out, .get isn't a good solution when the default is expensive to compute.)
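Since PEP 463 was ultimately rejected, the keep-the-indices use case still needs a helper in current Python; one sketch (try_or is an illustrative name, not a stdlib function):

```python
def try_or(fn, default, exceptions=(Exception,)):
    """Illustrative helper (not stdlib): evaluate fn(), mapping the
    listed exceptions to a default, so comprehensions keep their indices."""
    try:
        return fn()
    except exceptions:
        return default

working_list = [2, 0, "x", 4]
result = [try_or(lambda v=v: 1 / v, None, (TypeError, ZeroDivisionError))
          for v in working_list]
assert result == [0.5, None, None, 0.25]  # bad data kept as None, in place
```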
Consider this example of a two-level cache::

    for key in sequence:
        x = (lvl1[key] except KeyError: (lvl2[key] except KeyError: f(key)))

I'm sorry, it took me a minute to understand what your example is doing. I would rather see two try..except blocks than this.

Agreed -- like my semi-interactive code above, it does too much on one line. I don't object as much to:

    for key in sequence:
        x = (lvl1[key] except KeyError:
                 (lvl2[key] except KeyError:
                      f(key)))

Retrieve an argument, defaulting to None::

    cond = args[1] except IndexError: None

    # Lib/pdb.py:803:
    try:
        cond = args[1]
    except IndexError:
        cond = None

    cond = None if (len(args) < 2) else args[1]

This is an area where tastes will differ. I view the first as saying that not having a cond would be unusual, or at least a different kind of call. I view your version as a warning that argument parsing will be complex, and that
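For reference, the two-level cache above can be written today with the two try/except blocks Yury prefers. The function name and the choice to warm the level-1 cache on a miss are my own assumptions, not part of the PEP's example:

```python
def lookup(key, lvl1, lvl2, f):
    # The nested except-expression, unrolled into explicit blocks.
    try:
        return lvl1[key]
    except KeyError:
        pass
    try:
        x = lvl2[key]
    except KeyError:
        x = f(key)      # fall back to recomputing the value
    lvl1[key] = x       # assumed: promote the result into level 1
    return x
```

Four statements plus a `pass` instead of one expression; whether that is a cost or a benefit is precisely the disagreement.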
Re: [Python-Dev] PEP 463: Exception-catching expressions
Greg Ewing suggested: This version might be more readable:

    value = lst[2] except "No value" if IndexError

Ethan Furman asked: It does read nicely, and is fine for the single, non-nested, case (which is probably the vast majority), but how would it handle nested exceptions?

With parentheses. Sometimes, the parentheses will make a complex expression ugly. Sometimes, a complex expression should really be factored into pieces anyway. Hopefully, these times are highly correlated. The above syntax does lend itself somewhat naturally to multiple *short* except clauses:

    value = (lst[2] except "No value" if IndexError
                    except "Bad Input" if TypeError)

and nested exception expressions are at least possible, but deservedly ugly:

    value = (lvl1[name]
             except (lvl2[name]
                     except (compute_new_answer(name)
                             except None if AppValueError)
                     if KeyError)
             if KeyError)

This also makes me wonder whether the cost of a subscope (for exception capture) could be limited to when an exception actually occurs, and whether that might lower the cost enough to make it a good tradeoff.

    def myfunc1(a, b, e):
        assert "main scope e value" == e

    e = "main scope e value"
    value = (myfunc1(val1, val2, e)
             except e.reason if AppError as e)
    assert "main scope e value" == e

-jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Alternative forms [was: PEP 463: Exception-catching expressions]
The PEP currently says:

    Alternative Proposals
    =====================

    Discussion on python-ideas brought up the following syntax suggestions::

        value = expr except default if Exception [as e]

    This one was rejected because of the out-of-order evaluation.

Note, however, that the (farthest left) expr is always evaluated first; the only out-of-order evaluation is "default if Exception". "default if Exception" is precisely the same evaluation order (the clause after the "if" skips ahead of the clause before the "if") as in the existing if-expression, and the existing if-filters in comprehensions. The same justifications for that order violation generally apply here too. You can argue that they weren't sufficient justification in the first place, but that is water under the bridge; *re*-using out-of-order-if shouldn't add any additional costs. [Err... unless people take the "if" too literally, and treat the Exception clause as a boolean value, instead of as an argument to the except keyword.]

The advantages of this form get much stronger with [as e] or multiple different except clauses, but some of them do apply to even the simplest form. Notably, the "say it like you would in English" that convinced Perl still applies: "if" *without* a "then" is normally an extra condition added after the main point: "Normally ham, but fish if it's a Friday." (Admittedly, the word "then" *can* be elided (and represented by a slight pause), and python programmers are already used to seeing it represented only by ":\n")

I also give a fair amount of weight to the fact that this form starts to look awkward at pretty much the same time the logic gets too complicated for an expression -- that should discourage abuse. [The analogies to if-expressions and if-filters and to spoken English, along with discouragement for abuse, make this my preferred form.]

...
    value = expr except (Exception [as e]: default)

(and the similar but unmentioned)

    value = expr except (Exception [as e] -> default)

The mapping analogy for ":" is good -- and is the reason to place parentheses there, as opposed to around the whole expression. Your preferred form -- without the internal parentheses -- looks very much like a suite-introduction, and not at all like the uses where an inline colon is acceptable. I do understand your concern that the parentheses make "except (...)" look too much like a function call -- but I'm not sure that is all bad, let alone as bad as looking like a suite introduction. Both ":" and "->" are defined for signatures; the signature meaning of ":" is tolerable, and the signature meaning of "->" is good.

...

    value = expr except Exception [as e] continue with default

This one works for me, but only because I read "continue with" as a compound keyword. I assume the parser would too. :D But I do recognize that it is a poor choice for those who see the space as a more solid barrier.

...

    value = expr except(Exception) default  # Catches only the named type(s)

This looks too much like the pre-as way of capturing an exception.

    value = default if expr raise Exception

(Without a new keyword "raises",) I would have to work really hard not to read that as:

    __temp = default
    if expr: raise Exception
    value = __temp

    value = expr or else default if Exception

To me, this just seems like a wordier and more awkward version of

    expr except (default if Exception [as e])

including the implicit parentheses around "default if Exception".

    value = expr except Exception [as e] -> default

Without parens to group Exception and default, this looks too much like an annotation describing what the expr should return.

    value = expr except Exception [as e] pass default

I would assume that this skipped the statement, like an if-filter in a comprehension.
    All forms involving the 'as' capturing clause have been deferred from this proposal in the interests of simplicity, but are preserved in the table above as an accurate record of suggestions.

Nick is right that you should specify whether it is deferred or rejected, because the simplest implementation may lock you into too broad a scope if it is added later.

    The four forms most supported by this proposal are, in order::

        value = (expr except Exception: default)
        value = (expr except Exception -> default)
        ...

If there are not parentheses after except, it will be very tempting (and arguably correct) to (at least mentally) insert them around the first two clauses -- which are evaluated first. But that leaks into

    value = (expr except Exception): default

which strongly resembles the suite-starter ":", but has very little in common with the mapping ":" or the signature ":", and

    value = (expr except Exception) -> default

which looks like an annotation, rather than a part of the value-determination.

-jJ
Re: [Python-Dev] Alternative forms [was: PEP 463: Exception-catching expressions]
(Thu Mar 6 23:26:47 CET 2014) Chris Angelico responded: On Fri, Mar 7, 2014 at 7:29 AM, Jim J. Jewett jimjjewett at gmail.com wrote: [ note that x if y already occurs in multiple contexts, and always evaluates y before x. ] Yes, but that's still out of order. Yeah, but local consistency is more important than global guidelines. :D ... *re*-using out-of-order-if shouldn't add any additional costs. The other thing to note is that it's somewhat ambiguous. Until you find that there isn't an else clause, it could be the equally valid expr except (default if cond else other_default), with the actual if Exception part still to come. True -- and I'm not a parser expert. But my belief is that the current parser allows lookahead for exactly one token, and that the else would fit within that limit. ... humans reading the code have to assume style guides mightn't be followed. True ... but I hope any non-trivial use of this (including use with a non-trivial ternary if) will look bad enough to serve as its own warning. The advantages of this form get much stronger with [as e] or multiple different except clauses, but some of them do apply to even the simplest form. Multiple different except clauses would make for an even messier evaluation order: expr1 except expr3 if expr2 except expr5 if expr4 If you consider the exception type to be the condition, then this makes sense (that is, if you read it as if isinstance(thrown_exception, Exception)); [but the most obvious reading is boolean; as always True] I phrased that badly. I agree that without parentheses for good spacing, the above is at least ambiguous -- that is what you get for stringing multiple clauses together without internal grouping. I do think parentheses help, (but are less important when there is only a single if) and I strongly prefer that they be internal (which you fear looks too much like calling a function named except). 
In that case, it is:

    expr1 except (expr3 if expr2)

and the extension to multiple except clauses would be:

    expr1 except (expr3 if expr2, expr5 if expr4)

though as I discuss later, placing parentheses there also makes a colon or arrow more tolerable. It does this because the nearby parens make it look more like the existing (non-lambda) uses of inline-colon to associate the two things on either side. (Without nearby brackets, the scope of the colon or arrow is more easily taken to be the whole line.)

    expr1 except (expr2: expr3, expr4: expr5)
    expr1 except (expr2 -> expr3, expr4 -> expr5)

Notably, the "say it like you would in English" that convinced Perl still applies: "if" *without* a "then" is normally an extra condition added after the main point: "Normally ham, but fish if it's a Friday."

That's not how Python words ternary if, though.

Agreed ... the "say it like you would in English" applies only to the "expr if expr" form (proposed here and) used by comprehensions:

    [1/x for x in data if x]

    value = expr except (Exception [as e]: default)

(and the similar but unmentioned)

    value = expr except (Exception [as e] -> default)

The parenthesizing question and the choice of tokens are considered independent, so not all the cross-multiplications are listed.

The mapping analogy for ":" is good -- and is the reason to place parentheses there, as opposed to around the whole expression. Your preferred form -- without the internal parentheses -- looks very much like a suite-introduction, and not at all like the uses where an inline colon is acceptable.

I have some notes on that down the bottom: http://www.python.org/dev/peps/pep-0463/#colons-always-introduce-suites

I know that they don't always introduce suites. I can't speak to the lambda precedent, but I do know that I personally often stumble when trying to parse it, so I don't consider it a good model.
The other three inline uses (dict display, slice notation, and function parameter annotation) are effectively conjunction operators, saying that expr1 and expr2 are bound more tightly than you would assume if they were separated by commas. They only occur inside a fairly close bracket (of some sort), and if the bracket isn't *very* close, then there are usually multiple associates-colons inside the same bracket.

    data[3:5]
    data[-1:-3:-1]
    def myfunc(a:int=5, b:str="Jim", c:float=3.14)
    {'b': 2, 'c': 3, 'a': 1}

With parentheses after the except, the except expression will match this pattern too -- particularly if there are multiple types of exception treated differently.

    expr1 except (expr2: expr3)

Without (preferably internal) parentheses, it will instead look like a long line with a colon near the end, and a short continuation suite that got moved up a line because it was only one statement long.

    def nullfunc(self, a): pass

    expr1 except expr2: expr3

value = expr
[Python-Dev] What is the precise problem? [was: Reference cycles in Exception.__traceback__]
On Wed Mar 5 17:37:12 CET 2014, Victor Stinner wrote:

Python 3 now stores the traceback object in Exception.__traceback__ and exceptions can be chained through Exception.__context__. It's convenient but it introduced tricky reference cycles if the exception object is used out of the except block. ... see Future.set_exception() of the asyncio module. ... frame.clear() raises a RuntimeError if the frame is still running. And it doesn't break all reference cycles. An obvious workaround is to store the traceback as text, but this operation is expensive, especially if the traceback is only needed in rare cases. I tried to write views of the traceback (and frames), but Exception.__traceback__ rejects types other than traceback, and traceback instances cannot be created. It's possible to store the traceback somewhere else and set Exception.__traceback__ to None, but there is still the problem with chained exceptions. Any idea for a generic fix to such problems?

Could you clarify what the problem actually is? I can imagine any of the following:

(A) Exceptions take a lot of memory, because of all the related details.
    + But sometimes the details are needed, so there is no good solution.

(B) Exceptions take a lot of memory, because of all the related details. There is a common use case that knows it will never need certain types of details, and releasing just those details would save a lot of memory. But frame.clear() picks the wrong details to release, at least for this case.
    + So write another function (or even a method) that does work, and have your framework call it. (Also see (F))
    + Instead of saving the original exception, could you instead create and store a new (copied?) one, which obviously won't (yet) be referenced by the traceback you assign to it?

(C) Exceptions take a lot of memory, because of all the related details. There is a common use case that knows it can make do with a summary of certain types of details, and releasing just those details would save a lot of memory. But generating the summary is expensive.
    + It would help to have the summarize method available.
    + It would help to have feedback from gc saying when there is enough memory pressure to make this call worthwhile.

(D) Exceptions are not released until cyclic gc, and so they eat a lot of memory for a long time prior to that.
    + This may be like case B
    + Are there references that can be replaced by weak references?
    + Are there references that you can replace with weak references when your framework stores the exception? (Also see (F))

(E) Exceptions are not released even during cyclic gc, because of ambiguity over which __del__ to run first.
    + This may be like case B or case D
    + This may be a concrete use case for the __close__ protocol. __close__ is similar to __del__, except that it promises not to care about order of finalization, and it is run eagerly. As soon as an instance is known to be in a garbage cycle, __close__ should be run without worrying about whether other objects also have __close__ or __del__ methods. Hopefully, this will break the cycle, or at least reduce the number of objects with __del__ methods. (Whether to require that __close__ be idempotent, or to guarantee that it is run only once per instance -- that would be part of the bikeshedding.)

(F) You know what to delete (or turn into weakrefs), but can't actually do it without changing a type.

(F1) Why does Exception.__traceback__ reject other objects which are neither tracebacks nor None?
    + Can that restriction be relaxed?
    + Can you create a mixin subtype of Exception, which relaxes the constraint, and gets used by your framework?
    + Can the restriction on creating tracebacks be relaxed?
    + Can traceback's restriction on frames' types be relaxed?

(F2) Do you need the original Exception?
(see (B))

(F3) Do you care about frame.clear() raising a runtime exception? Could you suppress it (or, better, get clear() to raise something more specific, and suppress that)? It would still have released what memory it reasonably could.

-jJ
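The cycle Victor describes, and the effect of cutting the exc -> traceback -> frame link, can be demonstrated on CPython. This is only a sketch; the `Marker` class and `capture` helper are illustrative names, and the refcounting behaviour is CPython-specific:

```python
import gc
import weakref

class Marker:
    pass

def capture(clear_tb):
    # 'marker' lives only in this frame's locals, so its weakref
    # tells us whether the frame itself has been freed.
    marker = Marker()
    try:
        raise ValueError("boom")
    except ValueError as exc:
        err = exc
    if clear_tb:
        err.__traceback__ = None   # cut the exc -> traceback -> frame link
    return err, weakref.ref(marker)

# With the traceback dropped, plain refcounting frees the frame
# (and everything it held) as soon as capture() returns:
err, ref = capture(clear_tb=True)
print(ref() is None)

# With the traceback kept, err -> traceback -> frame -> err is a
# cycle, so the frame survives until the next cyclic gc pass:
err2, ref2 = capture(clear_tb=False)
del err2
gc.collect()
print(ref2() is None)
```

This is case (D) above in miniature: the memory is reclaimable either way, but without the explicit `__traceback__ = None` it waits for cyclic gc.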
[Python-Dev] Why not make frames? [was: Reference cycles in Exception.__traceback__]
On Thu Mar 6 16:52:56 CET 2014, Antoine Pitrou wrote:

IMO it is absolutely out of question to allow creation of arbitrary frames from Python code, because the structure and initialization of frames embody too many low-level implementation details.

So? Does any of that matter until the frame is used to actually evaluate something? So what is the harm in creating a (likely partially invalid) frame for inspection purposes? For that matter, what is the point in tracebacks requiring frames, as opposed to any object, with the caveat that not having the expected attributes may cause grief -- as happens with any duck typing?

-jJ
[Python-Dev] Why not make frames? [was: Alternative forms [was: PEP 463: Exception-catching expressions]]
TL;DR:

    expr except (default if exc_expr)
    expr (except default if exc_expr)
    expr except (exc_expr: default)
    expr (except exc_expr: default)

(1) Group the exceptions with the default they imply.
(2) inline-":" still needs (), [], or {}.
(3) Consider the expression inside a longer line.
(3a) Does the except expression need to be general, or would it work if it were limited to a subclause of variable assignments?
(3b) What about comprehensions?

On Fri Mar 7 20:54:31 CET 2014, Chris Angelico wrote: On Sat, Mar 8, 2014 at 5:58 AM, Jim J. Jewett jimjjewett at gmail.com wrote: (Thu Mar 6 23:26:47 CET 2014) Chris Angelico responded: On Fri, Mar 7, 2014 at 7:29 AM, Jim J. Jewett jimjjewett at gmail.com wrote:

[ note that "x if y" already occurs in multiple contexts, and always evaluates y before x. ]

... I don't see except expressions as fundamentally more associated with if/else than with, say, an "or" chain, which works left to right.

I do, because of the skipping portion. Short-circuiting operators, such as an "or" chain, never skip a clause unless they are skipping *every* subsequent clause. An if statement sometimes skips the (unlabeled in python) "then" clause, but still processes the even-later "else" clause. A try statement sometimes skips the remainder of the try suite but still executes the later subordinate except and finally clauses.

Note that this only explains why I see except as more closely related to if than to or; it isn't sufficient to justify going back to execute the skipped clause later. That said, going back to a previous location is a lot easier to excuse after an error handler than in regular code.

Analysis of the Python standard library suggests that the single-if situation is *by far* the most common, to the extent that it'd hardly impact the stdlib at all to add multiple except clauses to the proposal. Do you have a strong use-case for the more full syntax?

I do not.
I dislike the arbitrary restriction, and I worry that lifting it later (while maintaining backwards compatibility) will result in a syntax wart, but I do not have a compelling use case for that later relaxation.

and I strongly prefer that they [the parentheses] be internal (which you fear looks too much like calling a function named except). In that case, it is: expr1 except (expr3 if expr2)

I'm still not really seeing how this is better.

For one thing, it makes it clear that the "if" keyword may be messing with the order of evaluation. I don't claim that syntax is perfect. I do think it is less flawed than the no-parentheses (or external parentheses) versions:

    (expr1 except expr3 if expr2)
    expr1 except expr3 if expr2

because the tighter parentheses correctly indicate that expr2 and expr3 should be considered as a (what-to-do-in-case-of-error) group, which interacts (as a single unit) with the main expression.

I also think it is (very slightly) better than the colon+internal-parentheses version:

    expr1 except (expr2: expr3)

which in turn is far, far better than the colon versions with external or missing parentheses:

    (expr1 except expr2: expr3)
    expr1 except expr2: expr3

because I cannot imagine reading an embedded version of either of those without having to mentally re-parse at the colon. An example assuming a precedence level that may not be what the PEP proposes:

    if myfunc(5, expr1 except expr2: expr3, label):
    for i in range(3, 3*max(data) except TypeError: 9, 3):
    ...
    if myfunc(5, (expr1 except expr2: expr3), label):
    for i in range(3, (3*max(data) except TypeError: 9), 3):
    ...
    if myfunc(5, expr1 except (expr2: expr3), label):
    for i in range(3, 3*max(data) except (TypeError: 9), 3):
    ...
    if myfunc(5, expr1 except (expr2: expr3), label):
    for i in range(3, 3*max(data) (except TypeError: 9), 3):
    ...
    if myfunc(5, expr1 except (expr3 if expr2), label):
    for i in range(3, 3*max(data) (except 9 if TypeError), 3):
    ...
    if myfunc(5, expr1 except (expr3 if expr2), label):
    for i in range(3, 3*max(data) except (9 if TypeError), 3):

    myarg = expr1 except (expr3 if expr2)
    if myfunc(5, myarg, label):

    limit = 3*max(data) except (9 if TypeError)
    for i in range(3, limit, 3):

Yes, I would prefer to create a variable naming those expressions, but these are all still simple enough that I would expect to have to read them. (I like constructions that get ugly just a bit faster than they get hard to understand.) If I have to parse any of them, the ones at the bottom are less difficult than the ones at the top.

With the colon version, it looks very much like dict display, which is good, since that is one of the acceptable uses of inline-colon.

only with different brackets around it; in some fonts, that'll be very easily confused.

I've had more trouble with comma vs period than
[Python-Dev] Scope issues [was: Alternative forms [was: PEP 463: Exception-catching expressions]]
On Fri Mar 7 20:54:31 CET 2014, Chris Angelico wrote: On Sat, Mar 8, 2014 at 5:58 AM, Jim J. Jewett jimjjewett at gmail.com wrote: (Thu Mar 6 23:26:47 CET 2014) Chris Angelico responded:

...[as-capturing is] deferred until there's a non-closure means of creating a sub-scope.

The problem is that once it is deployed as leaking into the parent scope, backwards compatibility may force it to always leak into the parent scope. (You could document the leakage as a bug or as implementation-defined, but ... those choices are also sub-optimal.)

It'll never be deployed as leaking, for the same reason that the current 'except' statement doesn't leak:

I don't think that is the full extent of the problem. From Nick's description, this is a nasty enough corner case that there may be glitches no one notices in time. The PEP should therefore explicitly state that implementation details may force the deferral to be permanent, and that this is considered an acceptable trade-off.

-jJ

-- Sorry for the botched subject line on the previous message.
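The non-leaking behaviour Chris refers to is easy to demonstrate: in Python 3, the statement form of "except ... as e" implicitly deletes the capture name when the block ends, rather than leaving it bound in the surrounding scope. A minimal check (helper name is mine):

```python
def handler():
    try:
        raise ValueError("boom")
    except ValueError as err:
        message = str(err)
    # At this point 'err' has been implicitly un-bound (roughly
    # an automatic "del err"), so it no longer appears in locals():
    return message, 'err' in locals()

print(handler())   # → ('boom', False)
```

Note that the name is deleted outright, not restored to any earlier value, which is part of why the expression form's scoping is harder than it looks.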
Re: [Python-Dev] What is the precise problem? [was: Reference cycles in Exception.__traceback__]
On Mon Mar 10 18:56:17 CET 2014 (and earlier quotes), Maciej Fijalkowski wrote:

Maciej: You should not rely on __del__s being called timely one way or
Maciej: another. Why would you require this for the program to work
Maciej: correctly in the particular example of __traceback__?

To the extent that I understand, he isn't requiring it for correctness; he is instead saying that without timely __del__, the Quality of Implementation suffers. I suspect there are aspects of tulip (or event processing in general) that make it more common for the frame graph to be painfully cyclic, so that live frames keep dead ones from being collected. It may also be more common to have multiple __del__ methods in the same cycle, if cycles are created by a framework. So the problems aren't new, but they may have become considerably more painful.

Victor: For asyncio, it's very useful to see unhandled exceptions as early as
Victor: possible. Otherwise, your program is blocked and you don't know why.

...

Maciej: twisted goes around it by attaching errback by hand. Would that work
Maciej: for tulip?
Maciej: deferred.addErrback(callback_that_writes_to_log)

What do you mean by "by hand"? Does the framework automatically add a "log the exception" errback to every task, or every task that doesn't have its own errback of some sort? Or do you mean that users should do so by hand, but that it is a well-known recipe?

Maciej: I'm very skeptical about changing details of __traceback__ and
Maciej: frames, just in order to make refcounting work (since it would
Maciej: create something that would not work on pypy for example).

How about just loosening some constraints on exceptions, in order to permit more efficient operation, but in a way that may be particularly useful to a refcounting scheme? Can I assume that you don't object to frame.clear()? How about a hypothetical traceback.pack() that made it easier to reclaim memory held by frame/traceback cycles?
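The Twisted-style "errback by hand" has a direct asyncio equivalent: attach a done-callback that reports the exception as soon as the task finishes. A sketch using the modern spellings (asyncio.run postdates the tulip era; the names `log_exception` and `failures` are mine):

```python
import asyncio

failures = []

def log_exception(task):
    # A hand-attached "errback": add_done_callback fires as soon as
    # the task finishes, so an unhandled exception is reported
    # promptly instead of whenever the task object is finalized.
    if not task.cancelled() and task.exception() is not None:
        failures.append(task.exception())

async def boom():
    raise RuntimeError("unhandled!")

async def main():
    task = asyncio.ensure_future(boom())
    task.add_done_callback(log_exception)
    try:
        await task
    except RuntimeError:
        pass
    await asyncio.sleep(0)   # give the done-callback a chance to run

asyncio.run(main())
print(failures)
```

This answers the "early notice" half of Victor's concern without touching __traceback__ at all; it does nothing about the memory held by the traceback in the meantime.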
If standard traceback printing were the only likely future use, each frame/traceback pair could be replaced by 4 pointers, and allocating space for/copying those 4 would be the only work that wasn't already needed for the eventual deallocation.

Today, the setters for __cause__, __context__, and __traceback__ do typechecks to ensure that those properties are (None or) the expected type; __traceback__ doesn't even allow subclasses. The constructors for frame and traceback are similarly strict. What would be the harm in allowing arbitrary objects, let alone a few specific alternative implementations?

-jJ
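The "summarize" idea discussed above is essentially what the stdlib later grew as traceback.TracebackException (added in Python 3.5, after this thread): it walks the frames once, keeps only strings, and can re-print the traceback without keeping any frame alive. A sketch (the `summarize` wrapper name is mine):

```python
import traceback

def summarize(exc):
    # Capture everything needed to re-print the traceback later,
    # as plain strings, then drop the heavyweight frame chain.
    summary = traceback.TracebackException.from_exception(exc)
    exc.__traceback__ = None   # the frames (and their locals) can now be freed
    return summary

try:
    1 / 0
except ZeroDivisionError as e:
    s = summarize(e)

text = "".join(s.format())
print(text.splitlines()[-1])   # → ZeroDivisionError: division by zero
```

The summary costs one walk of the frames up front, which is the trade-off Jim's case (C) describes: cheaper than formatting to text only if you were going to deallocate anyway.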
[Python-Dev] Keyword meanings [was: Accept just PEP-0426]
Vinay Sajip reworded the 'Provides-Dist' definition to explicitly say:

    The use of multiple names in this field *must not* be used for bundling distributions together. It is intended for use when projects are forked and merged over time ...

(1) Then how *should* the bundle-of-several-components case be represented?

(2) How is 'Provides-Dist' different from 'Obsoletes-Dist'? The only difference I can see is that it may be a bit more polite to people who do want to install multiple versions of a (possibly abstract) package.

-jJ
[Python-Dev] dict and required hashing
(1) I believe the recent consensus was that the number of comparisons made in a dict lookup is an implementation detail. (Please correct me if I am wrong.)

(2) Is "the item will be hashed at least once" a language guarantee?

For small mappings, it might well be more efficient to just store the 2-3 key/value pairs and skip the bucket calculation. On the other hand, if a key is not hashable, discovering that long after it has already been added to the dict is suboptimal. Of course, that sort of delayed exception can already happen if it is the __eq__ method that is messed up ...

-jJ
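Today's CPython behaviour on both points can be observed directly. This sketch (the `CountingKey` class is mine, and the exact hash count is a CPython implementation detail, which is the whole question):

```python
class CountingKey:
    """A key that records how often it is hashed."""
    def __init__(self, value):
        self.value = value
        self.hash_calls = 0
    def __hash__(self):
        self.hash_calls += 1
        return hash(self.value)
    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value

k = CountingKey('a')
d = {}
d[k] = 1   # CPython hashes the key on insertion...
d[k]       # ...and again on every lookup
print(k.hash_calls)   # 2 on current CPython; another implementation could differ

# An unhashable key fails immediately at insert time, not later:
try:
    d[[1, 2]] = "nope"
except TypeError:
    print("unhashable keys are rejected up front")
```

A small-dict implementation that skipped hashing would change both observations, which is why the question of what is guaranteed (versus merely current behaviour) matters.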
[Python-Dev] API and process questions (sparked by Claudiu Popa on 16104
(1) Should fixes to a docstring go in with a patch, even if they aren't related to the changing functionality?

Bytestring compilation has several orthogonal parameters. Most -- but not all -- are documented in the docstring. (Specifically, there is no explanation of the rx parameter, which acts as a filter, and no mention that symbolic links to directories are ignored.) It is best if a commit changes one small thing at a time. On the other hand, Nick recently posted that the minimal overhead of a patch commit is about half an hour. Is that overhead enough to override the one-issue-per-patch guideline?

(2) The patch adds new functionality to use multiple processes in parallel. The normal parameter values are integers indicating how many processes to use. The parameter also needs two special values -- one to indicate "use os.cpu_count()", and the other to indicate "don't use multiprocessing at all".

(A) Is there a Best Practices for this situation, with two odd cases?

(B) Claudiu originally copied the API from a similar API for regrtest. What is the barrier for "do it sensibly" vs "stick with precedent elsewhere"? (Apparently regrtest treats any negative number as a request for the cpu_count calculation; I suspect that -5 is more likely to be an escaping error for 5 than it is to be a real request for auto-calculation that just happened to choose -5 instead of -1.)

(C) How important is it to keep the API consistent between a top-level CLI command and the internal implementation? At the moment, the request for cpu_count is handled in the CLI wrapper, and not available to interactive callers. On the other hand, interactive callers could just call cpu_count themselves...

(D) How important is it to maintain consistency with other uses of the same tool -- multiprocessing has its own way of requesting auto-calculation. (So someone used to multiprocessing might assume that None meant "auto-calculate", as opposed to "don't use multiprocessing at all".)
-jJ
Re: [Python-Dev] API and process questions (sparked by Claudiu Popa on 16104
On Mon, Apr 28, 2014 at 12:56 PM, Charles-François Natali cf.nat...@gmail.com wrote:

Why would the user care if multiprocessing is used behind the scene?

Err ... that was another set of questions that I forgot to ask.

(A) Why bother raising an error if multiprocessing is unavailable? After all, there is a perfectly fine fallback... On the other hand, errors should not pass silently. If a user has explicitly asked for multiprocessing, there should be some notice that it didn't happen. And builds are presumably something that a developer will monitor to respond to the Exception.

(A1) What sort of Error? I'm inclined to raise the original ImportError, but the patch prefers a ValueError.

(B) Assuming the exception, I suppose your question adds a 3rd special case of "whatever the system suggests, and I don't care whether or not it involves multiprocessing".

It would be strange for processes=1 to fail if multiprocessing is not available.

As Claudiu pointed out, processes=1 should really mean 1 worker process, which is still different from "do everything in the main process". I'm not sure that level of control is really worth the complexity, but I'm not certain it isn't.

processes <= 0: use os.cpu_count()

I could understand doing that for 0 or -1; what is the purpose of doing it for both, let alone for -4? Are we at the point where the parameter should just take positive integers or one of a set of specified string values?

-jJ
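One way to resolve the questions above is to treat only documented sentinels specially and reject everything else. This is a hypothetical normalization helper, not the API from the patch under discussion:

```python
import os

def normalize_workers(processes):
    """Hypothetical normalization of a 'processes' parameter:
        None -> don't use multiprocessing at all
        0    -> use os.cpu_count()
        n>0  -> use exactly n worker processes
    Any other value (including -5) is assumed to be a caller mistake
    rather than a request for auto-calculation.
    """
    if processes is None:
        return None
    if processes == 0:
        return os.cpu_count() or 1   # cpu_count() can return None
    if processes > 0:
        return processes
    raise ValueError("processes must be None, 0, or a positive integer")
```

Under this scheme the "-5 was probably an escaped 5" concern becomes an immediate ValueError instead of silently spawning cpu_count() workers.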
[Python-Dev] Internal representation of strings and Micropython (Steven D'Aprano's summary)
Steven D'Aprano wrote: (1) I asked if it would be okay for MicroPython to *optionally* use nominally Unicode strings limited to ASCII. Pretty much the only response to this has been Guido saying That would be a pretty lousy option, and since nobody has really defended the suggestion, I think we can assume that it's off the table. Lousy is not quite the same as forbidden. Doing it in good faith would require making the limit prominent in the documentation, and raising some sort of CharacterNotSupported exception (or at least a warning) whenever there is an attempt to create a non-ASCII string, even via the C API. (2) I asked if it would be okay ... to use an UTF-8 implementation even though it would lead to O(N) indexing operations instead of O(1). There's been some opposition to this, including Guido's: [Non-ASCII character removed.] It is bad when quirks -- even good quirks -- of one implementation lead people to write code that will perform badly on a different Python implementation. CPython has at least delayed obvious optimizations for this reason. Changing idiomatic operations from O(1) to O(N) is big enough to cause a concern. That said, the target environment itself apparently limits N to small enough that the problem should be mostly theoretical. If you want to be good citizens, then do put a note in the documentation warning that particularly long strings are likely to cause performance issues unique to the MicroPython implementation. (Frankly, my personal opinion is that if you're really optimizing for space, then long strings will start getting awkward long before N is big enough for algorithmic complexity to overcome constant factors.) ... those strings will need to be transcoded to UTF-8 before they can be written or printed, so keeping them as UTF-8 ... That all assumes that the external world is using UTF-8 anyhow. Which is more likely to be true if you document it as a limitation of MicroPython. ... 
but many strings may never be written out: print(prefix + s[1:].strip().lower().center(80) + suffix) creates five strings that are never written out and one that is. But looking at the actual strings -- UTF-8 doesn't really hurt much. Only the slice and center() are more complex, and for a string less than 80 characters long, O(N) is irrelevant. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
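To make the O(N) indexing point above concrete, here is a sketch of code-point indexing over a UTF-8 buffer (an illustration of the cost, not MicroPython's actual implementation): every s[i] must scan past continuation bytes one at a time.

```python
def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 encoded `data`.

    Illustrates why a UTF-8 representation makes indexing O(N):
    continuation bytes (0b10xxxxxx) must be skipped one by one
    until the start of the i-th character is found.
    """
    starts = 0
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:          # a character-start byte
            if starts == i:
                end = pos + 1            # extend over continuation bytes
                while end < len(data) and data[end] & 0xC0 == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
            starts += 1
    raise IndexError(i)
```

For the 80-character strings in the print() example, this linear scan is indeed irrelevant in practice.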
Re: [Python-Dev] Fix Unicode-disabled build of Python 2.7
On 6/24/2014 4:22 AM, Serhiy Storchaka wrote: I submitted a number of patches which fixes currently broken Unicode-disabled build of Python 2.7 (built with --disable-unicode configure option). I suppose this was broken in 2.7 when C implementation of the io module was introduced. It has frequently been broken. Without a buildbot, it will continue to break. I have given at least a quick look at all your proposed changes; most are fixes to test code, such as skip decorators. People checked in tests without the right guards because it did work on their own builds, and on all stable buildbots. That will probably continue to happen unless/until a --disable-unicode buildbot is added. It would be good to fix the tests (and actual library issues). Unfortunately, some of the specifically proposed changes (such as defining and using _unicode instead of unicode within python code) look to me as though they would trigger problems in the normal build (where the unicode object *does* exist, but would no longer be used). Other changes, such as the use of \x escapes, appear correct, but make the tests harder to read -- and might end up removing a test for correct unicode functionality across different spellings. Even if we assume that the tests are fine, and I'm just an idiot who misread them, the fact that there is any confusion means that these particular changes may be tricky enough to be a bad tradeoff for 2.7. It *might* work if you could make a more focused change. For example, instead of leaving the 'unicode' name unbound, provide an object that simply returns false for isinstance and raises a UnicodeError for any other method call. Even *this* might be too aggressive for 2.7, but the fact that it would only appear in the --disable-unicode builds, and would make them more similar to the regular build, are points in its favor. 
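The proposed stand-in object can be sketched briefly. This is a hypothetical illustration of the idea (class and variable names invented here; shown as generic new-style-class Python, not the actual 2.7 patch): isinstance() checks simply fail, and any attempt to actually construct a unicode object raises immediately.

```python
class _UnicodeDisabled(object):
    """Hypothetical stand-in for the 'unicode' name in a
    --disable-unicode build: nothing is an instance of it, and
    calling it raises UnicodeError (a ValueError subclass) instead
    of silently misbehaving.
    """
    def __instancecheck__(self, obj):
        # isinstance(x, unicode_stub) looks this up on the type,
        # so every check reports False.
        return False

    def __call__(self, *args, **kwds):
        raise UnicodeError("this build was compiled without unicode support")

unicode_stub = _UnicodeDisabled()
```

Since UnicodeError subclasses ValueError, this would also stay compatible with code that catches the ValueError the original patch preferred.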
Before doing that, though, please document what the --disable-unicode mode is actually *supposed* to do when interacting with byte-streams that a standard defines as UTF-8. (For example, are the changes to _xml_dumps and _xml_loads at http://bugs.python.org/file35758/multiprocessing.patch correct, or do those functions assume they get bytes as input, or should the functions raise an exception any time they are called?) -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] sum(...) limitation
Sat Aug 2 12:11:54 CEST 2014, Julian Taylor wrote (in https://mail.python.org/pipermail/python-dev/2014-August/135623.html ): Andrea Griffini agriff at tin.it wrote: However sum([[1,2,3],[4],[],[5,6]], []) concatenates the lists. hm could this be a pure python case that would profit from temporary elision [ https://mail.python.org/pipermail/python-dev/2014-June/134826.html ]? lists could declare the tp_can_elide slot and call list.extend on the temporary during its tp_add slot instead of creating a new temporary. extend/realloc can avoid the copy if there is free memory available after the block. Yes, with all the same problems. When dealing with a complex object, how can you be sure that __add__ won't need access to the original values during the entire computation? It works with matrix addition, but not with matrix multiplication. Depending on the details of the implementation, it could even fail for a sort of sliding-neighbor addition similar to the original justification. Of course, then those tricky implementations should not define an _eliding_add_, but maybe the builtin objects still should? After all, a plain old list is OK to re-use. Unless the first evaluation to create it ends up evaluating an item that has side effects... In the end, it looks like a lot of machinery (and extra checks that may slow down the normal small-object case) for something that won't be used all that often. Though it is really tempting to consider a compilation mode that assumes objects and builtins will be normal, and lets you replace the entire above expression with compile-time [1, 2, 3, 4, 5, 6]. Would writing objects to that stricter standard and encouraging its use (and maybe offering a few AST transforms to auto-generate the out-parameters?) work as well for those who do need the speed? -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. 
-jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
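For reference, the list-concatenation case that started the thread already has a linear-time spelling in the stdlib, which sidesteps the quadratic cost of repeated list __add__ without any elision machinery:

```python
import itertools

lists = [[1, 2, 3], [4], [], [5, 6]]

# sum() builds a brand-new list for every addition, so over N lists
# of T total items it is O(T**2); chain.from_iterable is one O(T) pass.
via_sum = sum(lists, [])
via_chain = list(itertools.chain.from_iterable(lists))
assert via_sum == via_chain == [1, 2, 3, 4, 5, 6]
```

This is why the temporary-elision payoff for sum-of-lists specifically is limited: the idiomatic fast spelling already exists.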
[Python-Dev] Backwards compatibility after certificate autovalidation
Summary: There needs to be a simple way to opt out at install time. It would be far better to offer more fine-grained control, but leaving that better solution to downstream is acceptable. On 3 September 2014 01:19, Antoine Pitrou solipsis at pitrou.net wrote: RFC 2818 (HTTP over TLS) has the following language in section 3.1: If the hostname is available, the client MUST check it against the server's identity as presented in the server's Certificate message, in order to prevent man-in-the-middle attacks. If the client has external information as to the expected identity of the server, the hostname check MAY be omitted. This second case is pretty common, in my experience. I still see it on the public internet, but mismatches are almost the expected case on the intranet, and many installation guides begin by saying to ignore the security warnings. I think it best not to name my employer in this context, but I work for an IT firm large enough that you've heard of it. As bad as our internal situation is, it is still better than a typical client's infrastructure, except that clients often have fewer surfaces to expose in the first place. Internal websites and applications tend to have information that needs protection only because saying otherwise requires a long bureaucratic process with little payoff. (Also true at many clients.) Nick has already posted a subset of the reasons why a site may be signed with a certificate that is self-signed, expired, and/or limited to the wrong hostname/subdomain. In the long run, I agree that it is better to default to secure. But in the short and medium term, there has to be a workaround, and I would prefer that the simplest workaround not be retire the application, and don't use python again. I believe that the minimal acceptable workaround is that the Release Notes have an URL pointing to an install-time recipe telling the admin how to change the default back globally. 
Examples of good enough: Add this file to site-packages Add this environment variable with this setting Add this command line switch to your launch script Examples of not good enough: Edit your application to change ... Edit your system store ... (affecting more than python) Obviously, it would be great to offer finer control, so that the stricter default can be used when it is OK. (Per installation? Per application? Per run? Per domain? Per protocol? Per certificate? Per rejection reason? Treat anything in subdomain1.example.com as valid for hostname.example.com? Self-signing is OK for this IP range?) I would be pleasantly surprised if this level of API can even be standardized in time, and I agree that it is reasonable to leave it to 3rd party modules and downstream distributions. But I think Python itself should provide at least the single big hammer -- and that hammer should be something that can be used once at installation time (perhaps by changing the launch script), instead of requiring user interaction. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
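An install-time hammer of the "add this file to site-packages" variety is exactly what was later documented as the PEP 476 opt-out. A sketch (dropped into sitecustomize.py, or executed from a .pth file, it restores the old unverified default globally, without touching application code):

```python
# sitecustomize.py -- restore pre-validation behaviour for HTTPS,
# process-wide, as an install-time opt-out (the PEP 476 recipe).
import ssl

try:
    ssl._create_default_https_context = ssl._create_unverified_context
except AttributeError:
    # Older Pythons never validated by default, so there is nothing to undo.
    pass
```

Note this meets the "good enough" bar above precisely because a Deployment Engineer can add the file without editing the application or the system certificate store.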
Re: [Python-Dev] Backwards compatibility after certificate autovalidation
On Mon, Sep 8, 2014 at 3:44 PM, Cory Benfield c...@lukasa.co.uk wrote: On 8 September 2014 18:23, Jim J. Jewett jimjjew...@gmail.com wrote: Summary: There needs to be a simple way to opt out at install time. It would be far better to offer more fine-grained control, but leaving that better solution to downstream is acceptable. Does this argument apply to a hypothetical 2.7 backport of this change, or does it apply to making the change in 3.5? (Or of course both.) I believe the argument applies even to 3.5, given that there was no deprecation period. The concern is obviously stronger for maintenance releases. I am not saying that secure-by-default should wait until 3.6; I am saying that the rush requires even more attention than usual to backwards compatibility. This actually argues *for* backporting the fix as at least opt-in, so that 2.7/3.4 can serve as the make your changes now, test them without all the other new features releases. Nick's suggestion of a monkey-patching .pth file would be sufficient backwards compatibility support, if the recipe were referenced from the release notes (not just the python lib documentation). Support for partial opt-in -- whether per-process, per call, per address, etc -- would be nice, but it isn't required for backwards compatibility. I think that means an -X option for noverifyhttps should NOT be added. It doesn't get users closer to the final solution; it just adds the noise of a different workaround. I assume that adding _unverified_urlopen or urlopen(context=...) provides incremental improvements compatible with the eventual full opt-in. If so, adding them is probably reasonable, but I think the PEP should explicitly list all such approved half-measures as a guard against API feature creep. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Backwards compatibility after certificate autovalidation
On Tue, Sep 9, 2014 at 12:11 PM, Christian Heimes christ...@python.org wrote: On 09.09.2014 05:03, Nick Coghlan wrote: On 9 Sep 2014 10:48, Jim J. Jewett jimjjew...@gmail.com mailto:jimjjew...@gmail.com wrote: From Guido's and your feedback, I think we may need two things to approve this for 3.4.2 (putting 2.7 aside for now): 1. context parameter support in urllib.request (to opt out on a per-call basis) 2. a documented way to restore the old behaviour via sitecustomize (which may involve monkeypatching) What's with our plan to introduce sslcustomize? Is the idea for a configuration module and named contexts off the table? In a perfect world, half-measures would not be needed, and so neither would sslcustomize. In the real world, half-measures are needed, but offering too many of them adds so much confusion that things can actually get worse in practice. In other words, sslcustomize could be great, but getting it wrong would be a step backwards -- so start it as a 3rd party module. Since the biggest users are likely supported customers of downstream distributions, it makes sense to let them take the lead, though I'm sure they would appreciate a proof of concept. I still prefer the general idea over the monkey patching idea because it provides a clean but simple interface for structured configuration. Monkey patching of stdlib modules is ugly and error-prone. The primary use case for monkey patching is to support Separation of Roles. (Exact titles will of course differ by business.) If you need structured configuration, then you are already treating some calls differently from others, which means that you are already doing partial remediation. I agree that monkey patching is the wrong choice if you are doing partial remediation. 
But this partial remediation also implies that a Developer and Business Owner are involved to decide which calls need to be changed, and whether to change the call vs dropping the functionality vs convincing the owner of the other end of the connection to do things right in the first place. A Developer in charge of her own environment doesn't need to monkey patch -- but she could just do the right thing today, or switch to a framework that does. sslcustomize may be a really good way for her to document these are the strange exceptions in our existing environment, if it is done right. A Deployment Engineer may not even know python, and is certainly not authorized to make changes beyond configuration. Convincing someone that a .py file is a configuration knob probably requires an exception that is painful to get. (And saying oh, this is just where we list security stuff that we're ignoring won't make it easier.) Changing the urlopen calls would therefore be unacceptable even if source code were available -- and sometimes it isn't. The Deployment Engineer is often responsible for upgrading the infrastructure components (possibly including python) for security patches, so he has to be able to deploy 3.4.x or 2.7.y (though *probably* not 3.5.0) without any changes to the application itself -- and usually without access to whatever regression tests the application itself uses. (Ideally, someone else who does have that access is involved, but ... not always.) What the Deployment Engineer *can* do is modify the environment around the application. He can write a shell script that sets environment variables and/or command line options. He can probably even add a required component -- which might in practice just be a pre-written module like sslcustomize, or a .pth file that does the monkey patch on each launch. But *adding* such a component is orders of magnitude simpler (from a bureaucratic perspective) than *modifying* one that already exists. 
The end user often can't do anything outside the application's own UI, which is why the change has to be handled once at deployment, instead of as break-fix per call site or per bad certificate. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog
On September 11, 2014, Jeff Allen wrote: ... the area of code point space used for the smuggling of bytes under PEP-383 is not a Unicode Private Use Area, but a portion of the trailing surrogate range. This is a code violation, which I imagine is why surrogateescape is an error handler, not a codec. True, but I believe that is a CPython implementation detail. Other implementations (including jython) should implement the surrogateescape API, but I don't think it is important to use the same internal representation for the invalid bytes. (Well, unless you want to communicate with external tools (GUIs?) that are trying to directly use (effectively bytes rather than strings) in that particular internal encoding when communicating with python.) lone surrogates preclude a naive use of the platform string library Invalid input often causes problems. Are you saying that there are situations where the platform string library could easily handle invalid characters in general, but has a problem with the specific case of lone surrogates? -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
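The surrogateescape behaviour under discussion is observable from pure Python, whatever an implementation's internal representation: each undecodable byte 0xNN comes back as the lone surrogate U+DCNN, and the original bytes round-trip exactly.

```python
# surrogateescape smuggles undecodable bytes as lone low surrogates
# in U+DC80..U+DCFF, so the original byte sequence round-trips exactly.
raw = b"caf\xe9"                      # latin-1 bytes, invalid as UTF-8
text = raw.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9"            # byte 0xE9 smuggled as U+DCE9
assert text.encode("utf-8", errors="surrogateescape") == raw
```

That round-trip contract is the "surrogateescape API" an alternative implementation would need to honour, regardless of how it stores the smuggled bytes internally.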
Re: [Python-Dev] Multilingual programming article on the Red Hat Developer blog
On Sat Sep 13 00:16:30 CEST 2014, Jeff Allen wrote: 1. Java does not really have a Unicode type, therefore not one that validates. It has a String type that is a sequence of UTF-16 code units. There are some String methods and Character methods that deal with code points represented as int. I can put any 16-bit values I like in a String. Including lone surrogates, and invalid characters in general? 2. With proper accounting for indices, and as long as surrogates appear in pairs, I believe operations like find or endswith give correct answers about the unicode, when applied to the UTF-16. This is an attractive implementation option, and mostly what we do. So use it. The fact that you're having to smuggle bytes already guarantees that your data is either invalid or misinterpreted, and bug-free isn't possible. In terms of best-effort, it is reasonable to treat the smuggled bytes as representing a character outside of your unicode repertoire -- so it won't ever match entirely valid strings, except perhaps via a wildcard. And it should still work for .endswith(the same invalid characters). 3. I'm fixing some bugs where we get it wrong beyond the BMP, and the fix involves banning lone surrogates (completely). At present you can't type them in literals but you can sneak them in from Java. So how will you ban them, and what will you do when some java class sends you an invalid sequence anyhow? That is exactly the use case for these smuggled bytes... If you distinguish between a fully constructed PyString and a code-unit-sequence-that-could-be-made-into-a-PyString-later, then you could always have your constructor return an InvalidPyString subclass on the rare occasions when one is needed. If you want to avoid invalid surrogates even then, just use the replacement character and keep a separate list of original characters that got replaced in this string -- a hassle, but no worse than tracking indices for surrogates. 4. 
I think (with Antoine) if Jython supported PEP-383 byte smuggling, it would have to do it the same way as CPython, as it is visible. It's not impossible (I think), but is messy. Some are strongly against. If you allow direct write access to the underlying charsequence (as CPython does to C extensions), then you can't really ban invalid sequences. If callers have to go through an API -- even something as minimal as getBytes or getChars -- then you can use whatever internal representation you prefer. Hopefully, the vast majority of strings won't actually have smuggled bytes. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP 479
I have a strong suspicion that I'm missing something; I have been persuaded both directions too often to believe I have a grip on the real issue. So I'm putting out some assumptions; please tell me if I'm wrong, and maybe make them more explicit in the PEP. (1) The change will only affect situations where StopIteration is currently raised as an Exception -- i.e., it leaks past the bounds of a loop. (2) This can happen because of an explicit raise StopIteration. This is currently a supported idiom, and that is changing with PEP 479. (2a) Generators in the unwind path will now need to catch and reraise. (3) It can also happen because of an explicit next statement (as opposed to the implicit next of a loop). This is currently supported; after PEP 479, the next statement should be wrapped in a try statement, so that the intent will be explicit. (4) It can happen because of yield from yielding from an iterator, rather than a generator? (5) There is no other case where this can happen? (So the generator comprehension case won't matter unless it also includes one of the earlier cases.) -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
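Assumptions (2) and (3) above can be sketched in code. Under PEP 479 a StopIteration raised inside a generator becomes a RuntimeError, so the post-479 spellings are a plain return inside the generator and a try around any explicit next() whose exhaustion is expected:

```python
def stops_early(items):
    """Generator that ends early -- the post-PEP-479 idiom uses
    'return' where 'raise StopIteration' was previously supported."""
    for item in items:
        if item is None:
            return            # replaces: raise StopIteration
        yield item

assert list(stops_early([1, 2, None, 3])) == [1, 2]

# An explicit next() whose exhaustion is an expected outcome
# should make that intent visible with a try statement:
it = iter([])
try:
    first = next(it)
except StopIteration:
    first = None
assert first is None
```
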
[Python-Dev] hg vs Github [was: PEP 481 - Migrate Some Supporting Repositories to Git and Github]
M. Cepl asked: What I really don't understand is why this discussion is hg v. GitHub, when it should be hg v. git. Particular hosting is a secondary issue I think even the proponents concede that git isn't better enough to justify a switch in repositories. They do claim that GitHub (the whole environment; not just the hosting) is so much better that a switch to GitHub is justified. Github + hg offers far fewer benefits than Github + git, so also switching to git is part of the price. Whether that is an intolerable markup or a discount is disputed, as are the value of several other costs and benefits. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] My thinking about the development process
Brett Cannon wrote: 4. Contributor creates account on bugs.python.org and signs the [contributor agreement](https://www.python.org/psf/contrib/contrib-form/) Is there an expiration on such forms? If there doesn't need to be (and one form is good for multiple tickets), is there an objection (besides not done yet) to making the signed form part of the bug reporter account, and required to submit to the CI process? (An I can't sign yet, bug me later option would allow the current workflow without the this isn't technically a patch workaround for small enough patches from those with slow-moving employers.) There's the simple spelling mistake patches and then there's the code change patches. There are a fair number of one-liner code patches; ideally, they could also be handled quickly. For the code change patches, contributors need an easy way to get a hold of the code and get their changes to the core developers. For a fair number of patches, the same workflow as spelling errors is appropriate, except that it would be useful to have an automated state saying yes, this currently merges fine, so that committers can focus only on patches that are (still) at least that ready. At best core developers tell a contributor please send your PR against 3.4, push-button merge it, update a local clone, merge from 3.4 to default, do the usual stuff, commit, and then push; Is it common for a patch that should apply to multiple branches to fail on some but not all of them? In other words, is there any reason beyond not done yet that submitting a patch (or pull request) shouldn't automatically create a patch per branch, with pushbuttons to test/reject/commit? Our code review tool is a fork that probably should be replaced as only Martin von Loewis can maintain it. Only he knows the innards, or only he is authorized, or only he knows where the code currently is/how to deploy an update? 
I know that there were times in the (not-so-recent) past when I had time and willingness to help with some part of the infrastructure, but didn't know where the code was, and didn't feel right making a blind offer. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] libffi embedded in CPython
On Thu, Dec 18, 2014, at 14:13, Maciej Fijalkowski wrote: ... http://bugs.python.org/issue23085 ... is there any reason any more for libffi being included in CPython? [And why a fork, instead of just treating it as an external dependency] Benjamin Peterson responded: It has some sort of Windows related patches. No one seems to know whether they're still needed for newer libffi. Unfortunately, ctypes doesn't currently have a maintainer. Are any of the following false? (1) Ideally, we would treat it as an external dependency. (2) At one point, it was intentionally forked to get in needed patches, including at least some for 64 bit windows with MSVC. (3) Upstream libffi maintenance has picked back up. (4) Alas, that means the switch merge would not be trivial. (5) In theory, we could now switch to the external version. [In particular, does libffi have a release policy such that we could assume the newest released version is safe, so long as our integration doesn't break?] (6) By its very nature, libffi changes are risky and undertested. At the moment, that is also true of its primary user, ctypes. (7) So a switch is OK in theory, but someone has to do the non-trivial testing and merging, and agree to support both libffi and ctypes in the future. Otherwise, stable wins. (8) The need for future support makes this a bad candidate for patches wanted/bug bounty/GSoC. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] libffi embedded in CPython
On Thu, Dec 18, 2014, at 14:13, Maciej Fijalkowski wrote: ... http://bugs.python.org/issue23085 ... is there any reason any more for libffi being included in CPython? Paul Moore wrote: Probably the easiest way of moving this forward would be for someone to identify the CPython-specific patches in the current version ... Christian Heimes wrote: That's easy. All patches are tracked in the diff file https://hg.python.org/cpython/file/3de678cd184d/Modules/_ctypes/libffi.diff That (200+ lines) doesn't seem to have all the C changes, such as the win64 sizeof changes from issue 11835. Besides http://bugs.python.org/issue23085, there is at least http://bugs.python.org/issue22733 http://bugs.python.org/issue20160 http://bugs.python.org/issue11835 which sort of drives home the point that making sure we have a good merge isn't trivial, and this isn't an area where we should just assume that tests will catch everything. I don't think it is just a quicky waiting on permission. I've no doubt that upstream libffi is better in many ways, but those are ways people have already learned to live with. That said, I haven't seen any objections in principle, except perhaps from Steve Dower in the issues. (I *think* he was just saying not worth the time to me, but it was ambiguous.) I do believe that Christian or Maciej *could* sort things out well enough; I have no insight into whether they have (or someone else has) the time to actually do so. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 441 - Improving Python ZIP Application Support
Barry Warsaw wrote: I don't know exactly what the procedure would be to claim .pyz for *nix, e.g. updating /etc/mime.types, but I think the PEP should at least mention this. I think we want to get as official support for .pyz files on *nix as possible. Paul Moore wrote: I'll add a note to the PEP, but I have no idea how we would even go about that, so that's all I can do, unfortunately. Are you just looking for http://www.iana.org/assignments/media-types/media-types.xhtml and its references, including the registration procedures http://tools.ietf.org/html/rfc6838#section-4.2.5 and the application form at http://www.iana.org/form/media-types ? -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 441 - Improving Python ZIP Application Support
On Wed, Feb 18, 2015 at 4:16 PM, Paul Moore p.f.mo...@gmail.com wrote: On 18 February 2015 at 20:48, Jim J. Jewett jimjjew...@gmail.com wrote: Barry Warsaw wrote: I don't know exactly what the procedure would be to claim .pyz for *nix, e.g. updating /etc/mime.types, ... Are you just looking for http://www.iana.org/assignments/media-types/media-types.xhtml and ... That covers mime types, but not file extensions, so it's not really what *I* thought Barry was talking about. Question 13 at http://www.iana.org/form/media-types asks for additional information, and specifically calls out Magic Number and File Extension, among others. I doubt there is any more official repository for file extension meaning within MIME or unix. Also, I don't think reserving anything is something I, as an individual (and specifically a non-Unix user) should do. It probably should be handled by the PSF, as the process seems to need a contact email address... Ideally, it would be a long-lasting organizational address, such as pep-edi...@python.org. But often, it isn't. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 489: Redesigning extension module loading
On 16 March 2015 Petr Viktorin wrote:

    If PyModuleCreate is not defined, PyModuleExec is expected to operate on any Python object for which attributes can be added by PyObject_GetAttr* and retrieved by PyObject_SetAttr*.

I assume it is the other way around (add with Set and retrieve with Get), rather than a description of the required form of magic.

    PyObject *PyModule_AddCapsule(
        PyObject *module,
        const char *module_name,
        const char *attribute_name,
        void *pointer,
        PyCapsule_Destructor destructor)

What happens if module_name doesn't match the module's __name__? Does it become a hidden attribute? A dotted attribute? Is the result undefined? Later, there is

    void *PyModule_GetCapsulePointer(
        PyObject *module,
        const char *module_name,
        const char *attribute_name)

with the same apparently redundant arguments, but not a PyModule_SetCapsulePointer. Are capsule pointers read-only, or can they be replaced with another call to PyModule_AddCapsule, or by a simple PyObject_SetAttr?

    Subinterpreters and Interpreter Reloading ... No user-defined functions, methods, or instances may leak to different interpreters.

By user-defined do you mean defined in Python, as opposed to in the extension itself? If so, what is the recommendation for modules that do want to support, say, callbacks? A dual-layer mapping that uses the interpreter as the first key? Naming it _module and only using it indirectly through module.py, which is not shared across interpreters? Not using this API at all?

    To achieve this, all module-level state should be kept in either the module dict, or in the module object.

I don't see how that is related to leakage.

    A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.

What about singleton instances? Should they be per-interpreter? What about constants, such as PI? Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be kept? What happens if this no-leakage rule is violated? 
Does the module not load, or does it just maybe lead to a crash down the road? -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Thoughts on running Python 3.5 on Windows (path, pip install --user, etc)
On 10 March 2015, slightly after midnight, Paul Moore wrote: Personally I doubt it would make much difference. If the docs say pygmentize I'm unlikely to dig around to find that the incantation python -m pygments.somemodule:main does the same thing using 3 times as many characters. I'd just add Python to my PATH and say stuff it. There is value in getting the incantation down to a single (preferably short) line, because then it can be used as a shortcut. That means it can be created as a shortcut at installation time, and that someone writing their own batch file can just cut and paste from the shortcut properties' target. Not as simple as just adding to the path, but simpler than adding several directories to the path, or modifying other environment variables, or fighting an existing but conflicting Python installation already on the path. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Request for Pronouncement: PEP 441 - Improving Python ZIP Application Support
On 24 February 2015 at 18:58, Guido van Rossum guido at python.org wrote: The naming of the functions feels inconsistent -- maybe pack(directory, target) -> create_archive(directory, archive), and set_interpreter() -> copy_archive(archive, new_archive)? Paul Moore wrote: One possible source of confusion with copy_archive (and its command line equivalent python -m zipapp old.pyz -o new.pyz) is that it isn't technically a copy, as it changes the shebang line (if you omit the interpreter argument it removes the existing shebang). Is the difference between create and copy important? e.g., is there anything wrong with create_archive(old_archive, output=new_archive) working as well as create_archive(directory, archive)? -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
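[For what it's worth, a single create/copy entry point is how the zipapp module was eventually specified: one create_archive call accepts either a directory or an existing archive as its source. A minimal sketch of that unified API (directory and file names here are made up for illustration):

```python
import os
import tempfile
import zipapp

# One function handles both cases: building from a directory, and
# "copying" an existing archive (possibly rewriting its shebang).
with tempfile.TemporaryDirectory() as d:
    app = os.path.join(d, "myapp")
    os.mkdir(app)
    with open(os.path.join(app, "__main__.py"), "w") as f:
        f.write("print('hello')\n")

    pyz = os.path.join(d, "myapp.pyz")
    # Source is a directory: build a fresh archive with a shebang.
    zipapp.create_archive(app, pyz, interpreter="/usr/bin/env python3")

    pyz2 = os.path.join(d, "myapp2.pyz")
    # Source is an existing archive: same call, no separate copy_archive.
    # With no interpreter given, the existing shebang is dropped.
    zipapp.create_archive(pyz, pyz2)
```

So the answer to the question above turned out to be "no, the difference isn't important enough to need two names".]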
Re: [Python-Dev] Request for Pronouncement: PEP 441 - Improving Python ZIP Application Support
On Wed, Feb 25, 2015 at 2:33 PM, Paul Moore p.f.mo...@gmail.com wrote: On 25 February 2015 at 17:06, Paul Moore p.f.mo...@gmail.com wrote: I've included the resulting API documentation below. It looks pretty good to me. Me too. I have a few nits anyhow.

    .. function:: create_archive(directory, target=None, interpreter=None, main=None)

    Create an application archive from *source*. The source can be any of the following:

(1) *source* makes me think of source code, as opposed to binary. This is only a small objection, in part because I can't think of anything better.

(2) If you do keep *source*, I think that the directory parameter should be renamed to source.

(3) The line

    * The name of an existing application archive file, in which case the file is copied to the target.

should become

    * The name of an existing application archive file, in which case the file is copied (possibly with changes) to the target.

My concern is that someone who does want just another copy will use this, see copied, not read the other options, and be surprised when the shebang is dropped.

    * A file object open for reading in bytes mode. The content of the file should be an application archive, and the file object is assumed to be positioned at the start of the archive.

I like this way of ducking the "does it need to be seekable" question.

    The *target* argument determines where the resulting archive will be written:

    * If it is the name of a file, the archive will be written to that file.

(4) Note that the filename is not required to end with ``.pyz``, although that is good practice. Or maybe just be explicit that the function itself does not add a .pyz, and assumes that the caller will do so when appropriate.

    The *interpreter* argument specifies the name of the Python interpreter with which the archive will be executed. ... ... Omitting the *interpreter* results in no shebang line being written.

(5) even if there was an explicit shebang line in the source archive. 
If an interpreter is specified, and the target is a filename, the executable bit of the target file will be set. (6) (target is a filename, or None) Or does that clarification just confuse the issue, and only benefit people so careful they'll verify it themselves anyway? (7) That is a good idea, but not quite as clear cut as it sounds. On unix, there are generally 3 different executable bits specifying *who* can run it. Setting the executable bit only for the owner is probably a conservative but sensible default. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
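[On point (7), the behavior can be checked empirically once the function exists. A sketch (my own, not from the thread; assumes a POSIX filesystem) that inspects which executable bits create_archive sets when an interpreter is given:

```python
import os
import stat
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as d:
    app = os.path.join(d, "app")
    os.mkdir(app)
    with open(os.path.join(app, "__main__.py"), "w") as f:
        f.write("print('hi')\n")

    pyz = os.path.join(d, "app.pyz")
    zipapp.create_archive(app, pyz, interpreter="/usr/bin/env python3")

    mode = os.stat(pyz).st_mode
    print(bool(mode & stat.S_IXUSR))                   # owner-executable?
    print(bool(mode & (stat.S_IXGRP | stat.S_IXOTH)))  # group/other too?
```

As zipapp was eventually implemented, only the owner's bit is ORed in, i.e. exactly the conservative default suggested above.]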
[Python-Dev] PEP 492: What is the real goal?
On Tue Apr 28 23:49:56 CEST 2015, Guido van Rossum quoted PEP 492:

    Rationale and Goals
    ===
    Current Python supports implementing coroutines via generators (PEP 342), further enhanced by the ``yield from`` syntax introduced in PEP 380. This approach has a number of shortcomings:

    * it is easy to confuse coroutines with regular generators, since they share the same syntax; async libraries often attempt to alleviate this by using decorators (e.g. ``@asyncio.coroutine`` [1]_);

So? PEP 492 never says what coroutines *are* in a way that explains why it matters that they are different from generators. Do you really mean coroutines that can be suspended while they wait for something slow?

As best I can guess, the difference seems to be that a normal generator is using yield primarily to say: I'm not done; I have more values when you want them, but an asynchronous (PEP 492) coroutine is primarily saying: This might take a while, go ahead and do something else meanwhile.

    As shown later in this proposal, the new ``async with`` statement lets Python programs perform asynchronous calls when entering and exiting a runtime context, and the new ``async for`` statement makes it possible to perform asynchronous calls in iterators.

Does it really permit *making* them, or does it just signal that you will be waiting for them to finish processing anyhow, and it doesn't need to be a busy-wait? As nearly as I can tell, async with doesn't start processing the managed block until the asynchronous call finishes its work -- the only point of the async is to signal a scheduler that the task is blocked. Similarly, async for is still linearized, with each step waiting until the previous asynchronous step was not merely launched, but fully processed. If anything, it *prevents* within-task parallelism.

    It uses the ``yield from`` implementation with an extra step of validating its argument. ``await`` only accepts an *awaitable*, which can be one of:

What justifies this limitation? 
Is there anything wrong with awaiting something that eventually uses return instead of yield, if the "this might take a while" signal is still true? Is the problem just that the current implementation might not take proper advantage of task-switching?

    Objects with ``__await__`` method are called *Future-like* objects in the rest of this PEP. Also, please note that ``__aiter__`` method (see its definition below) cannot be used for this purpose. It is a different protocol, and would be like using ``__iter__`` instead of ``__call__`` for regular callables. It is a ``TypeError`` if ``__await__`` returns anything but an iterator.

What would be wrong if a class just did __await__ = __anext__ ? If the problem is that the result of __await__ should be iterable, then why isn't __await__ = __aiter__ OK?

    ``await`` keyword is defined differently from ``yield`` and ``yield from``. The main difference is that *await expressions* do not require parentheses around them most of the times.

Does that mean The ``await`` keyword has slightly higher precedence than ``yield``, so that fewer expressions require parentheses?

    class AsyncContextManager:
        async def __aenter__(self):
            await log('entering context')

Other than the arbitrary "keyword must be there" limitations imposed by this PEP, how is that different from:

    class AsyncContextManager:
        async def __aenter__(self):
            log('entering context')

or even:

    class AsyncContextManager:
        def __aenter__(self):
            log('entering context')

Will anything different happen when calling __aenter__ or log? Is it that log itself now has more freedom to let other tasks run in the middle?

    It is an error to pass a regular context manager without ``__aenter__`` and ``__aexit__`` methods to ``async with``. It is a ``SyntaxError`` to use ``async with`` outside of a coroutine.

Why? Does that just mean they won't take advantage of the freedom you offered them? Or are you concerned that they are more likely to cooperate badly with the scheduler in practice? 
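[For concreteness, here is a minimal *Future-like* object under that protocol (my own sketch, not from the PEP): ``__await__`` returns an iterator -- here a generator that happens never to yield -- and the awaiting coroutine resumes with its return value.

```python
import asyncio

class FutureLike:
    """Sketch of a Future-like object: __await__ must return an iterator."""
    def __init__(self, value):
        self.value = value

    def __await__(self):
        return self._step()

    def _step(self):
        # A generator that never yields: awaiting it produces the
        # value immediately, with no suspension point.
        if False:
            yield
        return self.value

async def main():
    return await FutureLike(42)

print(asyncio.run(main()))  # → 42
```

Binding ``__await__ = __anext__`` instead would make ``__await__`` return a coroutine object rather than a plain iterator, which is part of why the PEP keeps the protocols distinct.]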
    It is a ``TypeError`` to pass a regular iterable without ``__aiter__`` method to ``async for``. It is a ``SyntaxError`` to use ``async for`` outside of a coroutine.

The same questions about why -- what is the harm?

    The following code illustrates new asynchronous iteration protocol::

        class Cursor:
            def __init__(self):
                self.buffer = collections.deque()

            def _prefetch(self):
                ...

            async def __aiter__(self):
                return self

            async def __anext__(self):
                if not self.buffer:
                    self.buffer = await self._prefetch()
                if not self.buffer:
                    raise StopAsyncIteration
                return self.buffer.popleft()

    then the ``Cursor`` class can be used as follows::

        async for row in Cursor():
Re: [Python-Dev] PEP 492: What is the real goal?
On Wed, Apr 29, 2015 at 2:26 PM, Paul Moore p.f.mo...@gmail.com wrote: On 29 April 2015 at 18:43, Jim J. Jewett jimjjew...@gmail.com wrote: So? PEP 492 never says what coroutines *are* in a way that explains why it matters that they are different from generators. ...

Looking at the Wikipedia article on coroutines, I see an example of how a producer/consumer process might be written with coroutines:

    var q := new queue

    coroutine produce
        loop
            while q is not full
                create some new items
                add the items to q
            yield to consume

    coroutine consume
        loop
            while q is not empty
                remove some items from q
                use the items
            yield to produce

(To start everything off, you'd just run produce).

I can't even see how to relate that to PEP 492 syntax. I'm not allowed to use yield, so should I use await consume in produce (and vice versa)? I think so ... but the fact that nothing is actually coming via the await channel makes it awkward. I also worry that it would end up with an infinite stack depth, unless the await were actually replaced with some sort of framework-specific scheduling primitive, or one of them were rewritten differently to ensure it returned to the other instead of calling it anew. I suspect the real problem is that the PEP is really only concerned with a very specific subtype of coroutine, and these don't quite fit. (Though it could be done by somehow making them both await on the queue status, instead of on each other.) -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
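[The "both await on the queue status" variant can indeed be written under the proposed syntax. A sketch using asyncio.Queue (the names and the None sentinel are my own, not from the thread): neither side awaits the other directly, so there is no mutual-call stack growth; each suspends on the queue and the loop alternates between them.

```python
import asyncio

async def produce(q: asyncio.Queue, n: int) -> None:
    for i in range(n):
        await q.put(i)      # suspends while the queue is full
    await q.put(None)       # sentinel: no more items

async def consume(q: asyncio.Queue) -> list:
    items = []
    while True:
        item = await q.get()    # suspends while the queue is empty
        if item is None:
            break
        items.append(item)
    return items

async def main() -> list:
    q = asyncio.Queue(maxsize=2)    # small queue forces interleaving
    _, items = await asyncio.gather(produce(q, 5), consume(q))
    return items

print(asyncio.run(main()))  # → [0, 1, 2, 3, 4]
```
]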
Re: [Python-Dev] PEP 492: What is the real goal?
On Wed Apr 29 20:06:23 CEST 2015, Yury Selivanov replied: As best I can guess, the difference seems to be that a normal generator is using yield primarily to say: I'm not done; I have more values when you want them, but an asynchronous (PEP 492) coroutine is primarily saying: This might take a while, go ahead and do something else meanwhile. Correct. Then I strongly request a more specific name than coroutine. I would prefer something that refers to cooperative pre-emption, but I haven't thought of anything that is short without leading to other types of confusion. My least bad idea at the moment would be self-suspending coroutine to emphasize that suspending themselves is a crucial feature. Even PEP492-coroutine would be an improvement. Does it really permit *making* them [asynchronous calls], or does it just signal that you will be waiting for them to finish processing anyhow, and it doesn't need to be a busy-wait? It does. Bad phrasing on my part. Is there anything that prevents an asynchronous call (or waiting for one) without the async with? If so, I'm missing something important. Either way, I would prefer different wording in the PEP. It uses the ``yield from`` implementation with an extra step of validating its argument. ``await`` only accepts an *awaitable*, which can be one of: What justifies this limitation? We want to avoid people passing regular generators and random objects to 'await', because it is a bug. Why? Is it a bug just because you defined it that way? Is it a bug because the await makes timing claims that an object not making such a promise probably won't meet? (In other words, a marker interface.) Is it likely to be a symptom of something that wasn't converted correctly, *and* there are likely to be other bugs caused by that same lack of conversion? 
For coroutines in PEP 492:

    __await__ = __anext__ is the same as __call__ = __next__
    __await__ = __aiter__ is the same as __call__ = __iter__

That tells me that it will be OK sometimes, but will usually be either a mistake or an API problem -- and it explains why. Please put those 3 lines in the PEP.

    This is OK. The point is that you can use 'await log' in __aenter__. If you don't need awaits in __aenter__ you can use them in __aexit__. If you don't need them there too, then just define a regular context manager.

Is it an error to use async with on a regular context manager? If so, why? If it is just that doing so could be misleading, then what about async with mgr1, mgr2, mgr3 -- is it enough that one of the three might suspend itself?

    class AsyncContextManager:
        def __aenter__(self):
            log('entering context')

    __aenter__ must return an awaitable

Why? Is there a fundamental reason, or is it just to avoid the hassle of figuring out whether or not the returned object is a future that might still need awaiting? Is there an assumption that the scheduler will let the thing-being-awaited run immediately, but look for other tasks when it returns, and a further assumption that something which finishes the whole task would be too slow to run right away?

    It doesn't make any sense in using 'async with' outside of a coroutine. The interpreter won't know what to do with them: you need an event loop for that.

So does the PEP also provide some way of ensuring that there is an event loop? Does it assume that self-suspending coroutines will only ever be called by an already-running event loop compatible with asyncio.get_event_loop()? If so, please make these contextual assumptions explicit near the beginning of the PEP.

    It is a ``TypeError`` to pass a regular iterable without ``__aiter__`` method to ``async for``. It is a ``SyntaxError`` to use ``async for`` outside of a coroutine.

The same questions about why -- what is the harm? 
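[For reference, the "awaits in __aenter__/__aexit__" case looks like this under the proposed syntax (a sketch; asyncio.sleep stands in for the hypothetical log coroutine):

```python
import asyncio

class AsyncContextManager:
    # Sketch of the PEP's example: both methods are coroutines and may
    # suspend; asyncio.sleep(0) stands in for `await log(...)`.
    async def __aenter__(self):
        await asyncio.sleep(0)
        return "resource"

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)
        return False    # don't swallow exceptions

async def main():
    async with AsyncContextManager() as r:
        return r

print(asyncio.run(main()))  # → resource
```
]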
I can imagine that, as an implementation detail, the async for wouldn't be taken advantage of unless it was running under an event loop that knew to look for async for as suspension points. I'm not seeing what the actual harm is in either not happening to suspend (less efficient, but still correct), or in suspending between every step of a regular iterator (because, why not?)

    For debugging this kind of mistakes there is a special debug mode in asyncio, in which ``@coroutine`` ... decorator makes the decision of whether to wrap or not to wrap based on an OS environment variable ``PYTHONASYNCIODEBUG``.

(1) How does this differ from the existing asyncio.coroutine?

(2) Why does it need to have an environment variable? (Sadly, the answer may be backwards compatibility, if you're really just specifying the existing asyncio interface better.)

(3) Why does it need [set]get_coroutine_wrapper, instead of just setting the asyncio.coroutines.coroutine attribute?

(4) Why do the get/set need to be in sys? Is the intent to do anything more than preface execution with:

    import asyncio.coroutines
    asyncio.coroutines._DEBUG = True
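[On (2): the environment variable matters because the wrapping decision is made at decoration time, i.e. when asyncio is first imported; per-loop debugging can also be switched on afterwards. A sketch of both knobs as asyncio eventually exposed them (the coroutine body here is my own illustration):

```python
import os
# Must be set before asyncio is first imported, or the @coroutine
# wrapping decision has already been made without instrumentation.
os.environ["PYTHONASYNCIODEBUG"] = "1"

import asyncio

async def main():
    # The per-loop switch, available after the fact:
    loop = asyncio.get_running_loop()
    return loop.get_debug()

print(asyncio.run(main(), debug=True))  # → True
```
]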
Re: [Python-Dev] ABCs - Re: PEP 492: async/await in Python; version 4
On Sun May 3 08:32:02 CEST 2015, Stefan Behnel wrote: Ok, fair enough. So, how would you use this new protocol manually then? Say, I already know that I won't need to await the next item that the iterator will return. For normal iterators, I could just call next() on it and continue the for-loop. How would I do it for AIterators? Call next, then stick it somewhere it can be waited on. Or is that syntactically illegal, because of the separation between sync and async? The async for seems to assume that you want to do the waiting right now, at each step. (At least as far as this thread of the logic goes; something else might be happening in parallel via other threads of control.) BTW, I guess that this AIterator, or rather AsyncIterator, needs to be a separate protocol (and ABC) then. Implementing __aiter__() and __anext__() seems perfectly reasonable without implementing (or using) a Coroutine. That means we also need an AsyncIterable as a base class for it. Agreed. That might even help us to decide if we need new builtins (or helpers) aiter() and anext() in order to deal with these protocols. I hope not; they seem more like specialized versions of functions, such as are found in math or cmath. Ideally, as much as possible of this PEP should live in asyncio, rather than appearing globally. Which reminds me ... *should* the await keyword work with any future, or is it really intentionally restricted to use with a single library module and 3rd party replacements? -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 492: What is the real goal?
On Fri, May 1, 2015 at 2:59 PM, Guido van Rossum gu...@python.org wrote: On Fri, May 1, 2015 at 11:26 AM, Jim J. Jewett jimjjew...@gmail.com wrote: On Thu, Apr 30, 2015 at 3:32 PM, Guido van Rossum gu...@python.org wrote: (Guido:) Actually that's not even wrong. When using generators as coroutines, PEP 342 style, yield means I am blocked waiting for a result that the I/O multiplexer is eventually going to produce. So does this mean that yield should NOT be used just to yield control if a task isn't blocked? (e.g., if its next step is likely to be long, or low priority.) Or even that it wouldn't be considered a co-routine in the python sense? I'm not sure what you're talking about. Does next step refer to something in the current stack frame or something that you're calling? The next piece of your algorithm. None of the current uses of yield (the keyword) in Python are good for lowering priority of something. If there are more tasks than executors, yield is a way to release your current executor and go to the back of the line. I'm pretty sure I saw several examples of that style back when coroutines were first discussed. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 492: What is the real goal?
On Thu, Apr 30, 2015 at 3:32 PM, Guido van Rossum gu...@python.org wrote: (me:) A badly worded attempt to say Normal generator: yield (as opposed to return) means that the function isn't done, and there may be more things to return later. but an asynchronous (PEP492) coroutine is primarily saying: This might take a while, go ahead and do something else meanwhile. (Yuri:) Correct. (Guido:) Actually that's not even wrong. When using generators as coroutines, PEP 342 style, yield means I am blocked waiting for a result that the I/O multiplexer is eventually going to produce. So does this mean that yield should NOT be used just to yield control if a task isn't blocked? (e.g., if its next step is likely to be long, or low priority.) Or even that it wouldn't be considered a co-routine in the python sense? If this is really just about avoiding busy-wait on network IO, then coroutine is way too broad a term, and I'm uncomfortable restricting a new keyword (async or await) to what is essentially a Domain Specific Language. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 492: What is the real goal?
On Thu Apr 30 21:27:09 CEST 2015, Yury Selivanov replied: On 2015-04-30 2:41 PM, Jim J. Jewett wrote: Bad phrasing on my part. Is there anything that prevents an asynchronous call (or waiting for one) without the async with? If so, I'm missing something important. Either way, I would prefer different wording in the PEP. Yes, you can't use 'yield from' in __exit__/__enter__ in current Python. I tried it in 3.4, and it worked. I'm not sure it would ever be sensible, but it didn't raise any errors, and it did run. What do you mean by can't use? For coroutines in PEP 492: __await__ = __anext__ is the same as __call__ = __next__ __await__ = __aiter__ is the same as __call__ = __iter__ That tells me that it will be OK sometimes, but will usually be either a mistake or an API problem -- and it explains why. Please put those 3 lines in the PEP. There is a line like that: https://www.python.org/dev/peps/pep-0492/#await-expression Look for Also, please note... line. It was from reading the PEP that the question came up, and I just reread that section. Having those 3 explicit lines goes a long way towards explaining how an asyncio coroutine differs from a regular callable, in a way that the existing PEP doesn't, at least for me. This is OK. The point is that you can use 'await log' in __aenter__. If you don't need awaits in __aenter__ you can use them in __aexit__. If you don't need them there too, then just define a regular context manager. Is it an error to use async with on a regular context manager? If so, why? If it is just that doing so could be misleading, then what about async with mgr1, mgr2, mgr3 -- is it enough that one of the three might suspend itself? 'with' requires an object with __enter__ and __exit__ 'async with' requires an object with __aenter__ and __aexit__ You can have an object that implements both interfaces. I'm still not seeing why with (let alone async with) can't just run whichever one it finds. 
async with won't actually let the BLOCK run until the future is resolved. So if a context manager only supplies __enter__ instead of __aenter__, then at most you've lost a chance to switch tasks while waiting -- and that is no worse than if the context manager just happened to be really slow. For debugging this kind of mistakes there is a special debug mode in asyncio ... Is the intent to do anything more than preface execution with:

    import asyncio.coroutines
    asyncio.coroutines._DEBUG = True

This won't work, unfortunately. You need to set the debug flag *before* you import asyncio package (otherwise we would have an unavoidable performance cost for debug features). If you enable it after you import asyncio, then asyncio itself won't be instrumented. Please see the implementation of asyncio.coroutine for details. Why does asyncio itself have to be wrapped? Is that really something normal developers need to debug, or is it only for developing the stdlib itself? If it is only for developing the stdlib, then I would rather see workarounds like shoving _DEBUG into builtins when needed, as opposed to adding multiple attributes to sys. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 492: What is the real goal?
On Fri, May 1, 2015 at 4:10 PM, Guido van Rossum gu...@python.org wrote: On Fri, May 1, 2015 at 12:48 PM, Jim J. Jewett jimjjew...@gmail.com wrote: If there are more tasks than executors, yield is a way to release your current executor and go to the back of the line. I'm pretty sure I saw several examples of that style back when coroutines were first discussed. Could you dig up the actual references? It seems rather odd to me to mix coroutines and threads this way. I can try in a few days, but the primary case (and perhaps the only one with running code) was for n_executors=1. They assumed there would only be a single thread, or at least only one that was really important to the event loop -- the pattern was often described as an alternative to relying on threads. FWIW, Ron Adam's yielding in https://mail.python.org/pipermail/python-dev/2015-May/139762.html is in the same spirit. You replied it would be better if that were done by calling some method on the scheduling loop, but that isn't any more standard, and the yielding function is simple enough that it will be reinvented. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
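[Indeed, a yielding helper in Ron Adam's spirit is a one-liner on top of asyncio, which spells "go to the back of the line" as sleep(0). A sketch (my own; the worker/gather scaffolding is only there to show the round-robin effect):

```python
import asyncio

async def yielding():
    # Release the executor and let other ready tasks run.
    await asyncio.sleep(0)

async def worker(name, out):
    for i in range(2):
        out.append((name, i))
        await yielding()    # back of the line after each step

async def main():
    out = []
    await asyncio.gather(worker("a", out), worker("b", out))
    return out

print(asyncio.run(main()))  # → [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

With a single executor (the event loop thread), the two workers interleave strictly, which is exactly the lower-my-priority-voluntarily pattern under discussion.]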
Re: [Python-Dev] PEP 492: What is the real goal?
On Fri May 1 23:58:26 CEST 2015, Yury Selivanov wrote: Yes, you can't use 'yield from' in __exit__/__enter__ in current Python. What do you mean by can't use? It probably executed without errors, but it didn't run the generators. True. But it did return the one created by __enter__, so it could be bound to a variable and iterated within the block. There isn't an easy way to run the generator created by __exit__, and I'm not coming up with any obvious scenarios where it would be a sensible thing to do (other than using with on a context manager that *does* return a future instead of finishing). That said, I'm still not seeing why the distinction is so important that we have to enforce it at a language level, as opposed to letting the framework do its own enforcement. (And if the reason is performance, then make the checks something that can be turned off, or offer a fully instrumented loop as an alternative for debugging.) Is the intent to do anything more than preface execution with:

    import asyncio.coroutines
    asyncio.coroutines._DEBUG = True

If you enable it after you import asyncio, then asyncio itself won't be instrumented. Why does asyncio itself have to be wrapped? Is that really something normal developers need to debug, or is it only for developing the stdlib itself? Yes, normal developers need asyncio to be instrumented, otherwise you won't know what you did wrong when you called some asyncio code without 'await' for example. I'll trust you that it *does* work that way, but this sure sounds to me as though the framework isn't ready to be frozen with syntax, and maybe not even ready for non-provisional stdlib inclusion. I understand that the disconnected nature of asynchronous tasks makes them harder to debug. I heartily agree that the event loop should offer some sort of debug facility to track this. But the event loop is supposed to be pluggable. 
Saying that this requires not merely a replacement, or even a replacement before events are added, but a replacement made before python ever even loads the default version ... That seems to be much stronger than sys.settrace -- more like instrumenting the ceval loop itself. And that is something that ordinary developers shouldn't have to do. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 492: async/await in Python; version 5
On Tue May 5 18:29:44 CEST 2015, Yury Selivanov posted an updated PEP 492. Where are the following over-simplifications wrong?

(1) The PEP is intended for use (almost exclusively) with asynchronous IO and a scheduler such as the asyncio event loop.

(2) The new syntax is intended to make it easier to recognize when a task's execution may be interrupted by arbitrary other tasks, and the interrupted task therefore has to revalidate assumptions about shared data. With threads, CPython can always suspend a task between op-codes, but with a sufficiently comprehensive loop (and sufficiently cooperative tasks), tasks *should* only be suspended when they make an explicit request to *wait* for an answer, and these points *should* be marked syntactically.

(3) The new constructs explicitly do NOT support any sort of concurrent execution within a task; they are for use precisely when otherwise parallel subtasks are being linearized by pausing and waiting for the results.

Over-simplifications 4-6 assume a world with standardized futures based on concurrent.futures, where .result either returns the result or raises the exception (or raises another exception about timeout or cancellation). [Note that the actual PEP uses iteration over the results of a new __await__ magic method, rather than .result on the object itself. I couldn't tell whether this was for explicit marking, or just for efficiency in avoiding future creation.]

(4) await EXPR is just syntactic sugar for EXPR.result, except that, by being syntax, it better marks locations where unrelated tasks might have a chance to change shared data. [And that, as currently planned, the result of an await isn't actually the result; it is an iterator of results.]

(5) async def is just syntactic sugar for def, except that, by being syntax, it better marks the signatures of functions and methods where unrelated tasks might have a chance to change shared data after execution has already begun. 
(5A) As the PEP currently stands, it is also a promise that the function will NOT produce a generator used as an iterator; if a generator-iterator needs to wait for something else at some point, that will need to be done differently. I derive this limitation from It is a ``SyntaxError`` to have ``yield`` or ``yield from`` expressions in an ``async`` function. but I don't understand how this limitation works with things like a per-line file iterator that might need to wait for the file to be initially opened. (6) async with EXPR as VAR: would be equivalent to: with EXPR as VAR: except that __enter__() would be replaced by next(await __enter__()) # __enter__().result __exit__() would be replaced by next(await __exit__()) # __exit__().result (7) async for elem in iter: would be shorthand for: for elem in iter: elem = next(await elem) # elem.result -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
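The over-simplifications in (2) and (4) can be sanity-checked with a toy model; this is only a sketch using plain generators (not asyncio, not the PEP's machinery), where each bare yield plays the role of an await-style wait point and a trivial round-robin loop plays the scheduler:

```python
# Toy model: a generator "task" yields whenever it must wait, and a
# round-robin scheduler resumes tasks in turn.  The yield points are
# exactly the places where other tasks may mutate shared data.
shared = {"hits": 0}

def task(name, results):
    shared["hits"] += 1                     # safe: no wait since we last looked
    yield                                   # wait point -- others may run here
    results.append((name, shared["hits"]))  # so shared state must be re-read

def run_all(tasks):
    tasks = list(tasks)
    while tasks:
        t = tasks.pop(0)
        try:
            next(t)
            tasks.append(t)                 # not finished; reschedule
        except StopIteration:
            pass

results = []
run_all([task("a", results), task("b", results)])
```

The point is only that shared state can change across a yield and must be re-read afterwards -- exactly the property the new syntax is meant to make visible.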
[Python-Dev] PEP 492: Please mention the Event Loop
On Tue May 5 21:44:26 CEST 2015, Brett Cannon wrote: It's not as complicated as it seems when you realize there is an event loop driving everything (which people have been leaving out of the conversation since it doesn't tie into the syntax directly). Another reason people don't realize it is that the PEP goes out of its way to avoid saying so. I understand that you (and Yury) don't want to tie the PEP too tightly to the specific event loop implementation in asyncio.events.AbstractEventLoop, but ... that particular conflation isn't really what people are confused about. The term "coroutines" often brings up thoughts of independent tasks. Yury may well know that (Python has asymmetric coroutines, that's it), but others have posted that this was a surprise -- and the people posting here have far more python experience than most readers will. Anyone deeply involved enough to recognize that this PEP is only about (1) a particular type of co-routine -- a subset even of prior python usage (2) used for a particular purpose (3) coordinated via an external scheduler will already know that they can substitute other event loops. Proposed second paragraph of the abstract: This PEP assumes that the asynchronous tasks are scheduled and coordinated by an Event Loop similar to that of stdlib module asyncio.events.AbstractEventLoop. While the PEP is not tied to any specific Event Loop implementation, it is relevant only to the kind of coroutine that uses yield as a signal to the scheduler, indicating that the coroutine will be waiting until an event (such as IO) is completed. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
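For readers coming to this thread later: the dependence on a driving loop is easy to demonstrate with asyncio itself (spelled here with the modern asyncio.run(); the 2015-era API was loop.run_until_complete()):

```python
import asyncio

log = []

async def fetch():
    log.append("started")
    await asyncio.sleep(0)   # hand control back to the event loop
    log.append("finished")
    return 42

coro = fetch()               # creating the coroutine runs none of its body
assert log == []
result = asyncio.run(coro)   # only the loop actually drives it to completion
```

Nothing in the coroutine executes until an event loop schedules it, which is why leaving the loop out of the PEP's narrative confuses readers.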
Re: [Python-Dev] PEP 492: async/await in Python; version 4
Tue May 5 21:48:36 CEST 2015, Yury Selivanov wrote: As for terminology, I view this discussion differently. It's not about the technical details (Python has asymmetric coroutines, that's it), but rather on how to disambiguate coroutines implemented with generators and yield-from, from new 'async def' coroutines. Not just How?, but Why?. Why do they *need* to be disambiguated? With the benefit of having recently read all that discussion (as opposed to just the PEP), my answer is ... uh ... that generators vs async def is NOT an important distinction. What matters (as best I can tell) is: something using yield (or yield from) to mark execution context switches vs other kinds of callables, including those using yield to make an iterator I'm not quite sure that the actual proposal even really separates them effectively, in part because the terminology keeps suggesting other distinctions instead. (The glossary does help; just not enough.) -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Unicode 8.0 and 3.5
On Thu Jun 18 20:33:13 CEST 2015, Larry Hastings asked: On 06/18/2015 11:27 AM, Terry Reedy wrote: Unicode 8.0 was just released. Can we have unicodedata updated to match in 3.5? What does this entail? Data changes, code changes, both? Note that the unicode 7 changes also need to be considered, because python 3.4 used unicode 6.3. There are some changes to the recommendations on what to use in identifiers. Python doesn't follow precisely the previous rules, but it would be good to ensure that any newly allowed characters are intentional -- particularly for the newly defined characters. My gut feel is that it would have been fine during beta, but for the 3rd RC I am not so sure. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Importance of async keyword
On Fri Jun 26 16:51:13 CEST 2015, Paul Sokolovsky wrote: So, currently in Python you know if you do: socket.write(buf) Then you know it will finish without interruptions for entire buffer. How do you know that? Are you assuming that socket.write is a builtin, rather than a python method? (Not even a python wrapper around a builtin?) Even if that were true, it would only mean that the call itself is processed within a single bytecode ... there is no guarantee that the write method won't release the GIL or call back into python (and thereby allow a thread switch) as part of its own logic. And if you write: await socket.write(buf) then you know there may be interruption points inside socket.write(), in particular something else may mutate it while it's being written. I would consider that external mutation to be bad form ... at least as bad as violating the expectation of an atomic socket.write() up above. So either way, nothing bad SHOULD happen, but it might anyhow. I'm not seeing what the async-coloring actually bought you... -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
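The doubt about atomicity can be made concrete: a Python-level write method compiles to many bytecodes, and with threads CPython may switch to another thread between any two of them. A small illustration (the Conn class is invented for the example):

```python
import dis

class Conn:
    def write(self, buf):    # looks like one atomic step at the call site...
        self.buffer = getattr(self, "buffer", b"") + buf
        return len(buf)

# ...but it compiles to many bytecode instructions, and a thread switch
# can happen between any two of them.
ops = list(dis.get_instructions(Conn.write))
```

So "one call" is not "one uninterruptible step" unless you know the callee is a single C-level operation that never releases the GIL.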
Re: [Python-Dev] Status of PEP 484 and the typing module
Mark Shannon wrote: PY2, etc. really need to go. Assuming that this code type checks OK: if typing.PY2: type_safe_under_py2_only() else: type_safe_under_py3_only() Is the checker supposed to pass this: if sys.hexversion < 0x0300: type_safe_under_py2_only() else: type_safe_under_py3_only() If it should pass, then why have PY2, etc. at all? My immediate response was that there really is a difference, when doing the equivalent of cross-compilation. It would help to make this explicit in the PEP. But ... If it should fail, well that is just stupid and annoying. so I'm not sure regular authors (as opposed to typing tools) would ever have reason to use it, and making stub files more different from regular python creates an attractive nuisance bigger than the clarification. So in the end, I believe PY2 should merely be part of the calling convention for type tools, and that may not be worth standardizing yet. It *is* worth explaining why they were taken out, though. And it is worth saying explicitly that typing tools should override the sys module when checking for non-native environments. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
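For concreteness, this is the kind of runtime version test a checker has to reason about, whether it is spelled sys.version_info, sys.hexversion, or a typing-level flag; the text() helper is invented for the example:

```python
import sys

# A checker targeting py2 must analyze only the first branch; running
# under py3 it must take the second -- which is why a tool checking a
# non-native target has to "override" what the sys module reports.
if sys.version_info < (3,):
    def text(s):
        return unicode(s)    # noqa: F821 -- py2-only name, dead under py3
else:
    def text(s):
        return str(s)
```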
Re: [Python-Dev] Status of PEP 484 and the typing module
At Thu May 21 22:27:50 CEST 2015, Guido wrote: I want to encourage users to think about annotations as types, and for most users the distinction between type and class is too subtle, So what is the distinction that you are trying to make? That a type refers to a variable (name), and a class refers to a piece of data (object) that might be bound to that name? Whatever the intended distinction is, please be explicit in the PEP, even if you decide to paper it over in normal code. For example, the above distinction would help to explain why the typing types can't be directly instantiated, since they aren't meant to refer to specific data. (They can still be used as superclasses because practicality beats purity, and using them as a marker base class is practical.) -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Preserving the definition order of class namespaces.
On Sun May 24 12:06:40 CEST 2015, Nick Coghlan wrote: On 24 May 2015 at 19:44, Mark Shannon mark at hotpy.org wrote: On 24/05/15 10:35, Nick Coghlan wrote: If we leave __definition_order__ out for the time being then, for the vast majority of code, the fact that the ephemeral namespace used to evaluate the class body switched from being a basic dictionary to an ordered one would be a hidden implementation detail, rather than making all type objects a little bigger. and a little slower. The runtime namespace used to store the class attributes is remaining a plain dict object regardless, Lookup isn't any slower in the ordereddict. Inserts are slower -- and those would happen in the ordereddict, as the type object is being defined. Note that since we're talking about the type objects, rather than the instances, most* long-running code won't care, but it will hurt startup time. *code which creates lots of throwaway classes is obviously an exception. FWIW, much of the extra per-insert cost is driven by either the need to keep deletion O(1) or the desire to keep the C layout binary compatible. A different layout (with its own lookdict) could optimize for the insert-each-value-once case, or even for small dicts (e.g., keyword dicts). I could imagine this producing a speedup, with the ordering being just a side benefit. It is too late to use such a new layout by default in 3.5, but we should be careful not to close it off. (That said, I don't think __definition_order__ would actually close it off, though it might start to look like a wart.) -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
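As an illustration of what an ordered class-body namespace buys even without a __definition_order__ attribute, a metaclass can record the order itself. This is only a sketch: the _defined_order name is invented, and it relies on the class-body namespace preserving insertion order, as it does on modern CPython:

```python
# A metaclass that captures definition order from the class-body
# namespace, so only classes that opt in pay any extra cost.
class OrderedMeta(type):
    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        cls._defined_order = [k for k in ns if not k.startswith("__")]
        return cls

class Point(metaclass=OrderedMeta):
    x = 0
    y = 0
    def move(self):
        pass
```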
[Python-Dev] PEPs and PEP 8 changes
PEP 498 is only the latest PEP where part of the concern is fear that it may encourage certain types of bad code. Would it be reasonable to ask PEPs to start including a section on any recommended changes to PEP8? (e.g., "If an embedded expression doesn't fit on a single line, factor it out to a named variable.") I realize that there will be times when best practices (or common mistakes) aren't obvious in advance, but I'm a bit uncomfortable with "PEP 8 will probably grow advice"... if we expect to need such advice, then we should probably include it from the start. (PEP 8 is, after all, only advice.) -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP 509
(1) Please make it clear within the abstract what counts as a change. (1a) E.g., a second paragraph such as "Adding or removing a key, or replacing a value, counts as a change. Modifying an object in place, or replacing it with itself may not be picked up." (1b) Is there a way to force a version update? d[k]=d[k] seems like it should do that (absent the optimization to prevent it), but I confess that I can't come up with a good use case that doesn't start seeming internal to a specific optimizer. (1c) Section "Guard against changing dict during iteration" says "Sadly, the dictionary version proposed in this PEP doesn't help to detect dictionary mutation." Why not? Wouldn't that mutation involve replacing a value, which ought to trigger a version change? (2) I would like to see a .get on the guard object, so that it could be used in place of the dict lookup even from python. If this doesn't make sense (e.g., doesn't really save time since the guard has to be used from python), please mention that in the Guard Example text. (3) It would be possible to define the field as reserved in the main header, and require another header to use it even from C. (3a) This level of privacy might be overkill, but I would prefer that the decision be explicit. (3b) The change should almost certainly be hidden from the ABI / Py_LIMITED_API -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
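A pure-Python sketch of the semantics asked about in (1a): the real version lives in the C dict struct, but the contract (adding or removing a key, or replacing a value, bumps the version, which invalidates guards) can be mimicked with a subclass. The VersionedDict name and version attribute are invented; the real patch also covers clear(), update(), and the rest:

```python
# Mimic of the proposed contract: mutation bumps the version; a guard
# just snapshots and later compares versions instead of doing a lookup.
class VersionedDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1
    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1

ns = VersionedDict(x=1)
guard_version = ns.version   # a guard snapshots the version...
ns["y"] = 2                  # ...and any change invalidates the snapshot
stale = guard_version != ns.version
```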
Re: [Python-Dev] Update PEP 7 to require curly braces in C
> On Jan 17, 2016, at 11:10, Brett Cannon wrote: >> While doing a review of http://bugs.python.org/review/26129/ >> ... update PEP 7 to remove the optionality of curly braces On Mon Jan 18 03:39:42 EST 2016, Andrew Barnert pointed out: > There are two ways you could do that. [The one most people are talking about, which often makes an if-clause visually too heavy ... though Alexander Walters pointed out that "Any excuse to break code out into more functions... is usually the right idea."] if (!obj) { return -1; } > Alternatively, it could say something like "braces must not be omitted; > when other C styles would use a braceless one-liner, a one-liner with > braces should be used instead; otherwise, they should be formatted as follows" That "otherwise" gets a bit awkward, but I like the idea. Perhaps "braces must not be omitted, and should normally be formatted as follows. ... Where other C styles would permit a braceless one-liner, the expression and braces may be moved to a single line, as follows: " if (x > 5) { y++; } I think that is clearly better, but it may be *too* lightweight for flow control. if (!obj) { return -1; } does work for me, and I think the \n{} may actually be useful for warning that flow control takes a jump. One reason I posted was to point to a specific example already in PEP 7 itself: if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 && type->tp_dictoffset == b_size && (size_t)t_size == b_size + sizeof(PyObject *)) return 0; /* "Forgive" adding a __dict__ only */ For me, that return is already visually lost, simply because it shares an indentation with the much larger test expression. 
Would that be better as either: /* "Forgive" adding a __dict__ only */ if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 && type->tp_dictoffset == b_size && (size_t)t_size == b_size + sizeof(PyObject *)) { return 0; } or: /* "Forgive" adding a __dict__ only */ if (type->tp_dictoffset != 0 && base->tp_dictoffset == 0 && type->tp_dictoffset == b_size && (size_t)t_size == b_size + sizeof(PyObject *)) { return 0; } -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] pathlib (was: Defining a path protocol)
(1) I think the "built-in" should instead be a module-level function in the pathlib. If you aren't already expecting pathlib paths, then you're just expecting strings to work anyhow, and a builtin isn't likely to be helpful. (2) I prefer that the function be explicit about the fact that it is downcasting the representation to a string. e.g., pathlib.path_as_string(my_path) But if the final result is ospath or fspath or ... I won't fight too hard, particularly since the output may be a bytestring rather than a str. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
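For the record, the protocol that eventually landed (PEP 519, Python 3.6) put the conversion in os rather than pathlib, as os.fspath(), and it is explicit that the result may be str or bytes depending on the argument:

```python
import os
import pathlib

p = pathlib.PurePosixPath("/tmp/demo.txt")
s = os.fspath(p)             # explicit downcast of the path object; str here
```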
Re: [Python-Dev] [Python-ideas] Add citation() to site.py
On Sun Mar 20 16:26:03 EDT 2016, Nikolaus Rath wrote: > Which I believe makes it completely pointless to cite Python at all. As > far as I can see, nowadays citations are given for two reasons: > 1. To give the reader a starting point to get more information on a >topic. I don't often see references to good "starting points", but I'll grant the "get more information". > 2. To formally acknowledge the work done by someone else (who ends up >with an increased number of citations for the cited publication, >which is unfortunately a crucial metric in most academic hiring and >evaluation processes). There is a third category, of reader service. When I as a reader have wanted to follow a citation, it was because I wanted to know more about the specific claim it supposedly supported. In a few cases -- and these were probably the cases most valuable to the authors -- I wanted to build on the work, or test it out under new conditions. Ideally, my first step was to replicate the original result, to ensure that anything new I found was really caused by the intentional changes. If I was looking at a computational model, I really didn't even have the excuse of "too expensive to run that many subjects." For papers more than a few years old, even if the code was available, it generally didn't run -- and often didn't even compile. Were there a few missing utility files, or had they been using a language variant different from what had eventually become the standard? Obviously, it would have been better to just get a copy of the original environment, PDP and all. In real life, it was very helpful to know which version of which compiler the authors had been using. Even the authors who had managed to save their code didn't generally remember that level of detail about the original environment. Python today has much better backwards compatibility, but ... 
if some junior grad student (maybe not in CS) today came across code raising strings instead of Exceptions, how confident would she be that she had the real code, as opposed to a mangled transcription? Would it help if the paper had a citation that specified CPython 2.1 and she could still download a version of that ... where it worked? -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: PEP 509: Add a private version to dict
On Thu Apr 14 11:19:42 EDT 2016, Victor Stinner posted the latest draft of PEP 509; dict version_tag (1) Meta Question: If this is really only for CPython, then is "Standards Track" the right classification? (2) Why *promise* not to update the version_tag when replacing a value with itself? Isn't that the sort of quality-of-implementation issue that got pushed to a note for objects that happen to be represented as singletons, such as small integers or ASCII chars? I think it is a helpful optimization, and worth documenting ... I just think it should be at the layer of "this particular patch", rather than something that sounds like part of the contract. e.g., ... The global version is also incremented and copied to the dictionary version at each dictionary change. The following dict methods can trigger changes: * ``clear()`` * ``pop(key)`` * ``popitem()`` * ``setdefault(key, value)`` * ``__delitem__(key)`` * ``__setitem__(key, value)`` * ``update(...)`` .. note:: As a quality of implementation issue, the actual patch does not increment the version_tag when it can prove that there was no actual change. For example, clear() on an already-empty dict will not trigger a version_tag change, nor will updating a dict with itself, since the values will be unchanged. For efficiency, the analysis considers only object identity (not equality) when deciding whether to increment the version_tag. [2A] Do you want to promise that replacing a value with a non-identical object *will* trigger a version_tag update *even* if the objects are equal? I would vote no, but I realize backwards-compatibility may create such a promise implicitly. (3) It is worth being explicit on whether empty dicts can share a version_tag of 0. If this PEP is about dict content, then that seems fine, and it may well be worth optimizing dict creation. 
There are times when it is important to keep the same empty dict; I can't think of any use cases where it is important to verify that some *other* code has done so, *and* I can't get a reference to the correct dict for an identity check. (4) Please be explicit about the locking around version++; it is enough to say that the relevant methods already need to hold the GIL (assuming that is true). (5) I'm not sure I understand the arguments around a per-entry version. On the one hand, you never need a strong reference to the value; if it has been collected, then it has obviously been removed from the dict and should trigger a change even with per-dict. On the other hand, I'm not sure per-entry would really allow finer-grained guards to avoid lookups; just because an entry hasn't been modified doesn't prove it hasn't been moved to another location, perhaps by replacing a dummy in a slot it would have preferred. (6) I'm also not sure why version_tag *doesn't* solve the problem of dicts that fool the iteration guards by mutating without changing size ( https://bugs.python.org/issue19332 ) ... are you just saying that the iterator views aren't allowed to rely on the version-tag remaining stable, because replacing a value (as opposed to a key-value pair) is allowed? I had always viewed the failing iterators as a supporting-this-case-makes-the-code-too-slow-and-ugly limitation, rather than a data integrity check. When I do care about the data not changing, (an exposed variant of) version_tag is as likely to be what I want as a hypothetical keys_version_tag would be. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: PEP 509: Add a private version to dict
On Fri, Apr 15, 2016 at 4:41 PM, Victor Stinner <victor.stin...@gmail.com> wrote: > 2016-04-15 19:54 GMT+02:00 Jim J. Jewett <jimjjew...@gmail.com>: >> (2) Why *promise* not to update the version_tag when replacing a >> value with itself? > It's an useful property. For example, let's say that you have a guard > on globals()['value']. The guard is created with value=3. An unit test > replaces the value with 50, but then restore the value to its previous > value (3). Later, the guard is checked to decide if an optimization > can be used. > If the dict version is increased, you need a lookup. If the dict > version is not increased, the guard is cheap. I would expect the version to be increased twice, and therefore to require a lookup. Are you suggesting that unittest should provide an example of resetting the version back to the original value when it cleans up after itself? > In C, it's very cheap to implement the test "new_value == old_value", > it just compares two pointers. Yeah, I understand that it is likely a win in terms of performance, and a good way to start off (given that you're willing to do the work). I just worry that you may end up closing off even better optimizations later, if you make too many promises about exactly how you will do which ones. Today, dict only cares about ==, and you (reasonably) think that full == isn't always worth running ... but when it comes to which tests *are* worth running, I'm not confident that the answers won't change over the years. >> [2A] Do you want to promise that replacing a value with a >> non-identical object *will* trigger a version_tag update *even* >> if the objects are equal? > It's already written in the PEP: I read that as a description of what the code does, rather than a spec for what it should do... so it isn't clear whether I could count on that remaining true. For example, if I know that my dict values are all 4-digit integers, can I write: d[k] = d[k] + 0 and be assured that the version_tag will bump? 
Or is that something that a future optimizer might optimize out? >> (3) It is worth being explicit on whether empty dicts can share >> a version_tag of 0. If this PEP is about dict content, then that >> seems fine, and it may well be worth optimizing dict creation. > This is not part of the PEP yet. I'm not sure that I will modify the > PEP to use the version 0 for empty dictionaries. Antoine doesn't seem > to be convinced :-) True. But do note that "not hitting the global counter an extra time for every dict creation" is a more compelling reason than "we could speed up dict.clear(), sometimes". >> (4) Please be explicit about the locking around version++; it >> is enough to say that the relevant methods already need to hold >> the GIL (assuming that is true). > I don't think that it's important to mention it in the PEP. It's more > an implementation detail. The version can be protected by atomic > operations. Now I'm the one arguing from a specific implementation. :D My thought was that any sort of locking (including atomic operations) is slow, but if the GIL is already held, then there is no *extra* locking cost. (Well, a slightly longer hold on the lock, but...) >> (5) I'm not sure I understand the arguments around a per-entry >> version. >> On the one hand, you never need a strong reference to the value; >> if it has been collected, then it has obviously been removed from >> the dict and should trigger a change even with per-dict. > > Let's say that you watch the key1 of a dict. The key2 is modified, it > increases the version. Later, you test the guard: to check if the key1 > was modified, you need to lookup the key and compare the value. You > need the value to compare it. And the value for key1 is still there, so you can. The only reason you would notice that the key2 value had gone away is if you also care about key2 -- in which case the cached value is out of date, regardless of what specific value it used to hold. 
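One wrinkle in the identity-based heuristic worth spelling out: for small interned integers, CPython can hand back the very same object from arithmetic, so d[k] = d[k] + 0 is not guaranteed to look like a change even under a faithful is-comparison:

```python
# CPython caches small ints (-5..256) as singletons, so adding zero to
# one of them returns the identical object -- an identity-only check
# would see "no change" and skip the version bump.
d = {"k": 7}
old = d["k"]
d["k"] = d["k"] + 0
same_small = d["k"] is old       # True on CPython: same cached object

d["k"] = 7000
old = d["k"]
d["k"] = d["k"] + 0              # a 4-digit int usually comes back as a
                                 # fresh object, but CPython does not
                                 # promise that either way
```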
>> (6) I'm also not sure why version_tag *doesn't* solve the problem >> of dicts that fool the iteration guards by mutating without changing >> size ( https://bugs.python.org/issue19332 ) ... are you just saying >> that the iterator views aren't allowed to rely on the version-tag >> remaining stable, because replacing a value (as opposed to a >> key-value pair) is allowed? > If the dictionary values are modified during the loop, the dict > version is increased. But it's allowed to modify values when you > iterate on *keys*. Sure. So? I see three cases: (A) I don't care that the collection changed. The python implementation might, but I don't. (So no bug even today.) (B) I want to process exactly the collection that I started with
Re: [Python-Dev] Updated PEP 509
On Sat, Apr 16, 2016 at 5:01 PM, Victor Stinner wrote: > * I mentioned that version++ must be atomic, and that in the case of > CPython, it's done by the GIL Better; if those methods *already* hold the GIL, it is worth saying "already", to indicate that the change is not expensive. > * I removed the dict[key]=value; dict[key]=value. It's really a > micro-optimization. I also fear that Raymond will complain because it > adds an if in the hot code of dict, and the dict type is very > important for Python performance. That is an acceptable answer. Though I really do prefer explicitly *refusing to promise* either way when the replacement/replaced objects are ==. dicts (and other collections) already assume sensible ==, even explicitly allowing self-matches of objects that are not equal to themselves. I don't like the idea of making new promises that violate (or rely on violations of) that sensible == assumption. -jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: PEP 509: Add a private version to dict
On Fri, Apr 15, 2016 at 7:31 PM, Victor Stinner <victor.stin...@gmail.com> wrote: > .2016-04-15 23:45 GMT+02:00 Jim J. Jewett <jimjjew...@gmail.com>: ... >> I just worry that you may end up closing off even better optimizations >> later, if you make too many promises about exactly how you will do >> which ones. >> Today, dict only cares about ==, and you (reasonably) think that full >> == isn't always worth running ... but when it comes to which tests >> *are* worth running, I'm not confident that the answers won't change >> over the years. > I checked, currently there is no unit test for a==b, only for a is b. > I will add add a test for a==b but a is not b, and ensure that the > version is increased. Again, why? Why not just say "If an object is replaced by something equal to itself, the version_tag may not be changed. While the initial heuristics are simply to check for identity but not full equality, this may change in future releases." >> For example, if I know that my dict values are all 4-digit integers, >> can I write: >> >> d[k] = d[k] + 0 >> >> and be assured that the version_tag will bump? Or is that something >> that a future optimizer might optimize out? > Hum, I will try to clarify that. I would prefer that you clarify it to say that while the initial patch doesn't optimize that out, a future optimizer might. > The problem with storing an identifier (a pointer in C) with no strong > reference is when the object is destroyed, a new object can likely get > the same identifier. So it's likely that "dict[key] is old_value_id" > can be true even if dict[key] is now a new object. Yes, but it shouldn't actually be destroyed until it is removed from the dict, which should change version_tag, so that there will be no need to compare it. > Do you want to modify the PEP 509 to fix this issue? Or you don't > understand why the PEP 509 cannot be used to fix the issue? I'm > lost... I believe it *does* fix the issue in some (but not all) cases. 
-jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PyWeakref_GetObject() borrows its reference from... whom?
On Mon, Oct 10, 2016, at 14:04, MRAB wrote: > Instead of locking the object, could we keep the GIL, but have it > normally released? > A thread could then still call a function such as PyWeakref_GetObject() > that returns a borrowed reference, but only if it's holding the GIL. It > would be able to INCREF the reference before releasing the GIL again. So you need to get/release the GIL just to run a slightly faster function that doesn't bother with an extra incref/decref pair? I think anyone willing to make those changes would be willing to switch to a non-borrowing version of that same function, and do an explicit DECREF if that is really what they wanted. On Tue, Oct 11, 2016 at 5:24 AM, Random832 wrote: > So, what stops the other thread which never asks for the GIL from > blowing away the reference? Or is this a special kind of lock that you > can "assert isn't locked" without locking it for yourself, and > INCREF/DECREF does so? On Mon Oct 10 15:36:59 EDT 2016, Chris Angelico wrote: > "assert isn't locked" is pretty cheap Yeah, but so is INCREF/DECREF on memory that is almost certainly in cache anyhow, because you're using the object right next to it. The write part hurts, particularly when trying to use multiple cores with shared memory, but any sort of indirection (even separating the refcount from the object, to allow per-core counters) ... well, it doesn't take much at all to be worse than INCREF/DECREF in even the normal case, let alone amortized across the "drat, this object now has to be handled specially" cases. Imagine two memory pools, one for "immortal" objects (such as None) that won't be collected, and so don't need their memory dirtied when you INCREF/DECREF. Alas, now *every* INCREF and DECREF has to branch on the address to tell whether or not it should be a no-op. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. 
-jJ ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
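For contrast with the C-API discussion above: at the Python level there is no borrowed-reference hazard, because calling a weakref hands back a new strong reference. A small sketch (the final assertion assumes CPython's immediate refcount-based collection; other implementations may defer it):

```python
import weakref

class Thing:
    pass

obj = Thing()
ref = weakref.ref(obj)

strong = ref()          # calling the weakref returns a *new strong* reference
assert strong is obj

del obj                 # the original name is gone...
assert ref() is strong  # ...but `strong` keeps the object alive

del strong
# With CPython's refcounting, the object is collected immediately,
# so the weakref is now dead.
assert ref() is None
```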
[Python-Dev] startup time repeated? why not daemon
I agree that startup time is a problem, but I wonder if some of the pain could be mitigated by using a persistent process.

For example, in https://mail.python.org/pipermail/python-dev/2017-July/148664.html Ben Hoyt mentions that the Google Cloud SDK (CLI) team has found it "especially problematic for shell tab completion helpers, because every time you press tab the shell has to load your Python program".

Decades ago, I learned to set my editor to vi instead of emacs for similar reasons -- but there was also an emacsclient option that simply opened a new window from an already running emacs process. Tab completion seems like exactly the sort of thing that should be sent to an existing process instead of creating a new one.

Is it too hard to create a daemon server? Is the communication and context switch slower than a new startup? Is the pattern just not well-enough advertised?

-jJ
[Python-Dev] PEP 550 v3 naming
Building on Brett's suggestion:

FrameContext: used in/writable by one frame
ContextStack: a FrameContext and its various fallbacks

-jJ
Re: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap
On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
> On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett <jimjjew...@gmail.com> wrote:
>
> The key requirement for using immutable datastructures is to make
> "get_execution_context" operation fast.

Do you really need the whole execution context, or do you just need the current value of a specific key? (Or, sometimes, the writable portion of the context.)

> Currently, the PEP doesn't do a good job at explaining why we need
> that operation and why it will be used by asyncio.Task and call_soon,
> so I understand the confusion.

OK, the schedulers need the whole context, but if implemented as a ChainMap (instead of per-key), isn't that just a single constant? As in, don't they always schedule to the same thread? And when they need another map, isn't that because the required context is already available from whichever code requested the scheduling?

>> (A) How many values do you expect a typical generator to use? The
>> django survey suggested mostly 0, sometimes 1, occasionally 2. So
>> caching the values of all possible keys probably won't pay off.

> Not many, but caching is still as important, because some API users
> want the "get()" operation to be as fast as possible under all
> conditions.

Sure, but only because they view it as a hot path; if the cost of that speedup is slowing down another hot path, like scheduling the generator in the first place, it may not be worth it. According to the PEP timings, HAMT doesn't beat a copy-on-write dict until over 100 items, and never beats a regular dict. That suggests to me that it won't actually help the overall speed for a typical (as opposed to worst-case) process.

>> And, of course, using a ChainMap means that the keys do NOT have to be
>> predefined ... so the Key class really can be skipped.

> The first version of the PEP had no ContextKey object and the most
> popular complaint about it was that the key names will clash.

That is true of any global registry.
Thus the use of keys with prefixes like com.sun. The only thing pre-declaring a ContextKey buys in terms of clashes is that a sophisticated scheduler would have less difficulty figuring out which clashes will cause thrashing in the cache.

Or are you suggesting that the key can only be declared once (as opposed to once per piece of code), so that the second framework to use the same name will see a RuntimeError?

-jJ
Re: [Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap
On Aug 24, 2017 11:02 AM, "Yury Selivanov" <yselivanov...@gmail.com> wrote:
> On Thu, Aug 24, 2017 at 10:05 AM, Jim J. Jewett <jimjjew...@gmail.com> wrote:
>> On Thu, Aug 24, 2017 at 1:12 AM, Yury Selivanov wrote:
>>> On Thu, Aug 24, 2017 at 12:32 AM, Jim J. Jewett <jimjjew...@gmail.com> wrote:
>
> If you look at this small example:
>
>     foo = new_context_key()
>
>     async def nested():
>         await asyncio.sleep(1)
>         print(foo.get())
>
>     async def outer():
>         foo.set(1)
>         await nested()
>         foo.set(1000)
>
>     l = asyncio.get_event_loop()
>     l.create_task(outer())
>     l.run_forever()
>
> It will print "1", as "nested()" coroutine will see the "foo" key when
> it's awaited. Now let's say we want to refactor this snippet and run the
> "nested()" coroutine with a timeout:
>
>     foo = new_context_key()
>
>     async def nested():
>         await asyncio.sleep(1)
>         print(foo.get())
>
>     async def outer():
>         foo.set(1)
>         await asyncio.wait_for(nested(), 10)  # !!!
>         foo.set(1000)
>
>     l = asyncio.get_event_loop()
>     l.create_task(outer())
>     l.run_forever()
>
> So we wrap our `nested()` in a `wait_for()`, which creates a new
> asynchronous task to run `nested()`. That task will now execute on its
> own, separately from the task that runs `outer()`. So we need to somehow
> capture the full EC at the moment `wait_for()` was called, and use that
> EC to run `nested()` within it. If we don't do this, the refactored code
> would print "1000", instead of "1".

I would expect 1000 to be the right answer! By the time it runs, 1000 (or mask_errors=false, to use a less toy example) is what its own controlling scope requested.

If you are sure that you want the value frozen earlier, please make this desire very explicit ... this example is the first I noticed it. And please explain what this means for things like signal or warning masking.

> ContextKey is declared once for the code that uses it. Nobody else will
> use that key.
> Keys have names only for introspection purposes, the implementation
> doesn't use it, iow:
>
>     var = new_context_key('aa')
>     var.set(1)
>     # EC = [..., {var: 1}]
>     # Note that the EC has the "var" object itself as the key in the
>     # mapping, not "aa".

This I had also not realized. So effectively, the keys are based on object identity, with some safeguards to ensure that even starting with the same (interned) name will *not* produce the same object unless you passed it around explicitly, or are in the same code unit (file, typically).

This strikes me as reasonable, but still surprising. (I think of variables as typically named, rather than identified by address.) So please make this more explicit as well.

-jJ
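A toy illustration of the identity-keyed behavior discussed above (a sketch, not the PEP's actual ContextKey implementation): two keys created from the same name are still distinct mapping entries.

```python
class ContextKey:
    """Stand-in key object: the mapping keys on identity, not on the name."""
    def __init__(self, name):
        self.name = name  # kept for introspection only

    def __repr__(self):
        return f"<ContextKey {self.name!r}>"

a = ContextKey("username")
b = ContextKey("username")   # same name, but a different object

ec = {a: "alice", b: "bob"}  # no clash: two distinct entries
assert len(ec) == 2
assert ec[a] == "alice" and ec[b] == "bob"
```

This works because the default `object.__hash__`/`__eq__` are identity-based, which is exactly the "safeguard" described: only code holding the same key object can read or shadow its value.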
[Python-Dev] PEP 550 and other python implementations
Should PEP 550 discuss other implementations? E.g., the object space used in pypy?

-jJ
[Python-Dev] Pep 550 and None/masking
Does setting an ImplicitScopeVar to None set the value to None, or just remove it?

If it removes it, does that effectively unmask a previously masked value? If it really sets to None, then is there a way to explicitly unmask previously masked values?

Perhaps the initial constructor should require an initial value (defaulting to None) and the docs should give examples both for using a sensible default value and for using a special "unset" marker.

-jJ
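One way to keep both behaviors available is a private sentinel, so that None is an ordinary value while "unset" stays distinct. A sketch (the Var class and its method names are hypothetical, not the PEP's API):

```python
_UNSET = object()  # private marker, distinct from every real value (incl. None)

class Var:
    """Toy stand-in for an implicit-scope variable, keeping a mask stack."""
    def __init__(self):
        self._stack = []

    def set(self, value):
        self._stack.append(value)        # mask whatever was there

    def unset(self):
        self._stack.pop()                # explicitly unmask the previous value

    def get(self, default=_UNSET):
        if self._stack:
            return self._stack[-1]
        if default is not _UNSET:
            return default
        raise LookupError("no value set")

v = Var()
v.set("outer")
v.set(None)                  # None is a real value, not an unset
assert v.get() is None
v.unset()                    # unmasking reveals the earlier value
assert v.get() == "outer"
```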
[Python-Dev] Pep 550 module
I think there is general consensus that this should go in a module other than sys. (At least a submodule.) The specific names are still To Be Determined, but I suspect seeing the functions and objects as part of a named module will affect what works. So I am requesting that the next iteration just pick a module name, and let us see how that looks. E.g.

    import dynscopevars

    user = dynscopevars.Var("username")
    myscope = dynscopevars.get_current_scope()
    childscope = dynscopevars.Scope(parent=myscope, user="bob")

-jJ
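A runnable toy of what that module sketch might mean (everything here -- dynscopevars, Var, Scope, get_current_scope -- is hypothetical; the point is only to see the names in use):

```python
from collections import ChainMap

class Var:
    """A declared dynamic-scope variable; the name is for introspection."""
    def __init__(self, name):
        self.name = name

class Scope:
    """A set of bindings chained onto a parent scope."""
    def __init__(self, parent=None, **bindings):
        parent_maps = parent.chain.maps if parent is not None else []
        self.chain = ChainMap(dict(bindings), *parent_maps)

    def get(self, name):
        return self.chain[name]

_root = Scope()

def get_current_scope():
    return _root

# The usage from the post, spelled out:
user = Var("username")
myscope = get_current_scope()
childscope = Scope(parent=myscope, user="bob")
assert childscope.get("user") == "bob"
```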
[Python-Dev] PEP 550 leak-in vs leak-out, why not just a ChainMap
In https://mail.python.org/pipermail/python-dev/2017-August/148869.html Nick Coghlan wrote:

> * what we want to capture at generator creation time is
> the context where writes will happen, and we also
> want that to be the innermost context used for lookups

So when creating a generator, we want a new (empty) ImplicitContext map to be the head of the ChainMap. Each generator should have one of its own, just as each generator has its own frame. And the ChainMap delegation goes up the call stack, just as an exception would. Eventually, it hits the event loop (or other Executor) which is responsible for ensuring that the ChainMap eventually defers to the proper (Chain)Map for this thread or Task.

> While the context is defined conceptually as a nested chain of
> key:value mappings, we avoid using the mapping syntax because of the
> way the values can shift dynamically out from under you based on who
> called you ...
> instead of having the problem of changes inside the
> generator leaking out, we instead had the problem of
> changes outside the generator *not* making their way in

I still don't see how this is different from a ChainMap. If you are using a stack(chain) of [d_global, d_thread, d_A, d_B, d_C, d_mine] maps as your implicit context, then a change to the d_thread map (that some other code could make) will be visible unless it is masked. Similarly, if the fallback for d_C changes from d_B to d_B1 (which points directly to d_thread), that will be visible for any keys that were previously resolved in d_A or d_B, or are now resolved in d_B1. Those seem like exactly the cases that would (and should) cause "shifting values".

This does mean that you can't cache everything in the localmost map, but that is a problem with the optimization regardless of how the implementation is done.
In https://mail.python.org/pipermail/python-dev/2017-August/148873.html Yury Selivanov wrote:

> Any code that uses EC will not see any difference [between mutable vs
> immutable but replaced LC maps], because it can only work with the top LC.
> Back to generators. Generators have their own empty LCs when created
> to store their *local* EC modifications.

OK, so just as they have their own frame, they have their own ChainMap, and the event loop is responsible for resetting the fallback when it schedules them.

> When a generator is *being* iterated, it pushes its LC to the EC. When
> the iteration step is finished, it pops its LC from the EC.

I'm not sure it helps to think of a single stack. When the generator is active, it starts with its own map. When it is in the call chain of the active generator, its map will be in the chain of delegations. When neither it nor a descendant are active, no code will end up delegating to it. If the delegation graph has 543 generators delegating directly to the thread-wide map, there is no reason to pop/push an execution stack every time a different generator is scheduled, since only that generator itself (and code it calls) will even care.

> HAMT is a way to efficiently implement immutable mappings ...
> using regular dicts and copy, set() would be O(log N)

Using a ChainMap, set affects only the localmost map and is therefore O(1). get could require stacksize lookups, but ...

(A) How many values do you expect a typical generator to use? The django survey suggested mostly 0, sometimes 1, occasionally 2. So caching the values of all possible keys probably won't pay off.

(B) Other than the truly global context and thread-level context, how many of these maps do you expect to be non-empty?

(C) How deep do you expect the stack to get? Are we talking about 100 layers of mappings to check between the current generator and the thread-wide defaults?
Even if we are, verifying that there hasn't been a change in some mid-level layer requires tracking the versions of each mid-level layer. (If the version is globally unique, that would also ensure that the stack hasn't changed.) Is that really faster than checking that the map is empty?

And, of course, using a ChainMap means that the keys do NOT have to be predefined ... so the Key class really can be skipped.

-jJ
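The ChainMap behavior argued for above is easy to demonstrate with the stdlib class (d_thread and the per-generator maps here are just plain dicts standing in for the PEP's contexts):

```python
from collections import ChainMap

d_thread = {"user": "alice", "locale": "en"}     # thread-wide defaults

# Each generator gets its own empty localmost map, chained to the fallback.
gen_a = ChainMap({}, d_thread)
gen_b = ChainMap({}, d_thread)

gen_a["locale"] = "fr"            # set is O(1): it touches only the local map
assert gen_a["locale"] == "fr"    # masked locally...
assert gen_b["locale"] == "en"    # ...without affecting anyone else
assert gen_a["user"] == "alice"   # get falls back up the chain

d_thread["user"] = "bob"          # an outer change *is* visible unless masked
assert gen_a["user"] == "bob"
```

The last two lines are exactly the "values shifting out from under you" case: with a ChainMap it is the documented lookup rule, not an accident.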
[Python-Dev] PEP 560: bases classes / confusion
(1) I found the following (particularly "bases classes") very confusing:

"""
If an object that is not a class object appears in the bases of a class definition, then ``__mro_entries__`` is searched on it. If found, it is called with the original tuple of bases as an argument. The result of the call must be a tuple, that is unpacked in the bases classes in place of this object. (If the tuple is empty, this means that the original bases is simply discarded.)
"""

Based on the following GenericAlias/NewList/Tokens example, I think I now understand what you mean, and would have had somewhat less difficulty if it were expressed as:

"""
When an object that is not a class object appears in the (tuple of) bases of a class definition, then attribute ``__mro_entries__`` is searched on that non-class object. If ``__mro_entries__`` is found, it is called with the entire original tuple of bases as an argument. The result of the call must be a tuple, which is unpacked and replaces only the non-class object in the tuple of bases. (If the tuple is empty, this means that the original bases is simply discarded.)
"""

Note that this makes some assumptions about the __mro_entries__ signature that I wasn't quite sure about from the example. So building on that:

    class ABList(A, NewList[int], B):

I *think* the following will happen: "NewList[int]" will be evaluated, and __class_getitem__ called, so that the bases tuple will be (A, GenericAlias(NewList, int), B)

    # (A) I *think* __mro_entries__ gets called with the full tuple,
    # instead of just the object it is found on.
    # (B) I *think* it is called on the results of evaluating
    # the terms within the tuple, instead of the original
    # string representation.
    _tmp = __mro_entries__(A, GenericAlias(NewList, int), B)

    # (C) I *think* __mro_entries__ returns a replacement for
    # just the single object, even though it was called on
    # the whole tuple, without knowing which object it
    # represents.
    bases = (A, _tmp, B)

    # (D) If there are two non-class objects, I *think* the
    # second one gets the same arguments as the first,
    # rather than an intermediate tuple with the first such
    # object already substituted out.

-jJ
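For what it's worth, points (A), (C), and (D) can be checked with a toy run against an interpreter implementing PEP 560 (the Alias class here is made up for the demonstration):

```python
class Alias:
    """A non-class base that substitutes a real class for itself."""
    def __init__(self, origin):
        self.origin = origin
        self.seen_bases = None

    def __mro_entries__(self, bases):
        self.seen_bases = bases      # (A)/(B): receives the full, evaluated tuple
        return (self.origin,)        # (C): replaces only this object

class NewList: pass
class A: pass
class B: pass

alias = Alias(NewList)

class ABList(A, alias, B):
    pass

assert alias.seen_bases == (A, alias, B)        # called with the original tuple
assert ABList.__bases__ == (A, NewList, B)      # only the alias was replaced
assert ABList.__orig_bases__ == (A, alias, B)   # original tuple is preserved
```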
[Python-Dev] unfrozen dataclasses and __hash__ (subsets are OK)
I understand auto-generating the __hash__ (and __eq__) for a frozen container; that is just convenient. But why is there any desire to autogenerate a __hash__ for something that isn't frozen? Like a list or dict, the normal case would be for it not to have a hash at all, and the author *should* write out any explicit exceptions.

The objection to that seems to be that someone might forget to add another field to the hash during later maintenance -- but so what? __hash__ should reference a subset of the fields used for equality, and strict subsets are OK. It *should* ignore some fields if that will provide the right balance between quick calculation and sufficient dispersion. If the record is complicated enough that forgetting a field is a likely problem, then the hash is probably already sufficiently complex without those new fields.

-jJ
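A sketch of the "strict subset" point using dataclasses (note: an explicit __hash__ in the class body is kept by the decorator, even though eq=True without frozen would otherwise set __hash__ to None):

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    key: str                                     # identity-ish field
    payload: list = field(default_factory=list)  # mutable, often changing

    # Equality uses *all* fields (generated), but the hash deliberately
    # uses only `key` -- a strict subset, which is all the hash contract
    # requires: equal objects must hash equal, not vice versa.
    def __hash__(self):
        return hash(self.key)

a = Record("k1", [1, 2, 3])
b = Record("k1", [1, 2, 3])
assert a == b
assert hash(a) == hash(b)     # subset-based hash honors the contract

a.payload.append(4)           # mutating an unhashed field...
assert hash(a) == hash(b)     # ...doesn't change the hash bucket
assert a != b                 # equality now differs, which is fine
```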
[Python-Dev] Dataclasses, frozen and __post_init__
On Mon, Feb 19, 2018 at 5:06 PM, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:

> If I have this right, on the discussion about frozen and hash, a use
> case was brought up for taking a few steps to create an instance (and
> thus wanting it not frozen) and then wanting it hashable.
> Which pointed to the idea of a "freeze this from now on" method.
> This seems another use case -- maybe it would be helpful to be able to
> freeze an instance after creation for multiple use-cases?

Yes, it would be helpful. But in practice, I've just limited the hash function to only the attributes that are available before I need to stick the object in a dict. In practice, that has always been more than sufficient.

-jJ
[Python-Dev] Nickname Binding (PEP 572)
I think part of the disconnect is that this enhancement could very easily be abused, and it seems likely that it will be, because the problems aren't visible while writing the code -- only when reading it later. I therefore suggest making it very clear in the PEP -- and probably in PEP 8 -- how these expressions should be limited. Simply renaming them to "nickname binding" would be a start, but here is a rough draft for wording.

When scanning code by eye, it is helpful that assignments are (almost) always at the start of a line. Even def and class statements can cause confusion if the reader didn't realize that the name referred to a class, rather than an instance. Moving assignments to the middle of a line will make it harder for someone else to read your code -- so don't do that.

A nickname is just a regular name, except that it also suggests an intimate environment. If the name is purely for documentation, or will be used only later in the same expression (or, possibly, the same block or just after), then a nickname may be appropriate. But

* If you are wondering what to do about type hints, then the expression is probably too complex to leave inline. Separate it out into a regular assignment statement; nicknames do not support type hints.

* If you will be saving the value -- even as an attribute on self -- there is a chance it will be retrieved in a very different context. Use a regular assignment statement; nicknames are just simple names, not attributes or keys.

* If you will be using the value somewhere else in your code, use a regular assignment statement. This makes it easier to find, and warns people that the value may be used again later.

-jJ
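Under the := spelling the PEP proposes, the guidance above amounts to something like the following sketch (an illustration of the style being argued for, not PEP 8 text; the log format is made up):

```python
import re

LOG_LINE = "ERROR 2018-04-01 disk full"

# Appropriate: the nickname `m` is used only in the immediately
# following suite, so the binding is easy to see while reading.
if (m := re.match(r"(\w+) ([\d-]+) (.*)", LOG_LINE)):
    level, date, message = m.groups()
else:
    level = date = message = None

assert level == "ERROR"

# Discouraged by the guidance above: burying a binding mid-expression
# and reusing the name far away makes the assignment easy to miss.
# Prefer a regular assignment statement in that case.
```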