[issue39812] Avoid daemon threads in concurrent.futures
Change by Josh Rosenberg : -- Removed message: https://bugs.python.org/msg416876 ___ Python tracker <https://bugs.python.org/issue39812> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39812] Avoid daemon threads in concurrent.futures
Josh Rosenberg added the comment: I think this is causing a regression for code that explicitly desires the ThreadPoolExecutor to go away abruptly when all other non-daemon threads complete (by choosing not to use a with statement, and if shutdown is called, calling it with wait=False, or even with those conditions, by creating it from a daemon thread of its own). It doesn't seem like it's necessary, since the motivation was "subinterpreters forbid daemon threads" and the same release that contained this change (3.9.0alpha6) also contained #40234's change that backed out the change that forbade spawning daemon threads in subinterpreters (because they now support them by default). If this conflicts with some uses of subinterpreters that make it necessary to use non-daemon threads, could that be made a configurable option (ideally defaulting to the pre-3.9 choice to use daemon threads)? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue39812> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46175] Zero argument super() does not function properly inside generator expressions
Josh Rosenberg added the comment: Carlos: This has nothing to do with reloading (as Alex's repro shows, no reload calls are made). super() *should* behave the same as super(CLASS_DEFINED_IN, self), and it looks like the outer function is doing half of what it must do to make no-arg super() work in the genexpr (dis.dis reports that __class__ is being loaded, and a closure constructed from the genexpr that includes it, so __class__, which no-arg super pulls from closure scope to get its first argument, is there). The problem is that super() *also* assumes the first argument to the function is self, and a genexpr definitionally receives just one argument, the iterator (the outermost one for genexprs with nested loops). So no-arg super is doing the equivalent of: super(__class__, iter(vars)) when it should be doing: super(__class__, self) Only way to fix it I can think of would be one of: 1. Allow a genexpr to receive multiple arguments to support this use case (ugly, requires significant changes to current design of genexprs and probably super() too) 2. Somehow teach super() to pull self (positional argument #1 really; super() doesn't care about names) from closure scope (and make the compiler put self in the closure scope when it builds the closure) when run in a genexpr. Both options seem... sub-optimal. Better suggestions welcome. Note that the same problem affects the various forms of comprehension as well (this isn't specific to the lazy design of genexprs; listcomps have the same problem). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue46175> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
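A minimal sketch of the failure and the usual workaround (hypothetical class names; this reflects the interpreter versions discussed here, where a comprehension compiles to a hidden function):

class Base:
    def name(self):
        return 'base'

class Child(Base):
    def broken(self):
        # Inside the comprehension's hidden function, no-arg super() becomes
        # super(__class__, <range_iterator>) and raises TypeError.
        return [super().name() for _ in range(3)]

    def working(self):
        parent = super()   # resolved in the method scope, where argument #1 is self
        return [parent.name() for _ in range(3)]

Child().working() returns ['base', 'base', 'base'], while Child().broken() raises TypeError.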
[issue46148] Optimize pathlib
Josh Rosenberg added the comment: Note: attrgetter could easily be made faster by migrating it to use vectorcall. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue46148> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue46082] type casting of bool
Josh Rosenberg added the comment: Agreed, this is not a bug. The bool constructor is not a parser (unlike, say, int); it's a truthiness detector. Non-empty strings are always truthy, by design, so both "True" and "False" are truthy strings. There's no bug to address here. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: pending -> closed ___ Python tracker <https://bugs.python.org/issue46082> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
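A quick illustration of the difference between truth-testing and parsing:

bool("False")   # True  - any non-empty string is truthy, regardless of its text
bool("")        # False - only the empty string is falsy
int("17")       # 17    - int() actually parses its argument

from ast import literal_eval
literal_eval("False")   # False - one way to actually parse a boolean literal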
[issue45707] Variable reassginment triggers incorrect behaviors of locals()
Josh Rosenberg added the comment: This is a documented feature of locals() (it's definitionally impossible to auto-vivify *real* locals, because real locals are statically assigned to specific indices in a fixed size array at function compile time, and the locals() function is returning a copy of said bindings, not a live view of them). -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue45707> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
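A minimal example of the documented behavior (CPython; the language reference only says such changes *may* not be reflected):

def f():
    x = 1
    locals()['x'] = 99   # writes into a snapshot dict, not the real local slot
    print(x)             # still prints 1

f()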
[issue45520] Frozen dataclass deep copy doesn't work with __slots__
Josh Rosenberg added the comment: You're right that in non-dataclass scenarios, you'd just use __slots__. The slots=True thing was necessary for any case where any of the dataclass's attributes have default values (my_int: int = 0), or are defined with fields (my_list: list = field(default_factory=list)). The problem is that __slots__ is implemented by, after the class definition ends, creating descriptors on the class to access the data stored at known offsets in the underlying PyObject structure. Those descriptors themselves being class attributes means that when the type definition machinery tries to use __slots__ to create them, it finds conflicting class attributes (the defaults/fields) that already exist and explodes. Adding support for slots=True means it does two things:

1. It completely defines the class without slots, extracts the stuff it needs to make the dataclass separately, then deletes it from the class definition namespace and makes a *new* class with __slots__ defined (so no conflict occurs)

2. It checks if the dataclass is also frozen, and applies alternate __getstate__/__setstate__ methods that are compatible with a frozen, slotted dataclass

#2 is what fixes this bug (while #1 makes it possible to use the full range of dataclass features without sacrificing the ability to use __slots__). If you need this to work in 3.9, you could borrow the 3.10 implementations that make this work for frozen dataclasses to explicitly define __getstate__/__setstate__ for your frozen slotted dataclasses:

def __getstate__(self):
    return [getattr(self, f.name) for f in fields(self)]

def __setstate__(self, state):
    for field, value in zip(fields(self), state):
        # use setattr because dataclass may be frozen
        object.__setattr__(self, field.name, value)

I'm not closing this since backporting just the fix for frozen slotted dataclasses (without backporting the full slots=True functionality that's a new feature) is possibly within scope for a bugfix release of 3.9 (it wouldn't change the behavior of working code, and fixes broken code that might reasonably be expected to work). -- ___ Python tracker <https://bugs.python.org/issue45520> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
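Put together, a 3.9-compatible sketch of the workaround described above might look like this (illustrative only, not the exact 3.10 implementation):

from copy import deepcopy
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class FrozenData:
    __slots__ = ('my_string',)
    my_string: str

    def __getstate__(self):
        return [getattr(self, f.name) for f in fields(self)]

    def __setstate__(self, state):
        for f, value in zip(fields(self), state):
            object.__setattr__(self, f.name, value)   # bypasses the frozen __setattr__

print(deepcopy(FrozenData('initial')))   # FrozenData(my_string='initial')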
[issue45520] Frozen dataclass deep copy doesn't work with __slots__
Josh Rosenberg added the comment: When I define this with the new-in-3.10 slots=True argument to dataclass rather than manually defining __slots__ it works just fine. Looks like the pickle format changes rather dramatically to accommodate it.

>>> @dataclass(frozen=True, slots=True)
... class FrozenData:
...     my_string: str
...
>>> deepcopy(FrozenData('initial'))
FrozenData(my_string='initial')

Is there a strong motivation to support manually defined __slots__ on top of slots=True that warrants fixing it for 3.10 onward? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue45520> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45450] Improve syntax error for parenthesized arguments
Josh Rosenberg added the comment: Why not "lambda parameters cannot be parenthesized" (optionally "lambda function")? def-ed function parameters are parenthesized, so just saying "Function parameters cannot be parenthesized" seems very weird. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue45450> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45414] pathlib.Path.parents negative indexing is wrong for absolute paths
Josh Rosenberg added the comment: On the subject of sleep-deprived and/or sloppy, just realized: return self.__getitem__(len(self) + idx) should really just be: idx += len(self) no need to recurse. -- ___ Python tracker <https://bugs.python.org/issue45414> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45414] pathlib.Path.parents negative indexing is wrong for absolute paths
Josh Rosenberg added the comment: "We'll definitely want to make sure that we're careful about bad indices ... since it would be easy to get weird behavior where too-large negative indexes start 'wrapping around'" When I noticed the problem, I originally thought "Hey, the test for a negative index can come *before* the range check and save some work for negative indices". Then I realized, while composing this bug report, that that would make p.parents[-4] with len(p.parents) == 3 → p.parents[-1] as you said, and die with a RecursionError for p.parents[-3000] or so. I'm going to ignore the possibility I'm sleep-deprived and/or sloppy, and assume a lot of good programmers would think to make that "optimization" and accidentally introduce new bugs. :-) So yeah, all the tests. -- ___ Python tracker <https://bugs.python.org/issue45414> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45340] Lazily create dictionaries for plain Python objects
Josh Rosenberg added the comment: Hmm... And there's one other issue (that wouldn't affect people until they actually start worrying about memory overhead). Right now, if you want to determine the overhead of an instance, the options are: 1. Has __dict__: sys.getsizeof(obj) + sys.getsizeof(obj.__dict__) 2. Lacks __dict__ (built-ins, slotted classes): sys.getsizeof(obj) This change would mean even checking if something using this setup has a __dict__ creates one. Without additional introspection support, there's no way to tell the real memory usage of the instance without changing the memory usage (for the worse). -- ___ Python tracker <https://bugs.python.org/issue45340> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45340] Lazily create dictionaries for plain Python objects
Josh Rosenberg added the comment: Hmm... Key-sharing dictionaries were accepted largely without question because they didn't harm code that broke them (said code gained nothing, but lost nothing either), and provided a significant benefit. Specifically: 1. They imposed no penalty on code that violated the code-style recommendation to initialize all variables consistently in __init__ (code that always ended up using a non-sharing dict). Such classes don't benefit, but neither do they get penalized (just a minor CPU cost to unshare when it realized sharing wouldn't work). 2. It imposes no penalty for using vars(object)/object.__dict__ when you don't modify the set of keys (so reading or changing values of existing attributes caused no problems). The initial version of this worsens case #2; you'd have to convert to key-sharing dicts, and possibly to unshared dicts a moment later, if the set of attributes is changed. And when it happens, you'd be paying the cost of the now defunct values pointer storage for the life of each instance (admittedly a small cost). But the final proposal compounds this, because the penalty for lazy attribute creation (directly, or dynamically by modifying via vars()/__dict__) is now a per-instance cost of n pointers (one for each value). The CPython codebase rarely uses lazy attribute creation, but AFAIK there is no official recommendation to avoid it (not in PEP 8, not in the official tutorial, not even in PEP 412 which introduced Key-Sharing Dictionaries). Imposing a fairly significant penalty on people who aren't even violating language recommendations, let alone language rules, seems harsh. I'm not against this initial version (one pointer wasted isn't so bad), but the additional waste in the final version worries me greatly. Beyond the waste, I'm worried how you'd handle the creation of the first instance of such a class; you'd need to allocate and initialize an instance before you know how many values to tack on to the object. Would the first instance use a real dict during the first __init__ call that it would use to realloc the instance (and size all future instances) at the end of __init__? Or would it be realloc-ing for each and every attribute creation? In either case, threading issues seem like a problem. Seems like: 1. Even in the ideal case, this only slightly improves memory locality, and only provides a fixed reduction in memory usage per-instance (the dict header and a little allocator round-off waste), not one that scales with number of attributes. 2. Classes that would benefit from this would typically do better to use __slots__ (now that dataclasses.dataclass supports slots=True, encouraging that as a default use case adds little work for class writers to use them) If the gains are really impressive, might still be worth it. But I'm just worried that we'll make the language penalize people who don't know to avoid lazy attribute creation. And the complexity of this layered: 1. Not-a-dict 2. Key-sharing-dict 3. Regular dict approach makes me worry it will allow subtle bugs in key-sharing dicts to go unnoticed (because so little code would still use them). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue45340> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21041] pathlib.PurePath.parents rejects negative indexes
Josh Rosenberg added the comment: Negative indexing is broken for absolute paths, see #45414. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue21041> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45414] pathlib.Path.parents negative indexing is wrong for absolute paths
New submission from Josh Rosenberg : At least on PosixPath (not currently able to try on Windows to check WindowsPath, but from a quick code check I think it'll behave the same way), the negative indexing added in #21041 is implemented incorrectly for absolute paths. Passing either -1 or -2 will return a path representing the root, '/' for PosixPath (which should only be returned for -1), and passing an index of -3 or beyond returns the value expected for that index + 1, e.g. -3 gets the result expected for -2, -4 gets the result for -3, etc. And for the negative index that should be equivalent to index 0, you end up with an IndexError. The underlying problem appears to be that absolute paths (at least, those created from a string) are represented in self._parts with the root '/' included (redundantly, since self._root has it too), so all the actual components of the path are offset by one. This does not affect slicing (slicing is implemented using range and slice.indices to perform normalization from negative to positive indices, so it never indexes with a negative index). Example:

>>> from pathlib import Path
>>> p = Path('/1/2/3')
>>> p._parts
['/', '1', '2', '3']
>>> p.parents[:]
(PosixPath('/1/2'), PosixPath('/1'), PosixPath('/'))
>>> p.parents[-1]
PosixPath('/')
>>> p.parents[-1]._parts  # Still behaves normally as self._root is still '/'
[]
>>> p.parents[-2]
PosixPath('/')
>>> p.parents[-2]._parts
['/']
>>> p.parents[-3]
PosixPath('/1')
>>> p.parents[-4]
Traceback (most recent call last):
  ...
IndexError: -4

It looks like the underlying problem is that the negative indexing code doesn't account for the possibility of '/' being in _parts and behaving as a component separate from the directory/files in the path. Frankly, it's a little odd that _parts includes '/' at all (Path has a ._root/.root attribute that stores it too, and even when '/' isn't in the ._parts/.parts, the generated complete path includes it because of ._root), but it looks like the docs guaranteed that behavior in their examples. It looks like one of two options must be chosen:

1. Fix the negative indexing code to account for absolute paths, and ensure absolute paths store '/' in ._parts consistently (it should not be possible to get two identical Paths, one of which includes '/' in _parts, one of which does not, which is possible with the current negative indexing bug; not sure if there are any documented code paths that might produce this warped sort of object outside of the buggy .parents), or

2. Make no changes to the negative indexing code, but make absolute paths *never* store the root as the first element of _parts (.parts can prepend self._drive/self._root on demand to match documentation). This probably involves more changes (lots of places assume _parts includes the root, e.g. the _PathParents class's own __len__ method raises a ValueError when called on the warped object returned by p.parents[-1], because it adjusts for the root, and the lack of one means it returns a length of -1).

I think #1 is probably the way to go. I believe all that would require is to add:

if idx < 0:
    return self.__getitem__(len(self) + idx)

just before:

return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - 1])

so it never tries to use a negative idx directly (it has to occur after the check for valid index in [-len(self), len(self)) so very negative indices don't recurse until they become positive).
This takes advantage of _PathParents's already adjusting the reported length for the presence of drive/root, keeping the code simple; the alternative I came up with that doesn't recurse changes the original return line:

return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - 1])

to:

adjust = idx >= 0 or not (self._drv or self._root)
return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - adjust])

which is frankly terrible, even if it's a little faster. -- components: Library (Lib) messages: 403488 nosy: josh.r priority: normal severity: normal status: open title: pathlib.Path.parents negative indexing is wrong for absolute paths versions: Python 3.10, Python 3.11 ___ Python tracker <https://bugs.python.org/issue45414> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue17792] Unhelpful UnboundLocalError due to del'ing of exception target
Josh Rosenberg added the comment: Aaron: Your understanding of how LEGB works in Python is a little off. Locals are locals for the *entire* scope of the function, bound or unbound; deleting them means they hold nothing (they're unbound) but del can't actually stop them from being locals. The choice of whether to look something up in the L, E or GB portions of LEGB scoping rules is a *static* choice made when the function is defined, and is solely about whether they are assigned to anywhere in the function (without an explicit nonlocal/global statement to prevent them becoming locals as a result). Your second example can be made to fail just by adding a line after the print:

def doSomething():
    print(x)
    x = 1

and it fails for the same reason this does:

def doSomething():
    x = 10
    del x
    print(x)

A local is a local from entry to exit in a function. Failure to assign to it for a while doesn't change that; it's a local because you assigned to it at least once, along at least one code path. del-ing it after assigning doesn't change that, because del doesn't get rid of locals, it just empties them. Imagine how complex the LOAD_FAST instruction would get if it needed to handle not just loading a local, but when the local wasn't bound, had to choose *dynamically* between:

1. Raising UnboundLocalError (if the value is local, but was never assigned)
2. Returning a closure scoped variable (if the value was local, but got del-ed, and a closure scope exists)
3. Raising NameError (if the closure scope variable exists, but was never assigned)
4. Returning a global/builtin variable (if there was no closure scope variable *or* the closure scope variable was created, but explicitly del-ed)
5. Raising NameError (if no closure, global or builtin name exists)

That's starting to stretch the definition of "fast" in LOAD_FAST. :-) -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue17792> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue45333] += operator and accessors bug?
Josh Rosenberg added the comment: This has nothing to do with properties, it's 100% about using augmented assignment with numpy arrays and mixed types. An equivalent reproducer is:

a = np.array([1,2,3])  # Implicitly of dtype np.int64
a += 0.5               # Throws the same error, no properties involved

The problem is that += is intended to operate in-place on mutable types, numpy arrays *are* mutable types (unlike normal integers in Python), you're trying to compute a result that can't be stored in a numpy array of integers, and numpy isn't willing to silently make augmented assignment with incompatible types make a new copy with a different dtype (they *could* do this, but it would lead to surprising behavior, like += on the *same* numpy array either operating in place or creating a new array with a different dtype and replacing the original based on the type on the right-hand side). The short form is: If your numpy computation is intended to produce a new array with a different data type, you can't use augmented assignment. And this isn't a bug in CPython in any event; it's purely about the choices (reasonable ones IMO) numpy made implementing their __iadd__ overload. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue45333> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
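For contrast, the non-augmented spelling works because it builds a brand new array instead of writing into the old one (assuming numpy is installed; the exact exception type raised by the augmented form varies across numpy versions):

import numpy as np

a = np.array([1, 2, 3])   # integer dtype
a = a + 0.5               # fine: allocates a new float64 array and rebinds the name
# a += 0.5                # would raise: the float result cannot be written back into the integer array in place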
[issue44547] fraction.Fraction does not implement __int__.
Josh Rosenberg added the comment: Seems like an equally reasonable solution would be to make classes with __trunc__ but not __int__ automatically generate an __int__ in terms of __trunc__ (similar to __str__ using __repr__ when the latter is defined but not the former). The inconsistency is in both methods existing, but having the equivalence implemented in int() rather than in the type (thereby making SupportsInt behave unexpectedly, even though it's 100% true that obj.__int__() would fail). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue44547> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
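For example (hypothetical class; this relies on the int()-falls-back-to-__trunc__ behavior of the versions discussed here, which was later deprecated):

class Truncish:
    def __trunc__(self):
        return 3

t = Truncish()
print(int(t))                  # 3 - int() falls back to __trunc__ here
print(hasattr(t, '__int__'))   # False - so a SupportsInt-style check rejects it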
[issue44140] WeakKeyDictionary should support lookup by id instead of hash
Josh Rosenberg added the comment: Andrei: If designed appropriately, a weakref callback attached to the actual object would delete the associated ID from the dictionary when the object was being deleted to avoid that problem. That's basically how WeakKeyDictionary works already; it doesn't store the object itself (if it did, that strong reference could never be deleted), it just stores a weak reference for it that ensures that when the real object is deleted, a callback removes the weak reference from the WeakKeyDictionary; this just adds another layer to that work. I don't think this would make sense as a mere argument to WeakKeyDictionary; the implementation would differ significantly, and probably deserves a separate class. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue44140> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
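A rough sketch of that design (the class and its methods are hypothetical, not an existing or proposed stdlib API):

import weakref

class WeakIdDict:
    """Sketch: maps keys by identity while holding only weak references to them."""

    def __init__(self):
        self._data = {}   # id(key) -> (weakref to key, value)

    def __setitem__(self, key, value):
        key_id = id(key)
        data = self._data
        def _remove(_ref, key_id=key_id, data=data):
            data.pop(key_id, None)          # runs when the real key is collected
        data[key_id] = (weakref.ref(key, _remove), value)

    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:                # stale entry whose callback has not run yet
            raise KeyError(key)
        return value

    def __len__(self):
        return len(self._data)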
[issue44470] 3.11 docs.python.org in Polish not English?
Josh Rosenberg added the comment: I just visited the link, and it's now *mostly* English, but with random bits of Korean in it (mostly in links and section headers). The first warning block for instance begins: 경고: The parser module is deprecated... Then a few paragraphs later I'm told: For full information on the language syntax, refer to 파이썬 언어 레퍼런스. where the Korean is a hyperlink to the Python Language Reference. Very strange. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue44470> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14995] PyLong_FromString documentation should state that the string must be null-terminated
Josh Rosenberg added the comment: The description is nonsensical as is; not sure the patch goes far enough. C-style strings are *defined* to end at the NUL terminator; if it really needs a NUL after the int, saying it "points to the first character which follows the representation of the number" is highly misleading; the NUL isn't logically a character in the C-string way of looking at things. The patch is also wrong; the digits need not end in a NUL byte (trailing whitespace is allowed). AFAICT, the function really uses pend for two purposes: 1. If it succeeds in parsing, then pend reports the end of the string, nothing else 2. If it fails, because the string is not a legal input (contains non-numeric, or non-leading/terminal whitespace or whatever), pend tells you where the first violation character that couldn't be massaged to meet the rules for int() occurred. #1 is a mostly useless bit of info (strlen would be equally informative, and if the value parsed, you rarely care how long it was anyway), so pend is, practically speaking, solely for error-checking/reporting. The rewrite should basically say what is allowed (making it clear anything beyond the single parsable integer value with optional leading/trailing whitespace is illegal), and making it clear that pend always points to the end of the string on success (not just after the representation of the number, it's after the trailing whitespace too), and on failure indicates where parsing failed. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue14995> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44318] Asyncio classes missing __slots__
Josh Rosenberg added the comment: Andrei: The size of an instance of Semaphore is 48 bytes + 104 more bytes for the __dict__ containing its three attributes (ignoring the cost of the attributes themselves). A slotted class with three attributes only needs 56 bytes of overhead per-instance (it has no __dict__, so the 56 is the total cost). Dropping overhead of the instances by >60% can make a difference if you're really making many thousands of them. Personally, I think Python level classes should generally default to using __slots__ unless the classes are explicitly not for subclassing; not using __slots__ means all subclasses have their hands tied by the decision of the parent class. Perhaps explicitly opting in to __weakref__ (which __slots__ removes by default) to allow weak referencing, but it's fairly rare a class *needs* to otherwise allow the creation of arbitrary attributes. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue44318> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
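The difference is easy to measure; a quick sketch (exact byte counts vary by Python version and platform, and the class names here are just for illustration):

import sys

class WithDict:
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

class WithSlots:
    __slots__ = ('a', 'b', 'c')
    def __init__(self):
        self.a, self.b, self.c = 1, 2, 3

d, s = WithDict(), WithSlots()
print(sys.getsizeof(d) + sys.getsizeof(d.__dict__))   # instance plus its attribute dict
print(sys.getsizeof(s))                               # slotted instance, no __dict__ at all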
[issue44175] What do "cased" and "uncased" mean?
Josh Rosenberg added the comment: See the docs for the title method on what they mean by "titlecased"; "a" is self-evidently not titlecased. https://docs.python.org/3/library/stdtypes.html#str.title -- ___ Python tracker <https://bugs.python.org/issue44175> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44175] What do "cased" and "uncased" mean?
Josh Rosenberg added the comment: "Cased": Characters which are either lowercase or uppercase (they have some other equivalent form in a different case) "Uncased": Characters which are neither uppercase nor lowercase. Do you have a suggested alternate wording? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue44175> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43824] array.array.__deepcopy__() accepts a parameter of any type
Josh Rosenberg added the comment: __deepcopy__ is required to take a second argument by the rules of the copy module; the second argument is supposed to be a memo dictionary, but there's no reason to use it for array.array (it can't contain Python objects, and you only use the memo dictionary when recursing to Python objects you contain). Sure, the second argument isn't being type-checked, but it's not used at all, and it's only supposed to be invoked indirectly via copy.deepcopy (that passes a dict). Can you explain what is wrong here that needs to be fixed? Seems like a straightforward "protocol requires argument, but use case doesn't have anything to do with it, so it ignores it". Are you suggesting adding type-checks for something that never gets used? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue43824> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43464] set intersections should short-circuit
New submission from Josh Rosenberg : At present, set_intersection (the C name for set.intersection) optimizes for pairs of sets by iterating the smallest set and only adding entries found in the larger, meaning work is proportionate to the smallest input. But when the other input isn't a set, it goes with a naive solution, iterating the entire non-set, and adding entries found in the set. This is fine when the intersection will end up smaller than the original set (there's no way to avoid exhausting the non-set when that's the case), but when the intersection ends up being the same size as the original, we could add a cheap length check and short-circuit at that point. As is, {4}.intersection(range(1)) takes close to 1000 times longer than {4}.intersection(range(10)) despite both of them reaching the point where the outcome will be {4} at the same time. Since the length check for short-circuiting only needs to be performed when the input set actually contains the value, the cost should be fairly low. I figure this would be the worst case for the change: {3, 4}.intersection((4,) * 1) where it performs the length check every time, and doesn't benefit from short-circuiting. But cases like: {4}.intersection((4,) * 1) or {4}.intersection(range(1)) would finish much faster. A similar optimization to set_intersection_multi (to stop when the intermediate result is empty) would make cases like: {4000}.intersection([1], range(1), range(10, 20)) also finish dramatically quicker in the (I'd assume not uncommon) case where the intersection of many iterables is empty, and this could be known quite early on (the cost of this check would be even lower, since it would only be performed once per iterable, not per-value). Only behavioral change this would cause is that errors resulting from processing items in an iterable that is no longer run to exhaustion due to short-circuiting wouldn't happen ({4}.intersection([4, []]) currently dies, but would succeed with short-circuiting; same goes for {4}.intersection([5], [[]]) if set_intersection_multi is optimized), and input iterators might be left only partially consumed. If that's acceptable, the required code changes are trivial. -- components: C API keywords: easy (C) messages: 388442 nosy: josh.r priority: normal severity: normal status: open title: set intersections should short-circuit versions: Python 3.10 ___ Python tracker <https://bugs.python.org/issue43464> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
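A pure-Python sketch of the proposed short-circuit (illustrative only; the real change would live in the C implementation of set.intersection):

def intersection_with_short_circuit(base, iterable):
    result = set()
    for item in iterable:
        if item in base:
            result.add(item)
            if len(result) == len(base):   # the result can never grow any further
                break                      # stop consuming the iterable early
    return result

print(intersection_with_short_circuit({4}, range(10**8)))   # returns {4} as soon as 4 is seen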
[issue43363] memcpy writes to wrong destination
Josh Rosenberg added the comment: Agreed, stack is a PyObject**, so adding an integer (pto_nargs) to the pointer (stack) is implicitly by multiples of sizeof(PyObject*). This is how pointer arithmetic works in all versions of C I'm aware of. The code is correct. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue43363> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43297] bz2.open modes behaving differently than standard open() modes
Josh Rosenberg added the comment: All of the compression modules (gzip, lzma) have this behavior, not just bz2; it's consistent in that sense. Changing it now, after literally decades with the old behavior, would needlessly break existing programs. As you say, it's documented clearly, I'm not seeing a gain to be had strong enough to violate the existing documentation. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue43297> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43209] system cannot find the file specified in subprocess.py
Change by Josh Rosenberg : -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue43209> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43246] Dict copy optimization violates subclass invariant
Josh Rosenberg added the comment: The cause is in dict_merge (see here: https://github.com/python/cpython/blob/master/Objects/dictobject.c ); it has a fast path for when the object being merged in (which is what the dict constructor does; it makes an empty dict, then merges the provided dict-like) is: 1. A dict (or subclass thereof) 2. Which has not overridden __iter__ When that's the case, it assumes it's "dict-compatible" and performs the merge with heavy use of dict-internals. When it's not the case (as in your simple wrapper), it calls .keys() on the object, iterates that, and uses it to pull values via bracket lookup-equivalent code. I assume the choice of testing __iter__ (really, the C slot for tp_iter, which is equivalent) is for performance; it's more expensive to check if keys was overridden and/or if the __getitem__ implementation (of which there is more than one possibility for slots at the C layer) has been overridden. What the code is doing is probably logically wrong, but it's significantly faster than doing it the right way, and easy to work around (if you're writing your own dictionary-like thing with wildly different semantics, collections.abc.MutableMapping is probably a better base class to avoid inheriting dict-specific weirdness), so it's probably not worth fixing. Leaving open for others to discuss. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue43246> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
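A small demonstration of the fast path described above (UpperDict is a hypothetical subclass):

class UpperDict(dict):
    def __getitem__(self, key):   # overridden, but __iter__ is not
        return super().__getitem__(key).upper()

d = UpperDict(a='x')
print(d['a'])      # 'X' - direct lookups go through the override
print(dict(d))     # {'a': 'x'} - the fast path copies raw internals and never calls __getitem__

Basing the wrapper on collections.abc.MutableMapping instead of dict avoids the fast path entirely, since the copy then has to go through keys() and __getitem__.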
[issue43119] asyncio.Queue.put never yields if the queue is unbounded
Josh Rosenberg added the comment: Making literally every await equivalent to: await asyncio.sleep(0) followed by the actual await (which is effectively what you're proposing when you expect all await to be preemptible) means adding non-trivial overhead to all async operations (asyncio is based on system calls of the select/poll/epoll/kqueue variety, which add meaningful overhead when we're talking about an operation that is otherwise equivalent to an extremely cheap simple collections.deque.append call). It also breaks many reasonable uses of asyncio.wait and asyncio.as_completed, where the caller can reasonably expect to be able to await the known-complete tasks without being preempted (if you know the coroutine is actually done, it could be quite surprising/problematic when you await it and get preempted, potentially requiring synchronization that wouldn't be necessary otherwise). Making all await yield to the event loop would be like releasing the GIL before acquiring an uncontended lock; it makes an extremely cheap operation *much* higher overhead to, at best, fix a problem with poorly designed code. In real life, if whatever you're feeding the queue with is infinite and requires no awaiting to produce each value, you should probably just avoid the queue and have the consumer consume the iterable directly. Or just apply a maximum size to the queue; since the source of data to put is infinite and not-awaitable, there's no benefit to an unbounded queue, you may as well use a bound roughly fitted to the number of consumers, because any further items are just wasting memory well ahead of when they're needed. Point is, regular queue puts only block (and potentially release the GIL early) when they're full or, as a necessary consequence of threading being less predictable than asyncio, when there is contention on the lock protecting the queue internals (which is usually resolved quickly); why would asyncio queues go out of their way to block when they don't need to? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue43119> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42948] bytearray.copy is undocumented
Josh Rosenberg added the comment: Does this need specific documentation? bytearray itself is documented with: > As bytearray objects are mutable, they support the mutable sequence > operations in addition to the common bytes and bytearray operations described > in Bytes and Bytearray Operations. where "mutable" is a link to all the mutable sequence operations ( https://docs.python.org/3/library/stdtypes.html#typesseq-mutable ), including copy. Specifically documenting copy for bytearray is pointless; are we going to add specific documentation for append and remove and all the other mutable sequence operations as well? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42948> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42958] filecmp.cmp(shallow=True) isn't actually shallow when only mtime differs
Josh Rosenberg added the comment: This is a problem with the docstring. The actual docs for it are a bit more clear, https://docs.python.org/3/library/filecmp.html#filecmp.cmp : "If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared." Your patch can't be used because it changes longstanding documented behavior. If you'd like to submit a patch to fix the docstring, that's fine, but we're not going to break existing code to make the function less accurate. The patch should probably make the documentation more clear while it's at it.

1. The original wording could be misinterpreted as having the "Otherwise" apply to shallow=False only, not to the case where shallow=True but os.stat doesn't match.

2. The existing wording isn't clear on what an os.stat() "signature" is, which can leave the impression that the entirety of os.stat is compared (which would only match for hardlinks of the same file), when in fact it's just the file type (stat.S_IFMT(st.st_mode), file vs. directory vs. symlink, etc.), size and mtime.

Proposed rewording of main docs would be: "If shallow is true, files with identical os.stat() signatures (file type, size, and modification time) are taken to be equal. When shallow is false, or the file signatures differ, the contents of the files are compared." A similar wording (or at least, a shorter version of the above, rather than a strictly wrong description of the shallow parameter) could be applied to the docstring. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42958> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42899] Is it legal to eliminate tests of a value, when that test has no effect on control flow?
Josh Rosenberg added the comment: Gregory: Even in a low-level compiled language (say, C++), pretty sure the compiler can't automatically optimize out: if (x) { } unless it has sure knowledge of the implementation of operator bool; if operator bool's implementation isn't in the header file, and link time optimization isn't involved, it has to call it to ensure any side-effects it might have are invoked. It can only bypass the call if it knows the implementation of operator bool and can verify it has no observable side-effects (as-if rule). There are exceptions to the as-if rule for optimizations in special cases (copy elision), but I'm pretty sure operator bool isn't one of them; if the optimizer doesn't know the implementation of operator bool, it must call it just in case it does something weird but critical to the program logic. Point is, I agree that: if x: pass must evaluate non-constant-literal x for truthiness, no matter how silly that seems (not a huge loss, given very little code should ever actually do that). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42899> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15373] copy.copy() does not properly copy os.environment
Josh Rosenberg added the comment: Would we remove the functionality of os.environ.copy()? It seems very odd for types to have a .copy() method that works, while not supporting copy.copy, especially when there is zero documentation, on the web or the docstring, to even hint at the difference. I'm strongly in favor of silently doing the right thing and behaving the same way the .copy() method already behaves; if you want a "copy" of os.environ that still modifies the environment, that's just aliasing it (envalias = os.environ), not copying at all. If you're trying to make a shallow copy, not an alias, you're trying to separate it from the parent, which every other dict-like thing does (assuming immutable values), where os.environ is a very weird exception (for copy.copy, but not the .copy() method). Can someone give an example where you'd want copy.copy to produce a "shallow copy" that acts like an alias, not an actual independent copy? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue15373> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42826] typing.Iterable does not need __getitem__() method
Josh Rosenberg added the comment: As Serhiy says, the glossary term for an iterable is not the same as the documentation for typing.Iterable (which at this point is largely defined in terms of collections.abc.Iterable). True, collections.abc.Iterable does not detect classes that iterate via __getitem__, only via __iter__ (the docs are quite clear on this), but such __getitem__ based classes are still iterable in the broad sense of the term used in the glossary. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue42826> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
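For example (Squares is a hypothetical class using only the old __getitem__ protocol):

from collections.abc import Iterable

class Squares:
    def __getitem__(self, index):     # old-style sequence protocol, no __iter__
        if index >= 5:
            raise IndexError(index)
        return index * index

print(isinstance(Squares(), Iterable))   # False - the ABC only checks for __iter__
print(list(Squares()))                   # [0, 1, 4, 9, 16] - iter() still falls back to __getitem__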
[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`
Josh Rosenberg added the comment: This is a pessimization given the current implementation of str.join; it calls PySequence_Fast as the very first step, which is effectively free for a tuple or list input (just reference count manipulation), but must convert a generator expression to a list (which is slower than building the list with a listcomp in the first place). It does this so it can do two passes, one to compute the final length (and max ordinal) of the string, allowing it to allocate just once, and one to build the new string. In theory, it might be rewritten to use PyUnicodeWriter under-the-hood for single-pass operation, but as is, a generator expression is slower than a listcomp for this task. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42699> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
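A rough way to see the difference (timings are illustrative and vary by machine and Python version):

from timeit import timeit

data = [str(i) for i in range(1000)]
print(timeit(lambda: ''.join([s for s in data]), number=10000))  # list comprehension
print(timeit(lambda: ''.join(s for s in data), number=10000))    # generator expression, typically slower here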
[issue42689] Installation
Change by Josh Rosenberg : -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue42689> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42629] PyObject_Call not behaving as documented
Josh Rosenberg added the comment: Pingback from #42033. Proper fix for that issue likely involves moving the work for copying kwargs into PyObject_Call, which would fix this bug by side-effect. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42629> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42033] Seemingly unnecessary complexification of foo(**kw)
Josh Rosenberg added the comment: Even if making a copy is necessary when the underlying function receives the dict "raw", preemptively performing the copy (before knowing if the function being called is a Vectorcall function) means that when it's a Vectorcall function (e.g. all user-defined functions, right?), instead of just copying from the original dict to the unpacked stack for vectorcall, it makes an intermediate copy, then copies from that copy to the unpacked stack later on; the copy is otherwise completely unused. The extra bytecode isn't even defending against "dict-like" kwargs, because CALL_FUNCTION_EX itself already copies to a true dict for anything that's not an exact dict (that defense shouldn't even be there if the bytecode compiler is already guaranteeing a true dict). Seems like, if preventing the caller's dict from being passed directly to the underlying function is necessary and intended, it should be done in PyObject_Call (which can avoid the copy entirely when call a Vectorcall function and when the reference count on the dict is 1), not at the bytecode interpreter layer. As is, PyObject_Call is already violating the documented behavior by *not* matching the behavior of callable(*args, **kwargs) (see #42629), so moving it to PyObject_Call would fix that problem and improve performance passing a single kwargs. -- ___ Python tracker <https://bugs.python.org/issue42033> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42646] Add function that supports "applying" methods
Josh Rosenberg added the comment: If you're annoyed by having to use two lines, one to copy, one to call the mutating method, you can use the walrus operator:

(y := x.copy()).some_method()

or:

(y := deepcopy(x)).some_method()

Does that cover your use case? For the list case, you'd normally just do:

arr = lis[::-1]

but:

(arr := lis.copy()).reverse()

also works. Granted, not super pretty. But I'm not seeing enough cases where this ugliness is truly unavoidable (the two lines don't bother me that much, and for built-ins, there is usually a one-liner that works fine, e.g. the reversing slice as shown, sorted over list.sort, etc.). I'll note: Unconditionally calling copy.copy is fine; it knows to try the __copy__ method of the things it is called on (and most things that offer copy alias it to __copy__ or are special-cased in copy.copy as well; if they don't, they should), so you're unlikely to need to perform the "try method, fall back to copy.copy" yourself. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42646> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42612] Software Designer
Josh Rosenberg added the comment: A rough description is not sufficient. If you have code that reproduces the problem, post the reproducer so we can check, but odds are you've got a bug in your code. -- nosy: +josh.r status: open -> pending ___ Python tracker <https://bugs.python.org/issue42612> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39707] Abstract property setter/deleter implementation not enforced, but documented as such
Josh Rosenberg added the comment: If this is going to be closed as rejected, I think it still needs some improvement to the documentation. Right now, the docs for abstractproperty (deprecated in favor of combining property and abstractmethod) state: "If only some components are abstract, only those components need to be updated to create a concrete property in a subclass:" This heavily implies that if *all* components of the property are abstract, they must *all* be updated to create a concrete property on the subclass, when that is not the case (it's documenting a special way of overriding just one component by borrowing the base class, not a normal means of defining a property). If nothing else, mentioning this quirk in the docs seems like it would save confusion (e.g. https://stackoverflow.com/questions/65224767/python-abstract-property-cant-instantiate-abstract-class-with-abstract-me ). -- assignee: -> docs@python components: +Documentation nosy: +docs@python, josh.r resolution: rejected -> status: closed -> open title: Abstract property setter/deleter implementation not enforced. -> Abstract property setter/deleter implementation not enforced, but documented as such versions: +Python 3.10, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue39707> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42565] Traceback (most recent call last): File "", line 1, in NameError: name 'python' is not defined
Josh Rosenberg added the comment: Looks like someone tried to run python inside an interactive Python shell, rather than the command line. I'm moving to pending and will eventually close unless they add a repro for some actual bug. -- nosy: +josh.r status: open -> pending ___ Python tracker <https://bugs.python.org/issue42565> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42454] Move slice creation to the compiler for constants
Josh Rosenberg added the comment: Yep, Mark Shannon's solution of contextual hashing is what I was trying (without success) when my last computer died (without backing up work offsite, oops) and I gave up on this for a while. And Batuhan Taskaya's note about compiler dictionaries for the constants being a problem is where I got stuck. Switching to lists might work (I never pursued this far enough to profile it to see what the performance impact was; presumably for small functions it would be near zero, while larger functions might compile more slowly). The other approach I considered (and was partway through implementing when the computer died) was to use a dict subclass specifically for the constants dictionaries; inherit almost everything from regular dicts, but with built-in knowledge of slices so it could perform hashing on their behalf (I believe you could use the KnownHash APIs to keep custom code minimal; you just check for slices, fake their hash if you got one and call the KnownHash API, otherwise, defer to dict normally). Just an extension of the code.__hash__ trick, adding a couple more small hacks into small parts of Python so they treat slices as hashable only in that context without allowing non-intuitive behaviors in normal dict usage. -- ___ Python tracker <https://bugs.python.org/issue42454> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41878] python3 fails to use custom mapping object as symbols in eval()
Change by Josh Rosenberg : -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue41878> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42454] Move slice creation to the compiler for constants
Josh Rosenberg added the comment: There is an open issue for this already, under #11107 (a reopen of the closed #2268, where the reopen was justified due to Python 3 making slice objects more common), just so you know. I made a stab at this a while ago and gave up due to the problems with making slices constants while trying to keep them unhashable (and I never got to handling the marshal format updates properly). It doesn't seem right to incidentally make: something[::-1] = something actually work, and be completely nonsensical, when "something" happens to be a dict, when previously, you'd get a clear TypeError for trying to do it. I could definitely see code using duck-typing via slices to distinguish sequences from other iterables and mappings, and making mapping suddenly support slices in a nonsensical way is... odd. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42454> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue26290] fileinput and 'for line in sys.stdin' do strange mockery of input buffering
Josh Rosenberg added the comment: For those who find this in the future, the simplest workaround for the: for line in sys.stdin: issue on Python 2 is to replace it with: for line in iter(sys.stdin.readline, ''): The problem is caused by the way file.__next__'s buffering behaves, but file.readline doesn't use that code (it delegates to either fgets or a loop over getc/getc_unlocked that never overbuffers beyond the newline). Two-arg iter lets you make an iterator that calls readline each time you want a line, and considers a return of '' (which is what readline returns when you hit EOF) to terminate iteration. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue26290> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42269] Add ability to set __slots__ in dataclasses
Josh Rosenberg added the comment: Is the plan to allow an argument to auto-generate __slots__, or would this require repeating the names once in __slots__, and once for annotations and the like? -- ___ Python tracker <https://bugs.python.org/issue42269> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42269] Add ability to set __slots__ in dataclasses
Change by Josh Rosenberg : -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42269> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39931] Global variables are not accessible from child processes (multiprocessing.Pool)
Change by Josh Rosenberg : -- resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue39931> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42033] Seemingly unnecessary complexification of foo(**kw)
Change by Josh Rosenberg : -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue42033> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41972] bytes.find consistently hangs in a particular scenario
Change by Josh Rosenberg : -- type: performance -> behavior ___ Python tracker <https://bugs.python.org/issue41972> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41972] bytes.find consistently hangs in a particular scenario
Josh Rosenberg added the comment: Can reproduce on Alpine Linux, with CPython 3.8.2 (running under WSLv2), so it's not just you. CPU usage is high; seems like it must be stuck in an infinite loop. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue41972> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41924] TextWrap's wrap method throws unhelpful error on bytes object
Josh Rosenberg added the comment: It's not textwrap that's doing it, which is why the error is so unhelpful; the input is assumed to be a str, and the translate method is called on it with a dict argument, which is valid for str.translate, but not for bytes.translate. You'll get other "unhelpful" error messages for other arguments (e.g. most other built-in types die because they lack an expandtabs method). Is it necessary to provide specific error messages when an API is given a type it never claimed to support? I could see issues with a "check for str" check if someone is implementing their own str-like type that matches the API but gets rejected for not being str. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue41924> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
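A minimal sketch of the mismatch described above (only str.translate and bytes.translate involved; the exact error wording may vary by version):

>>> 'a\tb'.translate({ord('\t'): ' '})   # str.translate accepts a mapping of ordinals
'a b'
>>> b'a\tb'.translate({ord('\t'): ' '})  # bytes.translate wants a 256-byte table
Traceback (most recent call last):
  ...
TypeError: a bytes-like object is required, not 'dict'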
[issue41878] python3 fails to use custom mapping object as symbols in eval()
Josh Rosenberg added the comment: Yes, list comprehensions having their own local scope was a change from Py2 to Py3. Python 2 did not do this for list comps initially, and it was left that way during the 2.x timeframe due to back compat constraints, but 2.x did it from the start for generator expressions, as well as for set and dict comps, and they were all made consistent for Py3. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue41878> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
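A short demonstration of the scope difference (plain CPython 2.7 vs 3.x, nothing else assumed):

x = 'outer'
squares = [x * x for x in range(5)]
print(x)   # Python 2.7 prints 4 (the list comp leaked its loop variable); Python 3.x prints 'outer'
gen = (x * x for x in range(5))   # generator expressions never leaked the loop variable, even on 2.x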
[issue41702] Inconsistent behaviour in strftime
Josh Rosenberg added the comment: 3.8.2 (on Alpine Linux under WSL) produces '0020-10-05', just like your 3.6 example. Not seeing anything obvious in commit history that would break it for 3.7. That said, 3.7 is in security fix only mode at this point (see https://devguide.python.org/#status-of-python-branches ); as this works on the latest release, I'm thinking this won't be fixed for 3.7. -- nosy: +josh.r title: Inconcistent behaviour in strftime -> Inconsistent behaviour in strftime ___ Python tracker <https://bugs.python.org/issue41702> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41694] python3 futures.as_completed timeout broken if future contains undefined reference
Josh Rosenberg added the comment: The problem is a lot simpler than you're making it: 1. You submit a time.sleep(30) task. This begins running immediately. 2. You try to submit another task, but a NameError is raised, bypassing the rest of the code (you never call as_completed, with or without a timeout). 3. The ThreadPoolExecutor's __exit__ is invoked, which implicitly invokes shutdown(wait=True). This does not return until the successfully submitted task (time.sleep(30)) finishes. 4. At that point, the exception that was interrupted by the with statement cleanup resumes bubbling up. All of this is behaving exactly as documented; no bug is occurring. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue41694> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
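A hedged sketch of that sequence (do_work and undefined_name are placeholders, not from the original report):

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(time.sleep, 30)]               # 1. starts running immediately
    futures.append(pool.submit(do_work, undefined_name))  # 2. NameError while evaluating the arguments
    for fut in as_completed(futures, timeout=5):          # never reached
        fut.result()
# 3. __exit__ waits for the 30 second sleep to finish, then 4. the NameError propagates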
[issue36172] csv module internal consistency
Change by Josh Rosenberg : -- resolution: -> not a bug stage: -> resolved status: pending -> closed ___ Python tracker <https://bugs.python.org/issue36172> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41652] An Advice on Turning Ellipsis into Keyword
Josh Rosenberg added the comment: I'm closing this as not being worth the costs of adding new keywords. You're welcome to propose it on the python-ideas list (a more appropriate place to propose and suss out the details of significant language changes), but you'll need to formulate a much stronger reason for making this change. -- resolution: -> rejected stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue41652> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41652] An Advice on Turning Ellipsis into Keyword
Josh Rosenberg added the comment: You can do the same thing to replace int, float, dict, len, and all the other built-in classes and functions. Why is Ellipsis so special that it needs protection, especially when, as you note, ... is an available unoverrideable way to refer to it? Making new keywords is a high bar (because it can break existing code). What justifies this one beyond "don't want folks to mess with a barely used name"? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue41652> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36379] nb_inplace_pow is always called with an invalid argument
Josh Rosenberg added the comment: Zackery, should this be closed? Or is there something missing from the patch? -- ___ Python tracker <https://bugs.python.org/issue36379> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36921] Deprecate yield from and @coroutine in asyncio
Josh Rosenberg added the comment: Was this supposed to deprecate using types.coroutine as a decorator as well? Because that's not clearly documented, which means people can still use it to make generator-based coroutines without async def. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue36921> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40269] Inconsistent complex behavior with (-1j)
Josh Rosenberg added the comment: The final entry is identical to the second to last, because ints have no concept of -0. If you used a float literal, it would match the first two: >>> -0.-1j (-0-1j) I suspect the behavior here is due to -1j not actually being a literal on its own; it's interpreted as the negation of 1j, where 1j is actually 0.0+1.0j, and negating it flips the sign on both the real and imaginary component. From what I can read of the grammar rules, this is expected; the negation isn't ever part of the literal (minus signs aren't part of the grammar aside from exponents in scientific notation). https://docs.python.org/3/reference/lexical_analysis.html#floating-point-literals If this is a bug, it's a bug in the grammar. I suspect the correct solution here is to include the real part explicitly, as 0.0-1j works just fine. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue40269> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
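The negation effect, shown directly (stock CPython session):

>>> (-1j).real, (-1j).imag        # unary minus applied to 0.0+1.0j negates both components
(-0.0, -1.0)
>>> (0.0-1j).real, (0.0-1j).imag  # spelling the real part explicitly avoids the -0.0
(0.0, -1.0)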
[issue40201] Last digit count error
Josh Rosenberg added the comment: Your script is using "true" division with / (which produces potentially inaccurate float results), not floor division with // (which produces int results). When the inputs vastly exceed the integer representational capabilities of floats (52-53 bits, where 10 ** 24 needs 80 bits), you'll have problems. This is a bug in your script, not in Python. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue40201> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
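The distinction in two lines (nothing assumed beyond built-in int/float arithmetic):

>>> 10 ** 24 // 2                    # floor division stays in exact arbitrary-precision ints
500000000000000000000000
>>> 10 ** 24 / 2 == 10 ** 24 // 2    # true division rounds through a 53-bit float
False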
[issue36144] Dictionary union. (PEP 584)
Josh Rosenberg added the comment: Sorry, I think I need examples to grok this in the general case. ChainMap unioned with dict makes sense to me (it's equivalent to update or copy-and-update on the top level dict in the ChainMap). But ChainMap unioned with another ChainMap is less clear. Could you give examples of what the expected end result is for: d1 = {'a': 1, 'b': 2} d2 = {'b': 3, 'c': 4} d3 = {'a': 5, 'd': 6} d4 = {'d': 7, 'e': 8} cm1 = ChainMap(d1, d2) cm2 = ChainMap(d3, d4) followed by either: cm3 = cm1 | cm2 or cm1 |= cm2 ? As in, what is the precise state of the ChainMap cm3 or the mutated cm1, referencing d1, d2, d3 and d4 when they are still incorporated by references in the chain? My impression from what you said is that the plan would be for the updated cm1 to preserve references to d1 and d2 only, with the contents of cm2 (d3 and d4) effectively flattened and applied as an in-place update to d1, with an end result equivalent to having done: cm1 = ChainMap(d1, d2) d1 |= d4 d1 |= d3 (except the key ordering would actually follow d3 first, and d4 second), while cm3 would effectively be equivalent to having done (note ordering): cm3 = ChainMap(d1 | d4 | d3, d2) though again, key ordering would be based on d1, then d3, then d4, not quite matching the union behavior. And a reference to d2 would be preserved in the final result, but not any other original dict. Is that correct? If so, it seems like it's wasting ChainMap's key feature (lazy accumulation of maps), where: cm1 |= cm2 could be equivalent to either: cm1.maps += cm2.maps though that means cm1 wins overlaps, where normal union would have cm2 win, or to hew closer to normal union behavior, make it equivalent to: cm1.maps[:0] = cm2.maps prepending all of cm2's maps to have the same duplicate handling rules as regular dicts (right side wins) at the expense of changing which map cm1 uses as the target for writes and deletes. In either case it would hew to the spirit of ChainMap, making dict "union"-ing an essentially free operation, in exchange for increasing the costs of lookups that don't hit the top dict. -- ___ Python tracker <https://bugs.python.org/issue36144> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36144] Dictionary union. (PEP 584)
Josh Rosenberg added the comment: What is ChainMap going to do? Normally, the left-most argument to ChainMap is the "top level" dict, but in a regular union scenario, last value wins. Seems like layering the right hand side's dict on top of the left hand side's would match dict union semantics best, but it feels... wrong, given ChainMap's normal left-to-right precedence. And top-mostness affects which dict receives all writes, so if chain1 |= chain2 operates with dict-like precedence (chain2 layers over chain1), then that also means the target of writes/deletions/etc. changes to what was on top in chain2. -- ___ Python tracker <https://bugs.python.org/issue36144> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue39693] tarfile's extractfile documentation is misleading
New submission from Josh Rosenberg : The documentation for extractfile ( https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractfile ) says: "Extract a member from the archive as a file object. member may be a filename or a TarInfo object. If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned." Before reading further, answer for yourself: What do you think happens when a provided filename doesn't exist, based on that documentation? In teaching a Python class that uses tarfile in the final project, and expects students to catch predictable errors (e.g. a random tarball being provided, rather than one produced by a different mode of the program with specific expected files) and convert them to user-friendly error messages, I've found this documentation to confuse students repeatedly (if they actually read it, rather than just guessing and checking interactively). Specifically, the documentation: 1. Says nothing about what happens if member doesn't exist (TarFile.getmember does mention KeyError, but extractfile doesn't describe itself in terms of getmember) 2. Loosely implies that it should return None in such a scenario "If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned." The intent is likely to mean "all other member types are None, and we're saying nothing about non-existent members", but everyone I've taught who has read the docs came away with a different impression until they tested it. Perhaps just reword from: "If member is a regular file or a link, an io.BufferedReader object is returned. Otherwise, None is returned." to: "If member is a regular file or a link, an io.BufferedReader object is returned. For all other existing members, None is returned. If member does not appear in the archive, KeyError is raised." Similar adjustments may be needed for extract, and/or both of them could be adjusted to explicitly refer to getmember by stating that filenames are converted to TarInfo objects via getmember. -- assignee: docs@python components: Documentation, Library (Lib) keywords: easy, newcomer friendly messages: 362298 nosy: docs@python, josh.r priority: normal severity: normal status: open title: tarfile's extractfile documentation is misleading versions: Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue39693> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
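A sketch of the behavior the reworded sentence would document ('project.tar' and 'missing.txt' are made-up names):

import tarfile

with tarfile.open('project.tar') as tf:
    try:
        reader = tf.extractfile('missing.txt')   # name not present in the archive
    except KeyError:
        print('no such member')                  # what actually happens today
    # an existing directory member would return None; an existing regular file returns a reader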
[issue36051] Drop the GIL during large bytes.join operations?
Josh Rosenberg added the comment: This will introduce a risk of data races that didn't previously exist. If you do: ba1 = bytearray(b'\x00') * 5 ba2 = bytearray(b'\x00') * 5 ... pass references to thread that mutates them ... ba3 = b''.join((ba1, ba2)) then two things will change from the existing behavior: 1. If the thread in question attempts to write to the bytearrays in place, then it could conceivably write data that is only partially picked up (ba1[0], ba1[4] = 2, 3 could end up copying the results of the second write without the first; at present, it could only copy the first without the second) 2. If the thread tries to change the size of the bytearrays during the join (ba1 += b'123'), it'll die with a BufferError that wasn't previously possible #1 isn't terrible (as noted, data races in that case already existed, this just lets them happen in more ways), but #2 is a little unpleasant; code that previously had simple data races (the data might be inconsistent, but the code ran and produced some valid output) can now fail hard, nowhere near the actual call to join that introduced the behavioral change. I don't think this sinks the patch (loudly breaking code that was silently broken before isn't awful), but I feel like a warning of some kind in the documentation (if only a simple compatibility note in What's New) might be appropriate. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue36051> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
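A sketch of hazard #2; whether it actually triggers depends on thread timing, so treat it purely as an illustration of the failure mode rather than a reliable reproducer:

import threading

ba1 = bytearray(b'\x00') * (16 * 1024 * 1024)
ba2 = bytearray(b'\x00') * (16 * 1024 * 1024)

def grow():
    ba1.extend(b'123')   # resizing while the join holds a buffer export raises BufferError here

t = threading.Thread(target=grow)
t.start()
joined = b''.join((ba1, ba2))   # if the GIL is dropped during the copy, grow() can overlap with it
t.join()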
[issue39167] argparse boolean type bug
Change by Josh Rosenberg : -- resolution: -> duplicate stage: -> resolved status: open -> closed superseder: -> ArgumentParser should support bool type according to truth values ___ Python tracker <https://bugs.python.org/issue39167> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38971] codecs.open leaks file descriptor when invalid encoding is passed
Josh Rosenberg added the comment: Any reason not to just defer opening the file until after the codec has been validated, so the resource acquisition comes last? -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38971> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38934] Dictionaries of dictionaries behave incorrectly when created from dict.fromkeys()
Josh Rosenberg added the comment: That's the expected behavior, and it's clearly documented here: https://docs.python.org/3/library/stdtypes.html#dict.fromkeys Quote: "All of the values refer to just a single instance, so it generally doesn’t make sense for value to be a mutable object such as an empty list. To get distinct values, use a dict comprehension instead." -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue38934> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
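The documented sharing, in miniature, plus the suggested dict comprehension alternative:

>>> shared = dict.fromkeys(['a', 'b'], {})
>>> shared['a']['x'] = 1
>>> shared['b']                              # both keys reference the same single dict object
{'x': 1}
>>> distinct = {k: {} for k in ['a', 'b']}   # one fresh dict per key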
[issue38874] asyncio.Queue: putting items out of order when it is full
Josh Rosenberg added the comment: Yes, five outstanding blocked puts can be bypassed by a put that comes in immediately after a get creates space. But this isn't really a problem; there are no guarantees on what order puts are executed in, only a guarantee that once a put succeeds, it's FIFO ordered with respect to all other puts. Nothing in the docs even implies the behavior you're expecting, so I'm not seeing how even a documentation fix is warranted here. The docs on put clearly say: "Put an item into the queue. If the queue is full, wait until a free slot is available before adding the item." If we forcibly hand off on put even when a slot is available (to allow older puts to finish first), then we violate the expectation that waiting is only performed when the queue is full (if I test myqueue.full() and it returns False, I can reasonably expect that put won't block). This would be impossible to fix at all if people write code like `if not myqueue.full(): myqueue.put_nowait(item)`. put_nowait isn't even a coroutine, so it *can't* hand off control to the event loop to allow waiting puts to complete, even if it wanted to, and it can't fail to put (e.g. by determining that the empty slots will be filled by outstanding puts in some relatively expensive way), because you literally *just* verified the queue wasn't full and had no awaits between the test and the put_nowait, so it *must* succeed. In short: Yes, it's somewhat unpleasant that a queue slot can become free and someone else can swoop in and steal it before older waiting puts can finish. But any change that "fixed" that would make all code slower (forcing unnecessary coroutine switches), and violate existing documentation guarantees. -- ___ Python tracker <https://bugs.python.org/issue38874> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38874] asyncio.Queue: putting items out of order when it is full
Josh Rosenberg added the comment: The items that haven't finished the put aren't actually "in" the queue yet, so I don't see how non-FIFO order of insertion violates any FIFO guarantees for the contents of the queue; until the items are actually "in", they're not sequenced for the purposes of when they come "out". Mandating such a guarantee effectively means orchestrating a queue with a real maxsize equal to the configured maxsize plus the total number of coroutines competing to put items into it. The guarantee is still being met here; once an item is put, it will be "get"-ed after anything that finished put-ing before it, and before anything that finished put-ing after it. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38874> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38853] set.repr breaches docstring contract
Josh Rosenberg added the comment: To be clear, the docstring is explicitly disclaiming any ordering contract. If you're reading "unordered" as meaning "not reordered" (like a list or tuple, where the elements appear in insertion order), that's not what "unordered" means here. It means "arbitrary order". As it happens, the hashcodes of small integers correspond to their numerical values, (mostly, -1 is a special case), so if no collisions occur and the numbers are sequential, the ordering will often look like it was sorted in semi-numerical order, as in your case. That doesn't mean it's performing sorting, it just means that's how the hashes happened to distribute themselves across the buckets in the set. A different test case with slightly more distributed numbers won't create the impression of sorting: >>> print({-5, -1, 13, 17}) {17, -5, 13, -1} For the record, I chose that case to use CPython implementation details to produce a really unordered result (all the numbers are bucketed mod 8 in a set that small, and this produces no collisions, with all values mod 8 different from the raw value). On other versions of CPython, or alternate interpreters, both your case and mine could easily come out differently. Point is, this isn't a bug, just a quirk in the small int hash codes. Steven: I think they thought it was sorted in some string-related way, explaining (to them) why -1 was out of place (mind you, if it were string sorted, -1 would come first since the minus sign is ASCIIbetically first, 19 would fall between 1 and 2, and 25 between 2 and 3, so it doesn't hold up). There's no bug here. -- nosy: +josh.r resolution: -> not a bug stage: -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue38853> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38710] unsynchronized write pointer in io.TextIOWrapper in 'r+' mode
Change by Josh Rosenberg : -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38710> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38710] unsynchronized write pointer in io.TextIOWrapper in 'r+' mode
Change by Josh Rosenberg : -- components: +Library (Lib) ___ Python tracker <https://bugs.python.org/issue38710> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36906] Compile time textwrap.dedent() equivalent for str or bytes literals
Josh Rosenberg added the comment: Is there a reason folks are supporting a textwrap.dedent-like behavior over the generally cleaner inspect.cleandoc behavior? The main advantage to the latter being that it handles: '''First Second Third ''' just fine (removing the common indentation from Second/Third), and produces identical results with: ''' First Second Third ''' where textwrap.dedent behavior would leave the first string unmodified (because it removes the largest common indentation, and First has no leading indentation), and would dedent the second, but leave a leading newline in place (where cleandoc removes it); that newline can only be avoided by using the typically discouraged line continuation character to make it: '''\ First Second Third ''' cleandoc behavior means the choice of whether the text begins and ends on the same line as the triple quote doesn't matter, and most use cases seem like they'd benefit from that flexibility. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue36906> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
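The contrast, using a literal shaped like the first example above (a sketch, not taken from the message):

import inspect, textwrap

s = '''First
    Second
    Third
'''
print(textwrap.dedent(s))   # unchanged: 'First' has no leading indentation, so nothing is common to strip
print(inspect.cleandoc(s))  # 'First\nSecond\nThird' -- indentation and the trailing blank line cleaned up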
[issue38560] Allow iterable argument unpacking after a keyword argument?
Josh Rosenberg added the comment: I'd be +1 on this, but I'm worried about existing code relying on the functional use case from your example. If we are going to discourage it, I think we either have to: 1. Have DeprecationWarning that turns into a SyntaxError, or 2. Never truly remove it, but make it a SyntaxWarning immediately and leave it that way indefinitely -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38560> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38566] Description of '\w' behavior is vague in `re` documentation
Josh Rosenberg added the comment: The definition of \w, historically, has corresponded to the set of characters that can occur in legal variable names in C (alphanumeric ASCII plus underscores, making it equivalent to [a-zA-Z0-9_] for ASCII regex). That's why, on top of the definitely wordy alphabetic characters, and the arguably wordy numerics, it includes the underscore, _. That definition predates Unicode entirely, and Python is just building on it by expanding the definition of "alphanumeric" to encompass all alphanumeric characters in Unicode. We definitely can't remove underscores from the definition without breaking existing code which assumes a common subset of PCRE support (every regex flavor I know of includes underscores in \w). Adding the zero width characters seems of limited benefit (especially in the non-joiner case; if you're trying to pull out words, presumably you don't want to group letters across a non-joining boundary?). Basically, you're parsing "Unicode word characters" as "Unicode's definition of word characters", when it's really meant to mean "All word characters, not just ASCII". You omitted the clarifying remarks from the documentation though; the full description is: "Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched." That's about as precise as I think we can make it (because technically, some of the things that count as "word characters" aren't actually part of an "alphabet" in the technical definition). If you think there is a clearer way of expressing it, please suggest a better phrasing, and this can be fixed as a documentation bug. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38566> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
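How the two readings differ in practice (example string chosen arbitrarily):

>>> import re
>>> re.findall(r'\w+', 'naïve_1 café')            # default: Unicode word characters
['naïve_1', 'café']
>>> re.findall(r'\w+', 'naïve_1 café', re.ASCII)  # ASCII flag: effectively [a-zA-Z0-9_]
['na', 've_1', 'caf']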
[issue34172] multiprocessing.Pool and ThreadPool leak resources after being deleted
Josh Rosenberg added the comment: It should probably be backported to all supported 3.x branches though, so people aren't required to move to 3.8 to benefit from it. -- ___ Python tracker <https://bugs.python.org/issue34172> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34172] multiprocessing.Pool and ThreadPool leak resources after being deleted
Josh Rosenberg added the comment: Pablo's fix looks like a superset of the original fix applied here, so I'm assuming it fixes this issue as well. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue34172> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue32856] Optimize the `for y in [x]` idiom in comprehensions
Josh Rosenberg added the comment: OOC, rather than optimizing a fairly ugly use case, might another approach be to make walrus less leaky? Even if observable leakage is considered desirable, it strikes me that use cases for walrus in genexprs and comprehensions likely break up into: 1. 90%: Cases where variable is never used outside genexpr/comprehension (because functional programming constructs shouldn't have side-effects, gosh darn it!) 2. 5%: Cases where variable is used outside genexpr/comprehension and expects leakage 3. 5%: Cases where variable is used outside genexpr/comprehension, but never in a way that actually relies on the value set in the genexpr/comprehension (same name chosen by happenstance) If the walrus behavior in genexpr/comprehensions were tweaked to say that it only leaks if: 1. It's running at global scope (unavoidable, since there's no way to tell if it's an intended part of the module's interface) or 2. A global or nonlocal statement within the function made it clear the name was considered stateful (again, like running at global scope, there is no way to know for sure if the name will be used somewhere else) or 3. At some point in the function, outside the genexpr/comprehension, the value of the walrus-assigned name was read. Case #3 could be even more narrow if the Python AST optimizer was fancier, potentially something like "if the value was read *after* the genexpr/comprehension, but *before* any following *unconditional* writes to the same name" (so [leaked := x for x in it] wouldn't bother to leak "leaked" if the next line was "leaked = 1" even if "leaked" were read three lines later, or the only reads from leaked occurred before the genexpr/comprehension), but I don't think the optimizer is up to that; following simple rules similar to those the compiler already follows to identify local names should cover 90% of cases anyway. Aside from the dict returned by locals, and the possibility of earlier finalizer invocation (which you couldn't rely on outside CPython anyway), there's not much difference in behavior between a leaking and non-leaking walrus when the value is never referred to again, and it seems like the 90% case for cases where unwanted leakage occurs would be covered by this. Sure, if my WAG on use case percentages is correct, 5% of use cases would continue to leak even though they didn't benefit from it, but it seems like optimizing the 90% case would do a lot more good than optimizing what's already a micro-optimization that 99% of Python programmers would never use (and shouldn't really be encouraged, since it would rely on CPython implementation details, and produce uglier code). I was also inspired by this to look at replacing BUILD_LIST with BUILD_TUPLE when followed by GET_ITER (so "[y for x in it for y in [derived(x)]]" would at least get the performance benefit of looping over a one-element tuple rather than a one-element list), thinking it might reduce the overhead of [y for x in a for y in [x]] in your unpatched benchmark by making it equivalent to [y for x in a for y in (x,)] while reading more prettily, but it turns out you beat me to it with issue32925, so good show there! :-) You should probably rerun your benchmarks though; with issue32925 committed (a month after you posted the benchmarks here), the performance discrepancy should be somewhat less (estimate based on local benchmarking says maybe 20% faster with BUILD_LIST being optimized to BUILD_TUPLE). 
Still much faster with the proposed optimization than without, but I suspect even optimized, few folks will think to write their comprehensions to take advantage of it, which is why I was suggesting tweaks to the more obvious walrus operator. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue32856> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
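The leak being discussed, in its simplest form (stock CPython 3.8+, assuming y and x are not otherwise defined in the session):

>>> values = [y := x * 2 for x in range(3)]   # the walrus target escapes to the enclosing scope
>>> y
4
>>> x                                         # the ordinary loop variable does not escape
Traceback (most recent call last):
  ...
NameError: name 'x' is not defined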
[issue38167] O_DIRECT read fails with 4K mmap buffer
Josh Rosenberg added the comment: Yeah, not a bug. The I/O subsystem was substantially rewritten between Python 2 and Python 3, so you sometimes need to be more explicit about things like buffering, but as you note, once the buffering is correct, the code works; there's nothing to fix. -- resolution: -> not a bug stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.org/issue38167> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38167] O_DIRECT read fails with 4K mmap buffer
Josh Rosenberg added the comment: > I do not believe an unbuffered file uses O_DIRECT. This is why I use > os.open(fpath, os.O_DIRECT). Problem is you follow it with: fo = os.fdopen(fd, 'rb+') which introduces a Python level of buffering around the kernel unbuffered file descriptor. You'd need to pass buffering=0 to make os.fdopen avoid returning a buffered file object, making it: fo = os.fdopen(fd, 'rb+', buffering=0) -- ___ Python tracker <https://bugs.python.org/issue38167> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue36947] Fix 3.3.3.1 Metaclasses Documentation
Josh Rosenberg added the comment: The existing documentation is correct, just hard to understand if you don't already understand the point of metaclasses (metaclasses are hard, the language to describe them will be inherently a little klunky). At some point, it might be nice to write a proper metaclass tutorial, even if it's only targeted at advanced users (the only people who should really be considering writing their own metaclasses or even directly using existing ones; everyone else should be using more targeted tools and/or inheriting from classes that already implement the desired metaclass). The Data model docs aren't concerned with tutorials and examples though; they're just dry description, and they're doing their job here, so I think this issue can be closed. -- ___ Python tracker <https://bugs.python.org/issue36947> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38255] Replace "method" with "attribute" in the description of super()
Josh Rosenberg added the comment: I prefer rhettinger's PR to your proposed PR; while super() may be useful for things other than methods, the 99% use case is methods, and deemphasizing that is a bad idea. rhettinger's PR adds a note about other use cases without interfering with super()'s primary use case. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38255> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38241] Pickle with protocol=0 in python 3 does not produce a 'human-readable' format
Josh Rosenberg added the comment: I'll note, the same bug appears in Python 2, but only when pickling bytearray; since bytes in Python 2 is just a str alias, you don't see this misbehavior with it, only with bytearray (which is consistently incorrect/non-ASCII on both 2 and 3). -- ___ Python tracker <https://bugs.python.org/issue38241> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38241] Pickle with protocol=0 in python 3 does not produce a 'human-readable' format
Josh Rosenberg added the comment: This seems like a bug in pickle; protocol 0 is *defined* to be ASCII compatible. Nothing should encode to a byte above 0x7f. It's not actually supposed to be "human-readable" (since many ASCII bytes aren't printable), so the docs should be changed to describe protocol 0 as ASCII consistently; if this isn't fixed to make it ASCII consistently, "human-readable" is still meaningless and shouldn't be used. I'm kind of surprised the output from Py3 works on Py2 to be honest. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38241> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38167] O_DIRECT read fails with 4K mmap buffer
Josh Rosenberg added the comment: Works just fine for me on 3.7.3 on Ubuntu, reading 4096 bytes. How is it failing for you? Is an exception raised? It does seem faintly dangerous to explicitly use O_DIRECT when you're wrapping it in a buffered reader that doesn't know it has to read in units matching the minimum block size (file system dependent on older kernels, 512 bytes in Linux kernel 2.6+); BufferedIOBase.readinto is explicitly documented to potentially issue multiple read calls (readinto1 guarantees it won't do that at least). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38167> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33214] join method for list and tuple
Josh Rosenberg added the comment: Note that all of Serhiy's examples are for a known, fixed number of things to concatenate/union/merge. str.join's API can be used for that by wrapping the arguments in an anonymous tuple/list, but it's more naturally for a variable number of things, and the unpacking generalizations haven't reached the point where: [*seq for seq in allsequences] is allowed. list(itertools.chain.from_iterable(allsequences)) handles that just fine, but I could definitely see it being convenient to be able to do: [].join(allsequences) That said, a big reason str provides .join is because it's not uncommon to want to join strings with a repeated separator, e.g.: # For not-really-csv-but-people-do-it-anyway ','.join(row_strings) # Separate words with spaces ' '.join(words) # Separate lines with newlines '\n'.join(lines) I'm not seeing even one motivating use case for list.join/tuple.join that would actually join on a non-empty list or tuple ([None, 'STOP', None] being rather contrived). If that's not needed, it might make more sense to do this with an alternate constructor (a classmethod), e.g.: list.concat(allsequences) which would avoid the cost of creating an otherwise unused empty list (the empty tuple is a singleton, so no cost is avoided there). It would also work equally well with both tuple and list (where making list.extend take varargs wouldn't help tuple, though it's a perfectly worthy idea on its own). Personally, I don't find using itertools.chain (or its from_iterable alternate constructor) all that problematic (though I almost always import it with from itertools import chain to reduce the verbosity, especially when using chain.from_iterable). I think promoting itertools more is a good idea; right now, the notes on concatenation for sequence types mention str.join, bytes.join, and replacing tuple concatenation with a list that you call extend on, but doesn't mention itertools.chain at all, which seems like a failure to make the best solution the discoverable/obvious solution. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue33214> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
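The itertools spelling being recommended, for reference:

>>> from itertools import chain
>>> allsequences = [[1, 2], (3,), [4, 5]]
>>> list(chain.from_iterable(allsequences))
[1, 2, 3, 4, 5]
>>> tuple(chain([0], *allsequences))   # chain also accepts the sequences as direct arguments
(0, 1, 2, 3, 4, 5)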
[issue38116] Make select module PEP-384 compatible
Josh Rosenberg added the comment: Why do you describe these issues (this one, #38069, #38071-#38076, maybe more) as making the module PEP 384 compatible? There is no reason to make the built-in modules stick to the limited API, and it doesn't look like you're doing that in any event (among other things, pretty sure Argument Clinic generated code isn't limited API compatible yet, though that might be changing?). Seems like the main (only?) change you're making is to convert all static types to dynamic types. Which is fine, if it's necessary for PEP 554, but it seems only loosely related to PEP 384 (which defined mechanisms for "statically" defining dynamic heap types, but that wasn't the main thrust). -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38116> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38003] Incorrect "fixing" of isinstance tests for basestring
Josh Rosenberg added the comment: basestring in Python 2 means "thing that is logically text", because in Python 2, str can mean *either* logical text *or* binary data, and unicode is always logical text. str and unicode can kinda sorta interoperate on Python 2, so it can make sense to test for basestring if you're planning to use it as logical text; if you do 'foo' + u'bar', that's fine in Python 2. In Python 3, only str is logically text; b'foo' + 'bar' is completely illegal, so it doesn't make sense to convert it to recognize both bytes and str. Your problem is that you're using basestring incorrectly in Python 2, and it happens to work only because Python 2 did a bad job of separating text and binary data. Your original example code should actually have been written in Python 2 as: if isinstance(value, bytes): # bytes is an alias of str, and only str, on 2.7 value = value.decode(encoding) elif not isinstance(value, unicode): some other code which 2to3 would convert correctly (changing unicode to str, and leaving everything else untouched) because you actually tested what you meant to test to control the actions taken: 1. If it was binary data (which you interpret all Py2 strs to be), then it is decoded to text (Py2 unicode/Py3 str) 2. If it wasn't binary data and it wasn't text, you did something else Point is, the converter is doing the right thing. You misunderstood the logical meaning of basestring, and wrote code that depended on your misinterpretation, that's all. Your try/except to try to detect Python 3-ness was doomed from the start; you referenced basestring, and 2to3 (reasonably) converts that to str, which breaks your logic. You wrote cross-version code that can't be 2to3-ed because it's *already* Python 3 code; Python 3 code should never be subjected to 2to3, because it'll do dumb things (e.g. change print(1, 2) to print((1, 2))); it's 2to3, not 2or3to3 after all. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue38003> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue38046] Can't use sort_keys in json.dumps with mismatched types
Josh Rosenberg added the comment: This is an exact duplicate of #25457. -- nosy: +josh.r resolution: -> duplicate stage: -> resolved status: open -> closed superseder: -> json dump fails for mixed-type keys when sort_keys is specified ___ Python tracker <https://bugs.python.org/issue38046> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue23670] Modifications to support iOS as a cross-compilation target
Change by Josh Rosenberg : -- title: Restore -> Modifications to support iOS as a cross-compilation target ___ Python tracker <https://bugs.python.org/issue23670> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37976] zip() shadows TypeError raised in __iter__() of source iterable
Josh Rosenberg added the comment: Raymond: "Since there isn't much value in reporting which iterable number has failed" Isn't there though? If the error just points to the line with the zip, and the zip is zipping multiple similar things (especially things which won't have a traceable line of Python code associated with them to narrow it down), knowing which argument was the cause of the TypeError seems rather useful. Without it, you just know *something* being zipped was wrong, but need to manually track down which of the arguments was the problem. -- nosy: +josh.r ___ Python tracker <https://bugs.python.org/issue37976> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37872] Move statics in Python/import.c to top of the file
Change by Josh Rosenberg : -- title: Move statitics in Python/import.c to top of the file -> Move statics in Python/import.c to top of the file ___ Python tracker <https://bugs.python.org/issue37872> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33007] Objects referencing private-mangled names do not roundtrip properly under pickling.
Josh Rosenberg added the comment: This problem is specific to private methods AFAICT, since they're the only things which have an unmangled __name__ used to pickle them, but are stored as a mangled name. More details on cause and solution on issue #37852, which I closed as a duplicate of this issue. -- nosy: +josh.r versions: +Python 3.6, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue33007> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37852] Pickling doesn't work for name-mangled private methods
Change by Josh Rosenberg : -- resolution: -> duplicate stage: -> resolved status: open -> closed superseder: -> Objects referencing private-mangled names do not roundtrip properly under pickling. ___ Python tracker <https://bugs.python.org/issue37852> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37852] Pickling doesn't work for name-mangled private methods
New submission from Josh Rosenberg : Inspired by this Stack Overflow question, where it prevented using multiprocessing.Pool.map with a private method: https://stackoverflow.com/q/57497370/364696 The __name__ of a private method remains the unmangled form, even though only the mangled form exists in the class dictionary for lookup. The __reduce__ for bound methods doesn't handle these private names specially, so it will serialize it such that on the other end, it does getattr(method.__self__, method.__func__.__name__). On deserializing, it tries to perform that lookup, but of course, only the mangled name exists, so it dies with an AttributeError. Minimal repro: import pickle class Spam: def __eggs(self): pass def eggs(self): return pickle.dumps(self.__eggs) spam = Spam() pkl = spam.eggs() # Succeeds via implicit mangling (but pickles unmangled name) pickle.loads(pkl) # Fails (tried to load __eggs) Explicitly mangling via pickle.dumps(spam._Spam__eggs) fails too, and in the same way. A similar problem occurs (on the serializing end) when you do: pkl = pickle.dumps(Spam._Spam__eggs) # Pickling function in Spam class, not bound method of Spam instance though that failure occurs at serialization time, because pickle itself tries to look up .Spam.__eggs (which doesn't exist), instead of .Spam._Spam__eggs (which does). That failure mode is less bad in two ways: 1. It fails at serialization time (so it doesn't silently produce pickles that can never be unpickled) 2. It's an explicit PicklingError, with a message that explains what it tried to do, and why it failed ("Can't pickle : attribute lookup Spam.__eggs on __main__ failed") In the use case on Stack Overflow, it was the implicit case; a public method of a class created a multiprocessing.Pool, and tried to call Pool.map with a private method on the same class as the mapper function. While normally pickling methods seems odd, for multiprocessing, it's pretty standard. I think the correct fix here is to make method_reduce in classobject.c (the __reduce__ implementation for bound methods) perform the mangling itself (meth_reduce in methodobject.c has the same bug, but it's less critical, since only private methods of built-in/extension types would be affected, and most of the time, such private methods aren't exposed to Python at all, they're just static methods for direct calling in C). This would handle all bound methods, but for "unbound methods" (read: functions defined in a class), it might also be good to update save_global/get_deep_attribute in _pickle.c to make it recognize the case where a component of a dotted name begins with two underscores (and doesn't end with them), and the prior component is a class, so that pickling the private unbound method (e.g. a plain function which happened to be defined on a class) also works, instead of dying with a lookup error. The fix is most important, and least costly, for bound methods, but I think doing it for plain functions is still worthwhile, since I could easily see Pool.map operations using an @staticmethod utility function defined privately in the class for encapsulation purposes, and it seems silly to force them to make it more public and/or remove it from the class. 
-- components: Interpreter Core, Library (Lib) messages: 349716 nosy: josh.r priority: normal severity: normal status: open title: Pickling doesn't work for name-mangled private methods versions: Python 3.9 ___ Python tracker <https://bugs.python.org/issue37852> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com