[issue39812] Avoid daemon threads in concurrent.futures

2022-04-06 Thread Josh Rosenberg

Change by Josh Rosenberg :

Removed message: https://bugs.python.org/msg416876

Python tracker 
Python-bugs-list mailing list

[issue39812] Avoid daemon threads in concurrent.futures

2022-04-06 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

I think this is causing a regression for code that explicitly desires the 
ThreadPoolExecutor to go away abruptly when all other non-daemon threads 
complete (by choosing not to use a with statement, and if shutdown is called, 
calling it with wait=False, or even with those conditions, by creating it from 
a daemon thread of its own).

It doesn't seem like it's necessary, since the motivation was "subinterpreters 
forbid daemon threads" and the same release that contained this change 
(3.9.0alpha6) also contained #40234's change that backed out the change that 
forbade spawning daemon threads in subinterpreters (because they now support 
them by default). If there are conflicts with some uses of subinterpreters that
make it necessary to use non-daemon threads, could that be made a configurable
option (ideally defaulting to the pre-3.9 choice to use daemon threads)?
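
For anyone who wants to check which behavior their interpreter has, here is a minimal sketch (the helper name worker_is_daemon is made up for illustration). It just asks a pool worker whether it is a daemon thread; it assumes it is run from a non-daemon thread such as the main thread.

```python
import sys
import threading
from concurrent.futures import ThreadPoolExecutor


def worker_is_daemon():
    """Report whether a ThreadPoolExecutor worker is a daemon thread."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(lambda: threading.current_thread().daemon)
        return future.result()
    finally:
        # wait=False returns immediately, but on 3.9+ the non-daemon
        # workers are still joined when the interpreter exits.
        executor.shutdown(wait=False)


print(sys.version_info[:2], "daemon worker:", worker_is_daemon())
```

On 3.9 and later this should report False, which is the change being discussed.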

nosy: +josh.r

[issue46175] Zero argument super() does not function properly inside generator expressions

2021-12-27 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Carlos: This has nothing to do with reloading (as Alex's repro shows, no reload 
calls are made).

super() *should* behave the same as super(CLASS_DEFINED_IN, self), and it looks 
like the outer function is doing half of what it must do to make no-arg super() 
work in the genexpr (dis.dis reports that __class__ is being loaded, and a 
closure constructed from the genexpr that includes it, so __class__, which 
no-arg super pulls from closure scope to get its first argument, is there).

The problem is that super() *also* assumes the first argument to the function 
is self, and a genexpr definitionally receives just one argument, the iterator 
(the outermost one for genexprs with nested loops). So no-arg super is doing 
the equivalent of:

super(__class__, iter(vars))

when it should be doing:

super(__class__, self)

Only way to fix it I can think of would be one of:

1. Allow a genexpr to receive multiple arguments to support this use case 
(ugly, requires significant changes to current design of genexprs and probably 
super() too)
2. Somehow teach super() to pull self (positional argument #1 really; super() 
doesn't care about names) from closure scope (and make the compiler put self in 
the closure scope when it builds the closure) when run in a genexpr.

Both options seem... sub-optimal. Better suggestions welcome. Note that the 
same problem affects the various forms of comprehension as well (this isn't 
specific to the lazy design of genexprs; listcomps have the same problem).
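
A small reproducer sketch (class and method names are invented here, not taken from the report). The no-arg form fails inside the genexpr body, while spelling out both arguments works because self is then picked up from closure scope like any other name:

```python
class Base:
    def scale(self, x):
        return x * 2


class Child(Base):
    def scale_all_broken(self, data):
        # super() here runs inside the genexpr's own function, whose only
        # argument is the iterator over data, not self.
        return list(super().scale(x) for x in data)

    def scale_all_explicit(self, data):
        # Explicit arguments sidestep the "first argument is self" assumption.
        return list(super(Child, self).scale(x) for x in data)


try:
    Child().scale_all_broken([1, 2])
except TypeError as exc:
    print("no-arg super() failed:", type(exc).__name__)

print(Child().scale_all_explicit([1, 2]))
```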

nosy: +josh.r

[issue46148] Optimize pathlib

2021-12-22 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Note: attrgetter could easily be made faster by migrating it to use vectorcall.

nosy: +josh.r

[issue46082] type casting of bool

2021-12-15 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Agreed, this is not a bug. The bool constructor is not a parser (unlike, say,
int), it's a truthiness detector. Non-empty strings are always truthy, by
design, so both "True" and "False" are truthy strings. There's no bug to
address here.
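
If parsing is what's wanted, it has to be written explicitly. A minimal sketch (parse_bool is a hypothetical helper, not something in the stdlib):

```python
def parse_bool(text):
    """Hypothetical helper: parse 'true'/'false', rejecting anything else."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a boolean literal: {text!r}")


print(bool("False"))        # True: any non-empty string is truthy
print(bool(""))             # False: only the empty string is falsy
print(parse_bool("False"))  # False: this one actually parses the text
```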

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: pending -> closed

[issue45707] Variable reassginment triggers incorrect behaviors of locals()

2021-11-03 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This is a documented feature of locals() (it's definitionally impossible to
auto-vivify *real* locals, because real locals are statically assigned to
specific indices in a fixed size array at function compile time, and the
locals() function is returning a copy of said bindings, not a live view of
them).
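
A minimal sketch of what that means in practice (the function name is made up): writing to the mapping returned by locals() inside a function changes the mapping, not the function's real local variables.

```python
def demo():
    x = 1
    snapshot = locals()
    snapshot["x"] = 2   # changes the returned mapping only
    snapshot["y"] = 3   # does not create a real local named y
    return x            # the real local is untouched


print(demo())  # 1
```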

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

[issue45520] Frozen dataclass deep copy doesn't work with __slots__

2021-10-19 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

You're right that in non-dataclass scenarios, you'd just use __slots__.

The slots=True thing was necessary for any case where any of the dataclass's 
attributes have default values (my_int: int = 0), or are defined with fields 
(my_list: list = field(default_factory=list)). The problem is that __slots__ is 
implemented by, after the class definition ends, creating descriptors on the 
class to access the data stored at known offsets in the underlying PyObject 
structure. Those descriptors themselves being class attributes means that when
the type definition machinery tries to use __slots__ to create them, it finds
conflicting class attributes (the defaults/fields) that already exist and
fails with a ValueError.

Adding support for slots=True means it does two things:

1. It completely defines the class without slots, extracts the stuff it needs 
to make the dataclass separately, then deletes it from the class definition 
namespace and makes a *new* class with __slots__ defined (so no conflict occurs)
2. It checks if the dataclass is also frozen, and applies alternate
__getstate__/__setstate__ methods that are compatible with a frozen, slotted
dataclass.

#2 is what fixes this bug (while #1 makes it possible to use the full range of 
dataclass features without sacrificing the ability to use __slots__). If you 
need this to work in 3.9, you could borrow the 3.10 implementations that make 
this work for frozen dataclasses to explicitly define __getstate__/__setstate__ 
for your frozen slotted dataclasses:

from dataclasses import fields  # needed at module level

def __getstate__(self):
    return [getattr(self, f.name) for f in fields(self)]

def __setstate__(self, state):
    for field, value in zip(fields(self), state):
        # use object.__setattr__ because the dataclass may be frozen
        object.__setattr__(self, field.name, value)

I'm not closing this since backporting just the fix for frozen slotted 
dataclasses (without backporting the full slots=True functionality that's a new 
feature) is possibly within scope for a bugfix release of 3.9 (it wouldn't 
change the behavior of working code, and fixes broken code that might 
reasonably be expected to work).
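
For completeness, a self-contained sketch of that workaround applied to a frozen dataclass with manually defined __slots__ (the class mirrors the FrozenData example from this issue; treat it as an illustration, not a tested backport):

```python
from copy import deepcopy
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FrozenData:
    __slots__ = ("my_string",)
    my_string: str

    def __getstate__(self):
        return [getattr(self, f.name) for f in fields(self)]

    def __setstate__(self, state):
        for f, value in zip(fields(self), state):
            # Bypass the frozen __setattr__ the dataclass installs.
            object.__setattr__(self, f.name, value)


original = FrozenData("initial")
clone = deepcopy(original)
print(clone == original, clone is original)
```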


[issue45520] Frozen dataclass deep copy doesn't work with __slots__

2021-10-18 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

When I define this with the new-in-3.10 slots=True argument to dataclass rather 
than manually defining __slots__ it works just fine. Looks like the pickle 
format changes rather dramatically to accommodate it.

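>>> from copy import deepcopy
>>> from dataclasses import dataclass
>>> @dataclass(frozen=True, slots=True)
... class FrozenData:
...     my_string: str
...
>>> deepcopy(FrozenData('initial'))
FrozenData(my_string='initial')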

Is there a strong motivation to support manually defined __slots__ on top of 
slots=True that warrants fixing it for 3.10 onward?

nosy: +josh.r

[issue45450] Improve syntax error for parenthesized arguments

2021-10-12 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Why not "lambda parameters cannot be parenthesized" (optionally "lambda 
function")? def-ed function parameters are parenthesized, so just saying 
"Function parameters cannot be parenthesized" seems very weird.

nosy: +josh.r

[issue45414] pathlib.Path.parents negative indexing is wrong for absolute paths

2021-10-08 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

On the subject of sleep-deprived and/or sloppy, just realized:

return self.__getitem__(len(self) + idx)

should really just be:

idx += len(self)

no need to recurse.
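
To make the ordering concrete, a toy stand-in for _PathParents (ParentsView and its constructor are invented for this sketch and return the raw parts list rather than a Path): the range check comes first, then the negative index is normalized in place.

```python
class ParentsView:
    """Toy stand-in for pathlib's _PathParents; layout is an assumption."""

    def __init__(self, root, parts):
        self._root = root      # '/' for absolute paths, '' for relative ones
        self._parts = parts    # includes the root as element 0 when absolute

    def __len__(self):
        # The root does not count as a component that yields a parent.
        return len(self._parts) - 1 if self._root else len(self._parts)

    def __getitem__(self, idx):
        if idx >= len(self) or idx < -len(self):
            raise IndexError(idx)
        if idx < 0:
            idx += len(self)   # normalize after the range check, no recursion
        return self._parts[:-idx - 1]


absolute = ParentsView("/", ["/", "1", "2", "3"])
print(absolute[-1])   # ['/']
print(absolute[-3])   # ['/', '1', '2']
```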


[issue45414] pathlib.Path.parents negative indexing is wrong for absolute paths

2021-10-08 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

"We'll definitely want to make sure that we're careful about bad indices ... 
since it would be easy to get weird behavior where too-large negative indexes 
start 'wrapping around'"

When I noticed the problem, I originally thought "Hey, the test for a negative 
index can come *before* the range check and save some work for negative 
indices". Then I realized, while composing this bug report, that that would 
make p.parents[-4] with len(p.parents) == 3 → p.parents[-1] as you said, and 
die with a RecursionError for p.parents[-3000] or so. I'm going to ignore the 
possibility I'm sleep-deprived and/or sloppy, and assume a lot of good 
programmers would think to make that "optimization" and accidentally introduce 
new bugs. :-) So yeah, all the tests.


[issue45340] Lazily create dictionaries for plain Python objects

2021-10-08 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Hmm... And there's one other issue (that wouldn't affect people until they 
actually start worrying about memory overhead). Right now, if you want to 
determine the overhead of an instance, the options are:

1. Has __dict__: sys.getsizeof(obj) + sys.getsizeof(obj.__dict__)
2. Lacks __dict__ (built-ins, slotted classes): sys.getsizeof(obj)

This change would mean even checking if something using this setup has a 
__dict__ creates one. Without additional introspection support, there's no way 
to tell the real memory usage of the instance without changing the memory usage 
(for the worse).
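
A rough sketch of the two measurements above (class and function names are illustrative only). Note the hasattr/__dict__ access in the first branch, which is exactly the step that would force the dict into existence under this proposal:

```python
import sys


class WithDict:
    def __init__(self):
        self.a = 1
        self.b = 2


class Slotted:
    __slots__ = ("a", "b")

    def __init__(self):
        self.a = 1
        self.b = 2


def instance_overhead(obj):
    """Shallow per-instance overhead, following the two cases above."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        # Merely looking at __dict__ may be what creates it.
        size += sys.getsizeof(obj.__dict__)
    return size


print(instance_overhead(WithDict()), instance_overhead(Slotted()))
```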


[issue45340] Lazily create dictionaries for plain Python objects

2021-10-08 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Hmm... Key-sharing dictionaries were accepted largely without question because 
they didn't harm code that broke them (said code gained nothing, but lost 
nothing either), and provided a significant benefit. Specifically:

1. They imposed no penalty on code that violated the code-style recommendation
to initialize all variables consistently in __init__ (code that always ended up
using a non-sharing dict). Such classes don't benefit, but neither do they get
penalized (just a minor CPU cost to unshare when it realized sharing wouldn't
work).

2. They imposed no penalty for using vars(object)/object.__dict__ when you
don't modify the set of keys (so reading or changing values of existing
attributes caused no problems).

The initial version of this worsens case #2; you'd have to convert to 
key-sharing dicts, and possibly to unshared dicts a moment later, if the set of 
attributes is changed. And when it happens, you'd be paying the cost of the now 
defunct values pointer storage for the life of each instance (admittedly a 
small cost).

But the final proposal compounds this, because the penalty for lazy attribute 
creation (directly, or dynamically by modifying via vars()/__dict__) is now a 
per-instance cost of n pointers (one for each value).

The CPython codebase rarely uses lazy attribute creation, but AFAIK there is no 
official recommendation to avoid it (not in PEP 8, not in the official 
tutorial, not even in PEP 412 which introduced Key-Sharing Dictionaries). 
Imposing a fairly significant penalty on people who aren't even violating 
language recommendations, let alone language rules, seems harsh.

I'm not against this initial version (one pointer wasted isn't so bad), but the 
additional waste in the final version worries me greatly.

Beyond the waste, I'm worried how you'd handle the creation of the first 
instance of such a class; you'd need to allocate and initialize an instance 
before you know how many values to tack on to the object. Would the first 
instance use a real dict during the first __init__ call that it would use to 
realloc the instance (and size all future instances) at the end of __init__? Or 
would it be realloc-ing for each and every attribute creation? In either case, 
threading issues seem like a problem.

Seems like:

1. Even in the ideal case, this only slightly improves memory locality, and
only provides a fixed reduction in memory usage per-instance (the dict header
and a little allocator round-off waste), not one that scales with number of
attributes.

2. Classes that would benefit from this would typically do better to use 
__slots__ (now that dataclasses.dataclass supports slots=True, encouraging that 
as a default use case adds little work for class writers to use them)

If the gains are really impressive, might still be worth it. But I'm just 
worried that we'll make the language penalize people who don't know to avoid 
lazy attribute creation. And the complexity of this layered:

1. Not-a-dict
2. Key-sharing-dict
3. Regular dict

approach makes me worry it will allow subtle bugs in key-sharing dicts to go 
unnoticed (because so little code would still use them).

nosy: +josh.r

[issue21041] pathlib.PurePath.parents rejects negative indexes

2021-10-08 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Negative indexing is broken for absolute paths, see #45414.

nosy: +josh.r

[issue45414] pathlib.Path.parents negative indexing is wrong for absolute paths

2021-10-08 Thread Josh Rosenberg

New submission from Josh Rosenberg :

At least on PosixPath (not currently able to try on Windows to check 
WindowsPath, but from a quick code check I think it'll behave the same way), 
the negative indexing added in #21041 is implemented incorrectly for absolute 
paths. Passing either -1 or -2 will return a path representing the root, '/' 
for PosixPath (which should only be returned for -1), and passing an index of 
-3 or beyond returns the value expected for that index + 1, e.g. -3 gets the 
result expected for -2, -4 gets the result for -3, etc. And for the negative 
index that should be equivalent to index 0, you end up with an IndexError.

The underlying problem appears to be that absolute paths (at least, those 
created from a string) are represented in self._parts with the root '/' 
included (redundantly, since self._root has it too), so all the actual 
components of the path are offset by one.

This does not affect slicing (slicing is implemented using range and 
slice.indices to perform normalization from negative to positive indices, so it 
never indexes with a negative index).
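
For reference, this is the kind of normalization slice.indices performs (a standalone sketch, not the pathlib code itself; normalize_slice is a made-up name):

```python
def normalize_slice(sl, length):
    """Turn a slice into the explicit non-negative indices it selects."""
    return list(range(*sl.indices(length)))


print(normalize_slice(slice(-2, None), 3))    # [1, 2]
print(normalize_slice(slice(None), 3))        # [0, 1, 2]
print(normalize_slice(slice(-100, None), 3))  # [0, 1, 2], clamped
```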


>>> from pathlib import Path
>>> p = Path('/1/2/3')
>>> p._parts
['/', '1', '2', '3']
>>> p.parents[:]
(PosixPath('/1/2'), PosixPath('/1'), PosixPath('/'))
>>> p.parents[-1]
PosixPath('/')
>>> p.parents[-1]._parts  # Still behaves normally as self._root is still '/'
[]
>>> p.parents[-2]
PosixPath('/')
>>> p.parents[-2]._parts
['/']
>>> p.parents[-3]
PosixPath('/1')
>>> p.parents[-4]
Traceback (most recent call last):
  ...
IndexError: -4

It looks like the underlying problem is that the negative indexing code doesn't 
account for the possibility of '/' being in _parts and behaving as a component 
separate from the directory/files in the path. Frankly, it's a little odd that 
_parts includes '/' at all (Path has a ._root/.root attribute that stores it 
too, and even when '/' isn't in the ._parts/.parts, the generated complete path 
includes it because of ._root), but it looks like the docs guaranteed that 
behavior in their examples.

It looks like one of two options must be chosen:

1. Fix the negative indexing code to account for absolute paths, and ensure 
absolute paths store '/' in ._parts consistently (it should not be possible to 
get two identical Paths, one of which includes '/' in _parts, one of which does 
not, which is possible with the current negative indexing bug; not sure if 
there are any documented code paths that might produce this warped sort of 
object outside of the buggy .parents), or

2. Make no changes to the negative indexing code, but make absolute paths 
*never* store the root as the first element of _parts (.parts can prepend 
self._drive/self._root on demand to match documentation). This probably 
involves more changes (lots of places assume _parts includes the root, e.g. the 
_PathParents class's own __len__ method raises a ValueError when called on the 
warped object returned by p.parents[-1], because it adjusts for the root, and 
the lack of one means it returns a length of -1).

I think #1 is probably the way to go. I believe all that would require is to
add:

if idx < 0:
    return self.__getitem__(len(self) + idx)

just before:

return self._pathcls._from_parsed_parts(self._drv, self._root,
                                        self._parts[:-idx - 1])

so it never tries to use a negative idx directly (it has to occur after the
check for valid index in [-len(self), len(self)) so very negative indices don't
recurse until they become positive).

This takes advantage of _PathParents's already adjusting the reported length 
for the presence of drive/root, keeping the code simple; the alternative I came 
up with that doesn't recurse changes the original return line:

return self._pathcls._from_parsed_parts(self._drv, self._root,
                                        self._parts[:-idx - 1])

to:

adjust = idx >= 0 or not (self._drv or self._root)
return self._pathcls._from_parsed_parts(self._drv, self._root,
                                        self._parts[:-idx - adjust])

which is frankly terrible, even if it's a little faster.

components: Library (Lib)
messages: 403488
nosy: josh.r
priority: normal
severity: normal
status: open
title: pathlib.Path.parents negative indexing is wrong for absolute paths
versions: Python 3.10, Python 3.11

[issue17792] Unhelpful UnboundLocalError due to del'ing of exception target

2021-09-30 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Aaron: Your understanding of how LEGB works in Python is a little off.

Locals are locals for the *entire* scope of the function, bound or unbound; 
deleting them means they hold nothing (they're unbound) but del can't actually 
stop them from being locals. The choice of whether to look something up in the 
L, E or GB portions of LEGB scoping rules is a *static* choice made when the 
function is defined, and is solely about whether they are assigned to anywhere 
in the function (without an explicit nonlocal/global statement to prevent them 
becoming locals as a result).

Your second example can be made to fail just by adding a line after the print:

def doSomething():
    print(x)
    x = 1

and it fails for the same reason:

def doSomething():
    x = 10
    del x
    print(x)

fails; a local is a local from entry to exit in a function. Failure to assign 
to it for a while doesn't change that; it's a local because you assigned to it 
at least once, along at least one code path. del-ing it after assigning doesn't 
change that, because del doesn't get rid of locals, it just empties them. 
Imagine how complex the LOAD_FAST instruction would get if it needed to handle 
not just loading a local, but when the local wasn't bound, had to choose 
*dynamically* between:

1. Raising UnboundLocalError (if the value is local, but was never assigned)
2. Returning a closure scoped variable (if the value was local, but got del-ed, 
and a closure scope exists)
3. Raising NameError (if the closure scope variable exists, but was never
assigned)
4. Returning a global/builtin variable (if there was no closure scope variable 
*or* the closure scope variable was created, but explicitly del-ed)
5. Raising NameError (if no closure, global or builtin name exists)

That's starting to stretch the definition of "fast" in LOAD_FAST. :-)
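
A runnable sketch of the static choice (function names are invented for illustration). Each function decides at compile time whether x is local, and nothing at runtime changes that:

```python
x = "global"


def reads_global():
    return x          # x is never assigned here, so this is a global lookup


def assigns_later():
    try:
        return x      # x is local because of the assignment below
    except UnboundLocalError as exc:
        return type(exc).__name__
    x = 1             # never runs, but still makes x a local


def deletes_first():
    x = 10
    del x             # empties the local; x is still a local
    try:
        return x
    except UnboundLocalError as exc:
        return type(exc).__name__


print(reads_global(), assigns_later(), deletes_first())
```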

nosy: +josh.r

[issue45333] += operator and accessors bug?

2021-09-30 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This has nothing to do with properties, it's 100% about using augmented 
assignment with numpy arrays and mixed types. An equivalent reproducer is:

a = np.array([1,2,3])  # Implicitly of dtype np.int64

a += 0.5  # Throws the same error, no properties involved

The problem is that += is intended to operate in-place on mutable types, numpy 
arrays *are* mutable types (unlike normal integers in Python), you're trying to 
compute a result that can't be stored in a numpy array of integers, and numpy 
isn't willing to silently make augmented assignment with incompatible types 
make a new copy with a different dtype (they *could* do this, but it would lead 
to surprising behavior, like += on the *same* numpy array either operating in 
place or creating a new array with a different dtype and replacing the original 
based on the type on the right-hand side).

The short form is: If your numpy computation is intended to produce a new array 
with a different data type, you can't use augmented assignment. And this isn't 
a bug in CPython in any event; it's purely about the choices (reasonable ones 
IMO) numpy made implementing their __iadd__ overload.

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

[issue44547] fraction.Fraction does not implement __int__.

2021-07-01 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Seems like an equally reasonable solution would be to make classes with 
__trunc__ but not __int__ automatically generate a __int__ in terms of 
__trunc__ (similar to __str__ using __repr__ when the latter is defined but not 
the former). The inconsistency is in both methods existing, but having the 
equivalence implemented in int() rather than in the type (thereby making 
SupportsInt behave unexpectedly, even though it's 100% true that obj.__int__() 
would fail).
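
As a user-level illustration of the idea (not a proposal for how type creation would actually do it), a hypothetical class decorator that fills in __int__ from __trunc__, which is enough to make SupportsInt checks agree with what int() accepts:

```python
from typing import SupportsInt


def int_from_trunc(cls):
    """Hypothetical decorator: supply __int__ in terms of __trunc__."""
    if hasattr(cls, "__trunc__") and not hasattr(cls, "__int__"):
        cls.__int__ = lambda self: int(self.__trunc__())
    return cls


@int_from_trunc
class Halved:
    def __init__(self, n):
        self.n = n

    def __trunc__(self):
        return self.n // 2


print(int(Halved(7)), isinstance(Halved(7), SupportsInt))
```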

nosy: +josh.r

[issue44140] WeakKeyDictionary should support lookup by id instead of hash

2021-06-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Andrei: If designed appropriately, a weakref callback attached to the actual 
object would delete the associated ID from the dictionary when the object was 
being deleted to avoid that problem. That's basically how WeakKeyDictionary 
works already; it doesn't store the object itself (if it did, that strong 
reference could never be deleted), it just stores a weak reference for it that 
ensures that when the real object is deleted, a callback removes the weak 
reference from the WeakKeyDictionary; this just adds another layer to that work.
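
A very rough sketch of what such a mapping could look like (IdKeyedWeakDict is an invented name; this ignores iteration, thread safety and everything else a real class would need). Entries are keyed by id(obj), and a weakref callback drops the entry when the object dies so a reused id can't alias it:

```python
import gc
import weakref


class IdKeyedWeakDict:
    """Hypothetical sketch: weak mapping keyed by identity, not hash/eq."""

    def __init__(self):
        self._data = {}   # id(obj) -> (weak reference to obj, value)

    def __setitem__(self, obj, value):
        key = id(obj)

        def _remove(_ref, key=key, data=self._data):
            # Runs when obj is collected; forget the now-stale id.
            data.pop(key, None)

        self._data[key] = (weakref.ref(obj, _remove), value)

    def __getitem__(self, obj):
        ref, value = self._data[id(obj)]
        if ref() is not obj:
            raise KeyError(obj)
        return value

    def __len__(self):
        return len(self._data)


class Unhashable(list):
    """list subclasses are unhashable but do support weak references."""


registry = IdKeyedWeakDict()
item = Unhashable([1, 2])
registry[item] = "payload"
print(registry[item], len(registry))
del item
gc.collect()
print(len(registry))
```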

I don't think this would make sense as a mere argument to WeakKeyDictionary;
the implementation would differ significantly, and probably deserves a separate
class.

nosy: +josh.r

[issue44470] 3.11 docs.python.org in Polish not English?

2021-06-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

I just visited the link, and it's now *mostly* English, but with random bits of 
Korean in it (mostly in links and section headers).

The first warning block for instance begins:

경고: The parser module is deprecated...

Then a few paragraphs later I'm told:

For full information on the language syntax, refer to 파이썬 언어 레퍼런스.

where the Korean is a hyperlink to the Python Language Reference. Very strange.

nosy: +josh.r

[issue14995] PyLong_FromString documentation should state that the string must be null-terminated

2021-06-17 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

The description is nonsensical as is; not sure the patch goes far enough. 
C-style strings are *defined* to end at the NUL terminator; if it really needs 
a NUL after the int, saying it "points to the first character which follows the 
representation of the number" is highly misleading; the NUL isn't logically a 
character in the C-string way of looking at things.

The patch is also wrong; the digits need not end in a NUL byte (trailing 
whitespace is allowed).

AFAICT, the function really uses pend for two purposes:

1. If it succeeds in parsing, then pend reports the end of the string, nothing
more.
2. If it fails, because the string is not a legal input (contains non-numeric,
or non-leading/terminal whitespace or whatever), pend tells you where the first
violation character that couldn't be massaged to meet the rules for int()
is.

#1 is a mostly useless bit of info (strlen would be equally informative, and if 
the value parsed, you rarely care how long it was anyway), so pend is, 
practically speaking, solely for error-checking/reporting.

The rewrite should basically say what is allowed (making it clear anything 
beyond the single parsable integer value with optional leading/trailing 
whitespace is illegal), and making it clear that pend always points to the end 
of the string on success (not just after the representation of the number, it's 
after the trailing whitespace too), and on failure indicates where parsing
failed.

nosy: +josh.r

[issue44318] Asyncio classes missing __slots__

2021-06-17 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Andrei: The size of an instance of Semaphore is 48 bytes + 104 more bytes for 
the __dict__ containing its three attributes (ignoring the cost of the 
attributes themselves). A slotted class with three attributes only needs 56 
bytes of overhead per-instance (it has no __dict__, so the 56 is the total 
cost). Dropping overhead of the instances by >60% can make a difference if 
you're really making many thousands of them.

Personally, I think Python level classes should generally default to using 
__slots__ unless the classes are explicitly not for subclassing; not using 
__slots__ means all subclasses have their hands tied by the decision of the 
parent class. Perhaps explicitly opting in to __weakref__ (which __slots__ 
removes by default) to allow weak referencing, but it's fairly rare a class 
*needs* to otherwise allow the creation of arbitrary attributes.
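
A small sketch of the trade-offs mentioned (class names are illustrative, unrelated to asyncio's actual classes): __slots__ drops both __dict__ and weak reference support unless __weakref__ is listed, and a slotted subclass of an unslotted parent still carries the parent's __dict__.

```python
import weakref


class Slotted:
    __slots__ = ("value",)


class SlottedWeak:
    __slots__ = ("value", "__weakref__")   # explicit opt-in to weakrefs


class Plain:
    pass


class SlottedChildOfPlain(Plain):
    __slots__ = ("value",)   # too late: the parent already supplies __dict__


def can_weakref(obj):
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


print(can_weakref(Slotted()), can_weakref(SlottedWeak()))
print(hasattr(Slotted(), "__dict__"), hasattr(SlottedChildOfPlain(), "__dict__"))
```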

nosy: +josh.r

[issue44175] What do "cased" and "uncased" mean?

2021-05-19 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

See the docs for the title method on what they mean by "titlecased"; "a" is 
self-evidently not titlecased. 


[issue44175] What do "cased" and "uncased" mean?

2021-05-18 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

"Cased": Characters which are either lowercase or uppercase (they have some 
other equivalent form in a different case)

"Uncased": Characters which are neither uppercase nor lowercase.

Do you have a suggested alternate wording?
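
If an example helps the wording discussion, here is a sketch using the definition from the str documentation's footnote (general category Lu, Ll or Lt); is_cased is a made-up helper, and the interpreter's internal notion of "cased" may be slightly broader than this.

```python
import unicodedata


def is_cased(ch):
    """Approximate 'cased' per the docs footnote: category Lu, Ll or Lt."""
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")


for ch in ("a", "A", "1", "中"):
    print(repr(ch), is_cased(ch), ch.islower(), ch.isupper())
```

Uncased characters such as "1" report False for both islower() and isupper().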

nosy: +josh.r

[issue43824] array.array.__deepcopy__() accepts a parameter of any type

2021-04-12 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

__deepcopy__ is required to take a second argument by the rules of the copy 
module; the second argument is supposed to be a memo dictionary, but there's no 
reason to use it for array.array (it can't contain Python objects, and you only 
use the memo dictionary when recursing to Python objects you contain).

Sure, the second argument isn't being type-checked, but it's not used at all, 
and it's only supposed to be invoked indirectly via copy.deepcopy (that passes 
a dict).
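
For contrast, a sketch of a type that does need the memo (Node is an invented class): it contains other Python objects and can participate in cycles, so it registers itself in memo before recursing. An array.array payload has nothing to recurse into, so it has no use for the argument.

```python
import copy
from array import array


class Node:
    def __init__(self, payload, peer=None):
        self.payload = payload
        self.peer = peer

    def __deepcopy__(self, memo):
        clone = Node.__new__(Node)
        memo[id(self)] = clone    # register first so cycles resolve to clone
        clone.payload = copy.deepcopy(self.payload, memo)
        clone.peer = copy.deepcopy(self.peer, memo)
        return clone


a = Node(array("i", [1, 2, 3]))
b = Node(array("i", [4]), peer=a)
a.peer = b                        # a <-> b cycle

a_copy = copy.deepcopy(a)
print(a_copy.peer.peer is a_copy, a_copy.payload == a.payload)
```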

Can you explain what is wrong here that needs to be fixed? Seems like a 
straightforward "protocol requires argument, but use case doesn't have anything 
to do with it, so it ignores it". Are you suggesting adding type-checks for 
something that never gets used?

nosy: +josh.r

[issue43464] set intersections should short-circuit

2021-03-10 Thread Josh Rosenberg

New submission from Josh Rosenberg :

At present, set_intersection (the C name for set.intersection) optimizes for 
pairs of sets by iterating the smallest set and only adding entries found in 
the larger, meaning work is proportionate to the smallest input.

But when the other input isn't a set, it goes with a naive solution, iterating 
the entire non-set, and adding entries found in the set. This is fine when the 
intersection will end up smaller than the original set (there's no way to avoid 
exhausting the non-set when that's the case), but when the intersection ends up 
being the same size as the original, we could add a cheap length check and 
short-circuit at that point.

As is, {4}.intersection(range(10000)) takes close to 1000 times longer than 
{4}.intersection(range(10)) despite both of them reaching the point where the 
outcome will be {4} at the same time.

Since the length check for short-circuiting only needs to be performed when 
input set actually contains the value, the cost should be fairly low.

I figure this would be the worst case for the change:

{3, 4}.intersection((4,) * 1)

where it performs the length check every time, and doesn't benefit from 
short-circuiting. But cases like:

{4}.intersection((4,) * 1)



would finish much faster. A similar optimization to set_intersection_multi (to 
stop when the intermediate result is empty) would make cases like:

{4000}.intersection([1], range(1), range(10, 20))

also finish dramatically quicker in the (I'd assume not uncommon) case where 
the intersection of many iterables is empty, and this could be known quite 
early on (the cost of this check would be even lower, since it would only be 
performed once per iterable, not per-value).

Only behavioral change this would cause is that errors resulting from 
processing items in an iterable that is no longer run to exhaustion due to 
short-circuiting wouldn't happen ({4}.intersection([4, []]) currently dies, but 
would succeed with short-circuiting; same goes for {4}.intersection([5], [[]]) 
if set_intersection_multi is optimized), and input iterators might be left only 
partially consumed. If that's acceptable, the required code changes are trivial.
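
A pure-Python sketch of the single-iterable short-circuit (this is only a model of the idea, not the C code in setobject.c, and the function name is made up). It also shows the behavioral change: the iterable is abandoned once the result can't grow any further.

```python
def intersection_short_circuit(s, iterable):
    """Model of the proposed behavior for a non-set second argument."""
    result = set()
    for item in iterable:
        if item in s:
            result.add(item)
            # Length check only happens on a hit, so misses stay cheap.
            if len(result) == len(s):
                break
    return result


print(intersection_short_circuit({4}, [4, []]))   # never reaches the []

it = iter(range(10))
print(intersection_short_circuit({4}, it), next(it))
```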

components: C API
keywords: easy (C)
messages: 388442
nosy: josh.r
priority: normal
severity: normal
status: open
title: set intersections should short-circuit
versions: Python 3.10

[issue43363] memcpy writes to wrong destination

2021-03-02 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Agreed, stack is a PyObject**, so adding an integer (pto_nargs) to the pointer 
(stack) implicitly advances it by multiples of sizeof(PyObject*). This is how 
pointer arithmetic works in all versions of C I'm aware of. The code is correct.

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

[issue43297] bz2.open modes behaving differently than standard open() modes

2021-02-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

All of the compression modules (gzip, lzma) have this behavior, not just bz2; 
it's consistent in that sense. Changing it now, after literally decades with 
the old behavior, would needlessly break existing programs. As you say, it's 
documented clearly, I'm not seeing a gain to be had strong enough to violate 
the existing documentation.
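
Assuming the difference in question is the documented one (a bare "r" or "w" means binary for the compression modules, and text mode needs an explicit "t"), a short sketch of that behavior with bz2:

```python
import bz2
import io

raw = bz2.compress("héllo\n".encode("utf-8"))

with bz2.open(io.BytesIO(raw), "r") as f:        # "r" behaves like "rb"
    as_bytes = f.read()

with bz2.open(io.BytesIO(raw), "rt", encoding="utf-8") as f:
    as_text = f.read()                           # text mode must be explicit

print(type(as_bytes).__name__, type(as_text).__name__)
```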

nosy: +josh.r

[issue43209] system cannot find the file specified in subprocess.py

2021-02-23 Thread Josh Rosenberg

Change by Josh Rosenberg :

resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

[issue43246] Dict copy optimization violates subclass invariant

2021-02-17 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

The cause is in dict_merge (see here: 
https://github.com/python/cpython/blob/master/Objects/dictobject.c ); it has a 
fast path for when the object being merged in (which is what the dict 
constructor does; it makes an empty dict, then merges the provided dict-like) 
is:

1. A dict (or subclass thereof)
2. Which has not overridden __iter__

When that's the case, it assumes it's "dict-compatible" and performs the merge 
with heavy use of dict-internals. When it's not the case (as in your simple 
wrapper), it calls .keys() on the object, iterates that, and uses it to pull 
values via bracket lookup-equivalent code.

I assume the choice of testing __iter__ (really, the C slot for tp_iter, which 
is equivalent) is for performance; it's more expensive to check if keys was 
overridden and/or if the __getitem__ implementation (of which there is more 
than one possibility for slots at the C layer) has been overridden.

What the code is doing is probably logically wrong, but it's significantly 
faster than doing it the right way, and easy to work around (if you're writing 
your own dictionary-like thing with wildly different semantics, 
collections.abc.MutableMapping is probably a better base class to avoid 
inheriting dict-specific weirdness), so it's probably not worth fixing. Leaving 
open for others to discuss.
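
A small sketch of the two paths described above. The class names Shout and ShoutIter are hypothetical, and the results reflect CPython's dict_merge as described in this comment; other implementations may differ.

```python
class Shout(dict):
    """dict subclass whose lookups upper-case the stored value."""
    def __getitem__(self, key):
        return super().__getitem__(key).upper()

class ShoutIter(Shout):
    """Same, but also overrides __iter__, which disables the fast path."""
    def __iter__(self):
        return super().__iter__()

fast = dict(Shout(a="x"))      # fast path: copies the raw stored values
slow = dict(ShoutIter(a="x"))  # slow path: keys() plus bracket lookup
```

On CPython, fast keeps the stored "x" (the overridden __getitem__ is bypassed), while slow picks up "X" through the override.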

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue43119] asyncio.Queue.put never yields if the queue is unbounded

2021-02-04 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Making literally every await equivalent to:

await asyncio.sleep(0)

followed by the actual await (which is effectively what you're proposing when 
you expect all await to be preemptible) means adding non-trivial overhead to 
all async operations (asyncio is based on system calls of the 
select/poll/epoll/kqueue variety, which add meaningful overhead when we're 
talking about an operation that is otherwise equivalent to an extremely cheap 
simple collections.deque.append call). It also breaks many reasonable uses of 
asyncio.wait and asyncio.as_completed, where the caller can reasonably expect 
to be able to await the known-complete tasks without being preempted (if you 
know the coroutine is actually done, it could be quite surprising/problematic 
when you await it and get preempted, potentially requiring synchronization that 
wouldn't be necessary otherwise).

Making all await yield to the event loop would be like releasing the GIL before 
acquiring an uncontended lock; it makes an extremely cheap operation *much* 
higher overhead to, at best, fix a problem with poorly designed code. In real 
life, if whatever you're feeding the queue with is infinite and requires no 
awaiting to produce each value, you should probably just avoid the queue and 
have the consumer consume the iterable directly. Or just apply a maximum size 
to the queue; since the source of data to put is infinite and not-awaitable, 
there's no benefit to an unbounded queue, you may as well use a bound roughly 
fitted to the number of consumers, because any further items are just wasting 
memory well ahead of when it's needed.

Point is, regular queue puts only block (and potentially release the GIL early) 
when they're full or, as a necessary consequence of threading being less 
predictable than asyncio, when there is contention on the lock protecting the 
queue internals (which is usually resolved quickly); why would asyncio queues 
go out of their way to block when they don't need to?
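
As a rough illustration of the behavior under discussion (a sketch, not taken from the issue's reproducer): putting to an unbounded asyncio.Queue completes without suspending, so a consumer task only gets to run once the producer yields explicitly, here via asyncio.sleep(0).

```python
import asyncio

async def demo():
    events = []
    queue = asyncio.Queue()  # unbounded, so put() never has to wait

    async def consumer():
        while True:
            events.append(("got", await queue.get()))

    task = asyncio.create_task(consumer())
    for i in range(3):
        await queue.put(i)          # completes without suspending
        events.append(("put", i))
    await asyncio.sleep(0)          # explicit yield: consumer finally runs
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return events

events = asyncio.run(demo())
```

All three puts are recorded before any get. Passing maxsize to the queue, or adding an explicit sleep(0) in the producer, are the two workarounds mentioned above.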

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42948] bytearray.copy is undocumented

2021-01-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Does this need specific documentation? bytearray itself is documented with:

> As bytearray objects are mutable, they support the mutable sequence 
> operations in addition to the common bytes and bytearray operations described 
> in Bytes and Bytearray Operations.

where "mutable" is a link to all the mutable sequence operations ( 
https://docs.python.org/3/library/stdtypes.html#typesseq-mutable ), including 
copy. Specifically documenting copy for bytearray is pointless; are we going to 
add specific documentation for append and remove and all the other mutable 
sequence operations as well?

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42958] filecmp.cmp(shallow=True) isn't actually shallow when only mtime differs

2021-01-19 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This is a problem with the docstring. The actual docs for it are a bit more 
clear, https://docs.python.org/3/library/filecmp.html#filecmp.cmp :

"If shallow is true, files with identical os.stat() signatures are taken to be 
equal. Otherwise, the contents of the files are compared."

Your patch can't be used because it changes longstanding documented behavior. 
If you'd like to submit a patch to fix the docstring, that's fine, but we're 
not going to break existing code to make the function less accurate.

The patch should probably make the documentation more clear while it's at it.

1. The original wording could be misinterpreted as having the "Otherwise" apply 
to shallow=False only, not to the case where shallow=True but os.stat doesn't 
match.
2. The existing wording isn't clear on what an os.stat() "signature" is, which 
can leave the impression that the entirety of os.stat is compared (which would 
only match for hardlinks of the same file), when in fact it's just the file 
type (stat.S_IFMT(st.st_mode), file vs. directory vs. symlink, etc.), size and 
modification time.

Proposed rewording of main docs would be:

"If shallow is true, files with identical os.stat() signatures (file type, 
size, and modification time) are taken to be equal. When shallow is false, or 
the file signatures differ, the contents of the files are compared."

A similar wording (or at least, a shorter version of the above, rather than a 
strictly wrong description of the shallow parameter) could be applied to the 
docstring.
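
A minimal sketch of the documented behavior, assuming two temporary files whose modification times are forced with os.utime (the helper compare and the file names are invented for illustration):

```python
import filecmp
import os
import tempfile

def compare(content_a, content_b, mtime_a, mtime_b):
    """Return (shallow, deep) results of filecmp.cmp for two temp files."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.txt")
        b = os.path.join(tmp, "b.txt")
        for path, data, mtime in ((a, content_a, mtime_a), (b, content_b, mtime_b)):
            with open(path, "wb") as f:
                f.write(data)
            os.utime(path, (mtime, mtime))  # force a known modification time
        filecmp.clear_cache()
        return filecmp.cmp(a, b, shallow=True), filecmp.cmp(a, b, shallow=False)

# Same type, size and mtime, different contents: shallow says "equal".
same_sig = compare(b"aaaa", b"bbbb", 1_000_000_000, 1_000_000_000)
# Same contents, different mtime: shallow falls back to comparing contents.
diff_mtime = compare(b"aaaa", b"aaaa", 1_000_000_000, 1_000_000_500)
```

The second case is the one the report was about: shallow=True is not "stat only" once the signatures differ.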

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42899] Is it legal to eliminate tests of a value, when that test has no effect on control flow?

2021-01-12 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Gregory: Even in a low-level compiled language (say, C++), pretty sure the 
compiler can't automatically optimize out:

if (x) { }

unless it has sure knowledge of the implementation of operator bool; if 
operator bool's implementation isn't in the header file, and link time 
optimization isn't involved, it has to call it to ensure any side-effects it 
might have are invoked.

It can only bypass the call if it knows the implementation of operator bool and 
can verify it has no observable side-effects (as-if rule). There are exceptions 
to the as-if rule for optimizations in special cases (copy elision), but I'm 
pretty sure operator bool isn't one of them; if the optimizer doesn't know the 
implementation of operator bool, it must call it just in case it does something 
weird but critical to the program logic.

Point is, I agree that:

if x:

must evaluate non-constant-literal x for truthiness, no matter how silly that 
seems (not a huge loss, given very little code should ever actually do that).

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue15373] copy.copy() does not properly copy os.environment

2021-01-12 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Would we remove the functionality of os.environ.copy()? It seems very odd for 
types to have a .copy() method that works, while not supporting copy.copy, 
especially when there is zero documentation, on the web or the docstring, to 
even hint at the difference.

I'm strongly in favor of silently doing the right thing and behaving the same 
way the .copy() method already behaves; if you want a "copy" of os.environ that 
still modifies the environment, that's just aliasing it (envalias = 
os.environ), not copying at all. If you're trying to make a shallow copy, not 
an alias, you're trying to separate it from the parent, which every other 
dict-like thing does (assuming immutable values), where os.environ is a very 
weird exception (for copy.copy, but not the .copy() method).

Can someone give an example where you'd want copy.copy to produce a "shallow 
copy" that acts like an alias, not an actual independent copy?
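
For reference, a short sketch of the .copy() behavior described above (the variable name ISSUE15373_DEMO_VAR is made up and assumed not to be set in the real environment):

```python
import os

snapshot = os.environ.copy()               # plain dict, detached from the process
snapshot["ISSUE15373_DEMO_VAR"] = "1"      # hypothetical variable name

alias = os.environ                         # aliasing: writes would hit the real environment
```

Writes to snapshot stay local, while anything assigned through alias would change the process environment.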

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42826] typing.Iterable does not need __getitem__() method

2021-01-04 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

As Serhiy says, the glossary term for an iterable is not the same as the 
documentation for typing.Iterable (which at this point is largely defined in 
terms of collections.abc.Iterable). True, collections.abc.Iterable does not 
detect classes that iterate via __getitem__, only via __iter__ (the docs are 
quite clear on this), but such __getitem__ based classes are still iterable in 
the broad sense of the term used in the glossary.
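
A brief sketch of the distinction, using a hypothetical Countdown class that supports iteration only through __getitem__:

```python
from collections.abc import Iterable

class Countdown:
    """Iterable only through the legacy __getitem__ protocol."""
    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return 3 - index

items = list(Countdown())                      # iter() falls back to __getitem__
detected = isinstance(Countdown(), Iterable)   # the ABC only looks for __iter__
```

The object iterates fine, yet the isinstance check reports False, which matches what the docs say.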

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

Python tracker 
Python-bugs-list mailing list

[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This is a pessimization given the current implementation of str.join; it calls 
PySequence_Fast as the very first step, which is effectively free for a tuple 
or list input (just reference count manipulation), but must convert a generator 
expression to a list (which is slower than building the list with a listcomp in 
the first place).

It does this so it can do two passes, one to compute the final length (and max 
ordinal) of the string, allowing it to allocate just once, and one to build the 
new string.

In theory, it might be rewritten to use PyUnicodeWriter under-the-hood for 
single-pass operation, but as is, a generator expression is slower than a 
listcomp for this task.
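
A rough way to check this locally is sketched below. The input size and repeat count are arbitrary assumptions, and timings will vary by machine and Python version, so no numbers are claimed here.

```python
import timeit

words = [str(i) for i in range(10_000)]

def join_listcomp():
    return "".join([w for w in words])

def join_genexpr():
    return "".join(w for w in words)

t_list = timeit.timeit(join_listcomp, number=200)
t_gen = timeit.timeit(join_genexpr, number=200)
print(f"listcomp: {t_list:.4f}s  genexpr: {t_gen:.4f}s")
```

Both functions produce the same string; only the cost of getting there differs.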

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42689] Installation

2020-12-20 Thread Josh Rosenberg

Change by Josh Rosenberg :

resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

Python tracker 
Python-bugs-list mailing list

[issue42629] PyObject_Call not behaving as documented

2020-12-16 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Pingback from #42033. Proper fix for that issue likely involves moving the work 
for copying kwargs into PyObject_Call, which would fix this bug by side-effect.

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42033] Seemingly unnecessary complexification of foo(**kw)

2020-12-16 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Even if making a copy is necessary when the underlying function receives the 
dict "raw", preemptively performing the copy (before knowing if the function 
being called is a Vectorcall function) means that when it's a Vectorcall 
function (e.g. all user-defined functions, right?), instead of just copying 
from the original dict to the unpacked stack for vectorcall, it makes an 
intermediate copy, then copies from that copy to the unpacked stack later on; 
the copy is otherwise completely unused.

The extra bytecode isn't even defending against "dict-like" kwargs, because 
CALL_FUNCTION_EX itself already copies to a true dict for anything that's not 
an exact dict (that defense shouldn't even be there if the bytecode compiler is 
already guaranteeing a true dict).

Seems like, if preventing the caller's dict from being passed directly to the 
underlying function is necessary and intended, it should be done in 
PyObject_Call (which can avoid the copy entirely when calling a Vectorcall 
function and when the reference count on the dict is 1), not at the bytecode 
interpreter layer. As is, PyObject_Call is already violating the documented 
behavior by *not* matching the behavior of callable(*args, **kwargs) (see 
#42629), so moving it to PyObject_Call would fix that problem and improve 
performance passing a single kwargs.


Python tracker 
Python-bugs-list mailing list

[issue42646] Add function that supports "applying" methods

2020-12-15 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

If you're annoyed by having to use two lines, one to copy, one to call the 
mutating method, you can use the walrus operator:

(y := x.copy()).some_method()

or:

(y := deepcopy(x)).some_method()

Does that cover your use case?

For the list case, you'd normally just do:

arr = lis[::-1]

but:

(arr := lis.copy()).reverse()

also works.

Granted, not super pretty. But I'm not seeing enough cases where this ugliness 
is truly unavoidable (the two lines don't bother me that much, and for 
built-ins, there is usually a one-liner that works fine, e.g. the reversing 
slice as shown, sorted over list.sort, etc.).

I'll note: Unconditionally calling copy.copy is fine; it knows to try the 
__copy__ method of the things it is called on (and most things that offer copy 
alias it to __copy__ or are special-cased in copy.copy as well; if they don't, 
they should), so you're unlikely to need to perform the "try method, fall back 
to copy.copy" yourself.
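
Put together as a runnable sketch (Python 3.8+ for the walrus operator; the sample data is arbitrary):

```python
import copy

lis = [1, 2, 3]

(arr := lis.copy()).reverse()          # copy, then mutate the copy in place
sliced = lis[::-1]                     # the usual one-liner for lists

nested = {"b": [2, 1]}
(dup := copy.deepcopy(nested))["b"].sort()   # deep copy, then mutate the copy
```

In both cases the original object is left untouched and the new name is bound to the mutated copy.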

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42612] Software Designer

2020-12-12 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

A rough description is not sufficient. If you have code that reproduces the 
problem, post the reproducer so we can check, but odds are you've got a bug in 
your code.

nosy: +josh.r
status: open -> pending

Python tracker 
Python-bugs-list mailing list

[issue39707] Abstract property setter/deleter implementation not enforced, but documented as such

2020-12-09 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

If this is going to be closed as rejected, I think it still needs some 
improvement to the documentation. Right now, the docs for abstractproperty 
(deprecated in favor of combining property and abstractmethod) state:

"If only some components are abstract, only those components need to be updated 
to create a concrete property in a subclass:"

This heavily implies that if *all* components of the property are abstract, 
they must *all* be updated to create a concrete property on the subclass, when 
that is not the case (it's documenting a special way of overriding just one 
component by borrowing the base class, not a normal means of defining a 
property). If nothing else, mentioning this quirk in the docs seems like it 
would save confusion (e.g. 
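
A minimal sketch of the quirk, with hypothetical Base and Child classes: the base declares both getter and setter abstract, the subclass supplies only a getter, and instantiation still succeeds.

```python
from abc import ABC, abstractmethod

class Base(ABC):
    @property
    @abstractmethod
    def value(self):
        ...

    @value.setter
    @abstractmethod
    def value(self, new):
        ...

class Child(Base):
    # Only the getter is overridden; no setter is defined at all.
    @property
    def value(self):
        return 42

child = Child()            # instantiates: Child.value is no longer abstract
try:
    child.value = 1
    setter_works = True
except AttributeError:     # the replacement property simply has no setter
    setter_works = False
```

The subclass is concrete even though the abstract setter was never implemented, which is the behavior the docs currently leave unstated.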

assignee:  -> docs@python
components: +Documentation
nosy: +docs@python, josh.r
resolution: rejected -> 
status: closed -> open
title: Abstract property setter/deleter implementation not enforced. -> 
Abstract property setter/deleter implementation not enforced, but documented as such
versions: +Python 3.10, Python 3.8, Python 3.9

Python tracker 
Python-bugs-list mailing list

[issue42565] Traceback (most recent call last): File "", line 1, in NameError: name 'python' is not defined

2020-12-03 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Looks like someone tried to run python inside an interactive Python shell, 
rather than the command line. I'm moving to pending and will eventually close 
unless they add a repro for some actual bug.

nosy: +josh.r
status: open -> pending

Python tracker 
Python-bugs-list mailing list

[issue42454] Move slice creation to the compiler for constants

2020-11-30 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Yep, Mark Shannon's solution of contextual hashing is what I was trying 
(without success) when my last computer died (without backing up work offsite, 
oops) and I gave up on this for a while. And Batuhan Taskaya's note about 
compiler dictionaries for the constants being a problem is where I got stuck.

Switching to lists might work (I never pursued this far enough to profile it to 
see what the performance impact was; presumably for small functions it would be 
near zero, while larger functions might compile more slowly).

The other approach I considered (and was partway through implementing when the 
computer died) was to use a dict subclass specifically for the constants 
dictionaries; inherit almost everything from regular dicts, but with built-in 
knowledge of slices so it could perform hashing on their behalf (I believe you 
could use the KnownHash APIs to keep custom code minimal; you just check for 
slices, fake their hash if you got one and call the KnownHash API, otherwise, 
defer to dict normally). Just an extension of the code.__hash__ trick, adding a 
couple more small hacks into small parts of Python so they treat slices as 
hashable only in that context without allowing non-intuitive behaviors in 
normal dict usage.


Python tracker 
Python-bugs-list mailing list

[issue41878] python3 fails to use custom mapping object as symbols in eval()

2020-11-24 Thread Josh Rosenberg

Change by Josh Rosenberg :

resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

Python tracker 
Python-bugs-list mailing list

[issue42454] Move slice creation to the compiler for constants

2020-11-24 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

There is an open issue for this already, under #11107 (a reopen of the closed 
#2268, where the reopen was justified due to Python 3 making slice objects more 
common), just so you know.

I made a stab at this a while ago and gave up due to the problems with making 
slices constants while trying to keep them unhashable (and I never got to 
handling the marshal format updates properly). It doesn't seem right to 
incidentally make:

something[::-1] = something

actually work, and be completely nonsensical, when "something" happens to be a 
dict, when previously, you'd get a clear TypeError for trying to do it. I could 
definitely see code using duck-typing via slices to distinguish sequences from 
other iterables and mappings, and making mappings suddenly support slices in a 
nonsensical way is... odd.

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue26290] fileinput and 'for line in sys.stdin' do strange mockery of input buffering

2020-11-20 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

For those who find this in the future, the simplest workaround for the:

for line in sys.stdin:

issue on Python 2 is to replace it with:

for line in iter(sys.stdin.readline, ''):

The problem is caused by the way file.__next__'s buffering behaves, but 
file.readline doesn't use that code (it delegates to either fgets or a loop 
over getc/getc_unlocked that never overbuffers beyond the newline). Two-arg 
iter lets you make an iterator that calls readline each time you want a line, 
and considers a return of '' (which is what readline returns when you hit EOF) 
to terminate iteration.
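
The two-arg iter idiom itself can be sketched on Python 3 as well, here with io.StringIO standing in for sys.stdin (an assumption made so the example is self-contained; the buffering problem above is specific to Python 2):

```python
import io

def read_lines(stream):
    """Iterate lines by calling readline() until it returns '' (EOF)."""
    return iter(stream.readline, "")

fake_stdin = io.StringIO("first\nsecond\n")   # stand-in for sys.stdin
lines = list(read_lines(fake_stdin))
```

Each line keeps its trailing newline, exactly as readline returns it.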

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue42269] Add ability to set __slots__ in dataclasses

2020-11-10 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Is the plan to allow an argument to auto-generate __slots__, or would this 
require repeating the names once in __slots__, and once for annotations and the 


Python tracker 
Python-bugs-list mailing list

[issue42269] Add ability to set __slots__ in dataclasses

2020-11-10 Thread Josh Rosenberg

Change by Josh Rosenberg :

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue39931] Global variables are not accessible from child processes (multiprocessing.Pool)

2020-11-10 Thread Josh Rosenberg

Change by Josh Rosenberg :

resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

Python tracker 
Python-bugs-list mailing list

[issue42033] Seemingly unnecessary complexification of foo(**kw)

2020-10-15 Thread Josh Rosenberg

Change by Josh Rosenberg :

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-07 Thread Josh Rosenberg

Change by Josh Rosenberg :

type: performance -> behavior

Python tracker 
Python-bugs-list mailing list

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-07 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Can reproduce on Alpine Linux, with CPython 3.8.2 (running under WSLv2), so 
it's not just you. CPU usage is high; seems like it must be stuck in an 
infinite loop.

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue41924] TextWrap's wrap method throws unhelpful error on bytes object

2020-10-03 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

It's not textwrap that's doing it, which is why the error is so unhelpful; the 
input is assumed to be a str, and the translate method is called on it with a 
dict argument, which is valid for str.translate, but not for bytes.translate.

You'll get other "unhelpful" error messages for other arguments (e.g. most 
other built-in types die because they lack an expandtabs method). Is it 
necessary to provide specific error messages when an API is given a type it 
never claimed to support? I could see issues with a "check for str" check if 
someone is implementing their own str-like type that matches the API but gets 
rejected for not being str.

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue41878] python3 fails to use custom mapping object as symbols in eval()

2020-09-29 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Yes, list comprehensions having their own local scope was a change from Py2 to 
Py3. Python 2 did not do this for list comps initially, and it was left that 
way during the 2.x timeframe due to back compat constraints, but 2.x did it 
from the start for generator expressions, as well as set and dict comps, and they 
were all made consistent for Py3.

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue41702] Inconsistent behaviour in strftime

2020-09-03 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

3.8.2 (on Alpine Linux under WSL) produces '0020-10-05', just like your 3.6 
example. Not seeing anything obvious in commit history that would break it for 
3.7. That said, 3.7 is in security fix only mode at this point (see 
https://devguide.python.org/#status-of-python-branches ); as this works on the 
latest release, I'm thinking this won't be fixed for 3.7.

nosy: +josh.r
title: Inconcistent behaviour in strftime -> Inconsistent behaviour in strftime

Python tracker 
Python-bugs-list mailing list

[issue41694] python3 futures.as_completed timeout broken if future contains undefined reference

2020-09-02 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

The problem is a lot simpler than you're making it:

1. You submit a time.sleep(30) task. This begins running immediately
2. You try to submit another task, but a NameError is raised, bypassing the 
rest of the code (you never call as_completed, with or without a timeout)
3. The ThreadPoolExecutor's __exit__ is invoked, which implicitly invokes 
shutdown(wait=True). This does not return until the successfully submitted task 
(time.sleep(30)) finishes.
4. At that point, the exception that was interrupted by with statement cleanup 
resumes bubbling

All of this is behaving exactly as documented, no bug is occurring.
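
The sequence above can be reproduced with a shorter sleep. This is a sketch; missing_function is deliberately undefined and the 0.3 second delay is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run():
    """Return seconds elapsed before the NameError reaches the caller."""
    start = time.monotonic()
    try:
        with ThreadPoolExecutor(max_workers=2) as pool:
            pool.submit(time.sleep, 0.3)       # starts running right away
            pool.submit(missing_function)      # NameError: never submitted
    except NameError:
        return time.monotonic() - start

elapsed = run()
```

The NameError only reaches the caller after the sleeping task completes, because __exit__ waits for it first.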

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

Python tracker 
Python-bugs-list mailing list

[issue36172] csv module internal consistency

2020-08-28 Thread Josh Rosenberg

Change by Josh Rosenberg :

resolution:  -> not a bug
stage:  -> resolved
status: pending -> closed

Python tracker 
Python-bugs-list mailing list

[issue41652] An Advice on Turning Ellipsis into Keyword

2020-08-28 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

I'm closing this as not being worth the costs of adding new keywords. You're 
welcome to propose it on the python-ideas list (a more appropriate place to 
propose and suss out the details of significant language changes), but you'll 
need to formulate a much stronger reason for making this change.

resolution:  -> rejected
stage:  -> resolved
status: open -> closed

Python tracker 
Python-bugs-list mailing list

[issue41652] An Advice on Turning Ellipsis into Keyword

2020-08-28 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

You can do the same thing to replace int, float, dict, len, and all the other 
built-in classes and functions. Why is Ellipsis so special that it needs 
protection, especially when, as you note, ... is an available unoverrideable 
way to refer to it? Making new keywords is a high bar (because it can break 
existing code). What justifies this one beyond "don't want folks to mess with a 
barely used name"?

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue36379] nb_inplace_pow is always called with an invalid argument

2020-08-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Zackery, should this be closed? Or is there something missing from the patch?


Python tracker 
Python-bugs-list mailing list

[issue36921] Deprecate yield from and @coroutine in asyncio

2020-06-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Was this supposed to deprecate using types.coroutine as a decorator as well? 
Because that's not clearly documented, which means people can still use it to 
make generator-based coroutines without async def.

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue40269] Inconsistent complex behavior with (-1j)

2020-04-12 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

The final entry is identical to the second to last, because ints have no 
concept of -0. If you used a float literal, it would match the first two:

>>> -0.-1j

I suspect the behavior here is due to -1j not actually being a literal on its 
own; it's interpreted as the negation of 1j, where 1j is actually 0.0+1.0j, and 
negating it flips the sign on both the real and imaginary component.

From what I can read of the grammar rules, this is expected; the negation 
isn't ever part of the literal (minus signs aren't part of the grammar aside 
from exponents in scientific notation).

If this is a bug, it's a bug in the grammar. I suspect the correct solution 
here is to include the real part explicitly, as 0.0-1j works just fine.
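
A quick way to observe the sign of the real component in each spelling, using math.copysign to tell 0.0 from -0.0:

```python
import math

negated = -1j          # unary minus applied to 1j, i.e. -(0.0+1.0j)
explicit = 0.0 - 1j    # real part spelled out

negated_real_sign = math.copysign(1.0, negated.real)
explicit_real_sign = math.copysign(1.0, explicit.real)
```

The negated form carries a negative zero in its real part, while the explicit form keeps a positive zero.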

nosy: +josh.r

Python tracker 
Python-bugs-list mailing list

[issue40201] Last digit count error

2020-04-05 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Your script is using "true" division with / , (that produces potentially 
inaccurate float results) not floor division with // , (which gets int 
results). When the inputs vastly exceed the integer representational 
capabilities of floats (52-53 bits, where 10 ** 24 is 80 bits), you'll have 
problems.

This is a bug in your script, not Python.
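
A small sketch of the difference, using an arbitrary value just above 10 ** 24:

```python
n = 10**24 + 7              # 80-bit integer, far beyond float's 53-bit mantissa

last_digit = n % 10         # exact integer arithmetic
floor_result = n // 10      # exact: stays an int
true_result = int(n / 10)   # goes through a float and loses the low bits
```

Only the // result matches the exact quotient; the / result is rounded to the nearest representable float before being converted back.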

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

Python tracker 
Python-bugs-list mailing list

[issue36144] Dictionary union. (PEP 584)

2020-02-26 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Sorry, I think I need examples to grok this in the general case. ChainMap 
unioned with dict makes sense to me (it's equivalent to update or 
copy-and-update on the top level dict in the ChainMap). But ChainMap unioned 
with another ChainMap is less clear. Could you give examples of what the 
expected end result is for:

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d3 = {'a': 5, 'd': 6}
d4 = {'d': 7, 'e': 8}
cm1 = ChainMap(d1, d2)
cm2 = ChainMap(d3, d4)

followed by either:

cm3 = cm1 | cm2

or:

cm1 |= cm2

? As in, what is the precise state of the ChainMap cm3 or the mutated cm1, 
referencing d1, d2, d3 and d4 when they are still incorporated by references in 
the chain?

My impression from what you said is that the plan would be for the updated cm1 
to preserve references to d1 and d2 only, with the contents of cm2 (d3 and d4) 
effectively flattened and applied as an in-place update to d1, with an end 
result equivalent to having done:

cm1 = ChainMap(d1, d2)
d1 |= d4
d1 |= d3

(except the key ordering would actually follow d3 first, and d4 second), while 
cm3 would effectively be equivalent to having done (note ordering):

cm3 = ChainMap(d1 | d4 | d3, d2)

though again, key ordering would be based on d1, then d3, then d4, not quite 
matching the union behavior. And a reference to d2 would be preserved in the 
final result, but not any other original dict. Is that correct? If so, it seems 
like it's wasting ChainMap's key feature (lazy accumulation of maps), where:

cm1 |= cm2

could be equivalent to either:

cm1.maps += cm2.maps

though that means cm1 wins overlaps, where normal union would have cm2 win, or 
to hew closer to normal union behavior, make it equivalent to:

cm1.maps[:0] = cm2.maps

prepending all of cm2's maps to have the same duplicate handling rules as 
regular dicts (right side wins) at the expense of changing which map cm1 uses 
as the target for writes and deletes. In either case it would hew to the spirit 
of ChainMap, making dict "union"-ing an essentially free operation, in exchange 
for increasing the costs of lookups that don't hit the top dict.
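
The two maps-based alternatives can be tried directly with today's ChainMap. This is a sketch of those alternatives only; it says nothing about what | and |= actually ended up doing.

```python
from collections import ChainMap

d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
d3 = {'a': 5, 'd': 6}
d4 = {'d': 7, 'e': 8}

appended = ChainMap(d1, d2)
appended.maps += ChainMap(d3, d4).maps      # left side (cm1) wins overlaps

prepended = ChainMap(d1, d2)
prepended.maps[:0] = ChainMap(d3, d4).maps  # right side (cm2) wins overlaps
```

Neither variant copies or mutates d1 through d4; the prepended variant does change which dict receives writes (d3 instead of d1).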


Python tracker 
Python-bugs-list mailing list

[issue36144] Dictionary union. (PEP 584)

2020-02-26 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

What is ChainMap going to do? Normally, the left-most argument to ChainMap is 
the "top level" dict, but in a regular union scenario, last value wins. 

Seems like layering the right hand side's dict on top of the left hand side's 
would match dict union semantics best, but it feels... wrong, given ChainMap's 
normal left-to-right precedence. And top-mostness affects which dict receives 
all writes, so if chain1 |= chain2 operates with dict-like precedence (chain2 
layers over chain1), then that also means the target of writes/deletions/etc. 
changes to what was on top in chain2.


Python tracker 
Python-bugs-list mailing list

[issue39693] tarfile's extractfile documentation is misleading

2020-02-19 Thread Josh Rosenberg

New submission from Josh Rosenberg :

The documentation for extractfile ( 
https://docs.python.org/3/library/tarfile.html#tarfile.TarFile.extractfile ) 
says:

"Extract a member from the archive as a file object. member may be a filename 
or a TarInfo object. If member is a regular file or a link, an 
io.BufferedReader object is returned. Otherwise, None is returned."

Before reading further, answer for yourself: What do you think happens when a 
provided filename doesn't exist, based on that documentation?

In teaching a Python class that uses tarfile in the final project, and expects 
students to catch predictable errors (e.g. a random tarball being provided, 
rather than one produced by a different mode of the program with specific 
expected files) and convert them to user-friendly error messages, I've found 
this documentation to confuse students repeatedly (if they actually read it, 
rather than just guessing and checking interactively).

Specifically, the documentation:

1. Says nothing about what happens if member doesn't exist (TarFile.getmember 
does mention KeyError, but extractfile doesn't describe itself in terms of 
getmember)
2. Loosely implies that it should return None in such a scenario: "If member is 
a regular file or a link, an io.BufferedReader object is returned. Otherwise, 
None is returned." The intent is likely to mean "all other member types are 
None, and we're saying nothing about non-existent members", but everyone I've 
taught who has read the docs came away with a different impression until they 
tested it.

Perhaps just reword from:

"If member is a regular file or a link, an io.BufferedReader object is 
returned. Otherwise, None is returned."

to:

"If member is a regular file or a link, an io.BufferedReader object is 
returned. For all other existing members, None is returned. If member does not 
appear in the archive, KeyError is raised."

Similar adjustments may be needed for extract, and/or both of them could be 
adjusted to explicitly refer to getmember by stating that filenames are 
converted to TarInfo objects via getmember.
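
The three outcomes can be checked with an in-memory archive. This is a sketch; the member names and the helper build_archive are made up for illustration.

```python
import io
import tarfile

def build_archive():
    """Return an in-memory tarball with one regular file and one directory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        data = b"hello"
        info = tarfile.TarInfo("file.txt")      # hypothetical member names
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        dirinfo = tarfile.TarInfo("subdir")
        dirinfo.type = tarfile.DIRTYPE
        tar.addfile(dirinfo)
    buf.seek(0)
    return buf

with tarfile.open(fileobj=build_archive()) as tar:
    regular = tar.extractfile("file.txt").read()   # file object for regular files
    directory = tar.extractfile("subdir")          # None for other member types
    try:
        tar.extractfile("no-such-member")
        missing = "returned"
    except KeyError:                               # raised via getmember()
        missing = "KeyError"
```

A name that is not in the archive raises KeyError rather than returning None, which is the case the current wording leaves out.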

assignee: docs@python
components: Documentation, Library (Lib)
keywords: easy, newcomer friendly
messages: 362298
nosy: docs@python, josh.r
priority: normal
severity: normal
status: open
title: tarfile's extractfile documentation is misleading
versions: Python 3.7, Python 3.8, Python 3.9

Python tracker 
Python-bugs-list mailing list

[issue36051] Drop the GIL during large bytes.join operations?

2019-12-30 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This will introduce a risk of data races that didn't previously exist. If you 
do:

ba1 = bytearray(b'\x00') * 5
ba2 = bytearray(b'\x00') * 5
... pass references to thread that mutates them ...
ba3 = b''.join((ba1, ba2))

then two things will change from the existing behavior:

1. If the thread in question attempts to write to the bytearrays in place, then 
it could conceivably write data that is only partially picked up (ba1[0], 
ba1[4] = 2, 3 could end up copying the results of the second write without 
the first; at present, it could only copy the first without the second)

2. If the thread tries to change the size of the bytearrays during the join 
(ba1 += b'123'), it'll die with a BufferError that wasn't previously possible

#1 isn't terrible (as noted, data races in that case already existed, this just 
lets them happen in more ways), but #2 is a little unpleasant; code that 
previously had simple data races (the data might be inconsistent, but the code 
ran and produced some valid output) can now fail hard, nowhere near the actual 
call to join that introduced the behavioral change.

I don't think this sinks the patch (loudly breaking code that was silently 
broken before isn't awful), but I feel like a warning of some kind in the 
documentation (if only a simple compatibility note in What's New) might be 
appropriate.
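
For anyone unfamiliar with the BufferError in point 2, here is a single-threaded sketch of the underlying mechanism. It assumes a memoryview is a fair stand-in for the buffer export that join would hold while the GIL is released; the function name is hypothetical.

```python
def resize_while_exported():
    """Try to grow a bytearray while something holds a buffer export on it."""
    ba = bytearray(b"\x00" * 5)
    view = memoryview(ba)  # stands in for the export a GIL-free join would hold
    try:
        ba += b"123"       # resizing is refused while the export exists
    except BufferError as exc:
        return type(exc).__name__
    finally:
        view.release()
    return "resized"
```

With the patch, the window in which another thread can hit this widens from "never" to "for the duration of the copy".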

nosy: +josh.r


[issue39167] argparse boolean type bug

2019-12-30 Thread Josh Rosenberg

Change by Josh Rosenberg :

resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> ArgumentParser should support bool type according to truth 


[issue38971] codecs.open leaks file descriptor when invalid encoding is passed

2019-12-04 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Any reason not to just defer opening the file until after the codec has been 
validated, so the resource acquisition comes last?
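
A rough sketch of the ordering I mean, not the actual codecs.open implementation: the function name is hypothetical, and it uses the builtin open rather than codecs' stream wrappers purely to keep the example short.

```python
import codecs


def open_after_validation(filename, mode="r", encoding=None):
    """Validate the codec first, so no file is opened if the lookup fails."""
    if encoding is not None:
        # Raises LookupError for an unknown encoding before any descriptor
        # has been acquired, so there is nothing to leak.
        codecs.lookup(encoding)
    # Resource acquisition comes last.
    return open(filename, mode, encoding=encoding)
```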

nosy: +josh.r


[issue38934] Dictionaries of dictionaries behave incorrectly when created from dict.fromkeys()

2019-11-27 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

That's the expected behavior, and it's clearly documented here: 

Quote: "All of the values refer to just a single instance, so it generally 
doesn’t make sense for value to be a mutable object such as an empty list. To 
get distinct values, use a dict comprehension instead."
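
A short illustration of the documented behavior and of the dict comprehension alternative (the key names are arbitrary):

```python
keys = ["a", "b"]

# fromkeys stores the very same dict object under every key.
shared = dict.fromkeys(keys, {})
shared["a"]["x"] = 1

# A dict comprehension evaluates {} once per key, giving distinct values.
distinct = {key: {} for key in keys}
distinct["a"]["x"] = 1
```

After this runs, shared["b"] also contains "x", while distinct["b"] is still empty.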

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed


[issue38874] asyncio.Queue: putting items out of order when it is full

2019-11-26 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Yes, five outstanding blocked puts can be bypassed by a put that comes in 
immediately after a get creates space. But this isn't really a problem; there 
are no guarantees on what order puts are executed in, only a guarantee that 
once a put succeeds, it's FIFO ordered with respect to all other puts.

Nothing in the docs even implies the behavior you're expecting, so I'm not 
seeing how even a documentation fix is warranted here. The docs on put clearly 
say:

"Put an item into the queue. If the queue is full, wait until a free slot is 
available before adding the item."

If we forcibly hand off on put even when a slot is available (to allow older 
puts to finish first), then we violate the expectation that waiting is only 
performed when the queue is full (if I test myqueue.full() and it returns 
False, I can reasonably expect that put won't block). This would be especially 
impossible to fix if people write code like `if not myqueue.full(): 
myqueue.put_nowait()`. put_nowait isn't even a coroutine, so it *can't* hand 
off control to the event loop to allow waiting puts to complete, even if it 
wanted to, and it can't fail to put (e.g. by determining the empty slots will 
be filled by outstanding puts in some relatively expensive way), because you 
literally *just* verified the queue wasn't full and had no awaits between the 
test and the put_nowait, so it *must* succeed.

In short: Yes, it's somewhat unpleasant that a queue slot can become free and 
someone else can swoop in and steal it before older waiting puts can finish. 
But any change that "fixed" that would make all code slower (forcing 
unnecessary coroutine switches), and violate existing documentation guarantees.
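
For reference, a small sketch of the scenario under discussion, assuming the current asyncio.Queue implementation; the item labels and the coroutine name are made up. A put that is blocked on a full queue gets bypassed by a put_nowait issued right after a get frees the slot.

```python
import asyncio


async def demo():
    q = asyncio.Queue(maxsize=1)
    await q.put("first")

    # This put cannot complete yet: the queue is full.
    blocked = asyncio.create_task(q.put("blocked"))
    await asyncio.sleep(0)        # let the task run until it starts waiting

    order = [q.get_nowait()]      # frees the slot and wakes the waiting put
    assert not q.full()           # so put_nowait below must succeed
    q.put_nowait("late")          # takes the slot before the woken put resumes

    order.append(await q.get())
    order.append(await q.get())
    await blocked
    return order
```

The items come out in the order their puts actually completed, which is the only ordering the queue promises.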



[issue38874] asyncio.Queue: putting items out of order when it is full

2019-11-20 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

The items that haven't finished the put aren't actually "in" the queue yet, so 
I don't see how non-FIFO order of insertion violates any FIFO guarantees for 
the contents of the queue; until the items are actually "in", they're not 
sequenced for the purposes of when they come "out". Mandating such a guarantee 
effectively means orchestrating a queue with a real maxsize equal to the 
configured maxsize plus the total number of coroutines competing to put items 
into it.

The guarantee is still being met here; once an item is put, it will be "get"-ed 
after anything that finished put-ing before it, and before anything that 
finished put-ing after it.

nosy: +josh.r


[issue38853] set.repr breaches docstring contract

2019-11-19 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

To be clear, the docstring is explicitly disclaiming any ordering contract. If 
you're reading "unordered" as meaning "not reordered" (like a list or tuple, 
where the elements appear in insertion order), that's not what "unordered" 
means here. It means "arbitrary order". As it happens, the hashcodes of small 
integers correspond to their numerical values (mostly; -1 is a special case), 
so if no collisions occur and the numbers are sequential, the ordering will 
often look like it was sorted in semi-numerical order, as in your case.

That doesn't mean it's performing sorting, it just means that's how the hashes 
happened to distribute themselves across the buckets in the set. A different 
test case with slightly more distributed numbers won't create the impression of 
sorting:

>>> print({-5, -1, 13, 17})
{17, -5, 13, -1}

For the record, I chose that case to use CPython implementation details to 
produce a really unordered result (all the numbers are bucketed mod 8 in a set 
that small, and this produces no collisions, with all values mod 8 different 
from the raw value). On other versions of CPython, or alternate interpreters, 
both your case and mine could easily come out differently.

Point is, this isn't a bug, just a quirk in the small int hash codes.
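
To make the hash quirk concrete, a tiny sketch. The specific hash values are a CPython implementation detail, not a language guarantee, and the variable names are arbitrary.

```python
values = [-5, -1, 13, 17]

# On CPython, small ints hash to themselves, except -1, which hashes to -2
# because -1 is reserved as an error return in the C-level hash API.
small_int_hashes = {v: hash(v) for v in values}

# Whatever order iteration happens to produce, set equality ignores it.
same_set = {-5, -1, 13, 17} == {17, -5, 13, -1}
```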

Steven: I think they thought it was sorted in some string-related way, 
explaining (to them) why -1 was out of place (mind you, if it were string 
sorted, -1 would come first since the minus sign is ASCIIbetically first, 19 
would fall between 1 and 2, and 25 between 2 and 3, so it doesn't hold up).

There's no bug here.

nosy: +josh.r
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed


[issue38710] unsynchronized write pointer in io.TextIOWrapper in 'r+' mode

2019-11-06 Thread Josh Rosenberg

Change by Josh Rosenberg :

nosy: +josh.r


[issue38710] unsynchronized write pointer in io.TextIOWrapper in 'r+' mode

2019-11-06 Thread Josh Rosenberg

Change by Josh Rosenberg :

components: +Library (Lib)


[issue36906] Compile time textwrap.dedent() equivalent for str or bytes literals

2019-11-06 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Is there a reason folks are supporting a textwrap.dedent-like behavior over the 
generally cleaner inspect.cleandoc behavior? The main advantage to the latter 
being that it handles:


just fine (removing the common indentation from Second/Third), and produces 
identical results with:


where textwrap.dedent behavior would leave the first string unmodified (because 
it removes the largest common indentation, and First has no leading 
indentation), and dedenting the second, but leaving a leading newline in place 
(where cleandoc removes it), that can only be avoided by using the typically 
discouraged line continuation character to make it:


cleandoc behavior means the choice of whether the text begins and ends on the 
same line at the triple quote doesn't matter, and most use cases seem like 
they'd benefit from that flexibility.
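
A sketch of the difference, using two hypothetical functions that return the same text laid out in the two styles (First/Second/Third as above):

```python
import inspect
import textwrap


def same_line():
    # Text starts on the same line as the opening triple quote.
    return """First
    Second
    Third
    """


def next_line():
    # Text starts on the line after the opening triple quote.
    return """
    First
    Second
    Third
    """


cleaned_same = inspect.cleandoc(same_line())
cleaned_next = inspect.cleandoc(next_line())
dedented_same = textwrap.dedent(same_line())
dedented_next = textwrap.dedent(next_line())
```

cleandoc gives the same three unindented lines for both layouts; dedent leaves Second and Third indented in the first layout and keeps a leading newline in the second.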

nosy: +josh.r


[issue38560] Allow iterable argument unpacking after a keyword argument?

2019-10-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

I'd be +1 on this, but I'm worried about existing code relying on the 
functional use case from your example.

If we are going to discourage it, I think we either have to:

1. Have DeprecationWarning that turns into a SyntaxError, or
2. Never truly remove it, but make it a SyntaxWarning immediately and leave it 
that way indefinitely

nosy: +josh.r


[issue38566] Description of '\w' behavior is vague in `re` documentation

2019-10-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

The definition of \w, historically, has corresponded to the set of characters 
that can occur in legal variable names in C (alphanumeric ASCII plus 
underscores, making it equivalent to [a-zA-Z0-9_] for ASCII regex). That's why, 
on top of the definitely wordy alphabetic characters, and the arguably wordy 
numerics, it includes the underscore, _.

That definition predates Unicode entirely, and Python is just building on it by 
expanding the definition of "alphanumeric" to encompass all alphanumeric 
characters in Unicode.

We definitely can't remove underscores from the definition without breaking 
existing code which assumes a common subset of PCRE support (every regex flavor 
I know of includes underscores in \w). Adding the zero width characters seems 
of limited benefit (especially in the non-joiner case; if you're trying to pull 
out words, presumably you don't want to group letters across a non-joining 
boundary?). Basically, you're parsing "Unicode word characters" as "Unicode's 
definition of word characters", when it's really meant to mean "All word 
characters, not just ASCII".

You omitted the clarifying remarks from the documentation though, the full 
description is:

> Matches Unicode word characters; this includes most characters that can be 
> part of a word in any language, as well as numbers and the underscore. If the 
> ASCII flag is used, only [a-zA-Z0-9_] is matched.

That's about as precise as I think we can make it (because technically, some of 
the things that count as "word characters" aren't actually part of an 
"alphabet" in the technical definition). If you think there is a clearer way of 
expressing it, please suggest a better phrasing, and this can be fixed as a 
documentation bug.
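
A quick illustration of the two behaviors the documentation describes (the sample string is arbitrary):

```python
import re

text = "héllo_wörld 42"

# Default for str patterns: \w covers Unicode letters, digits, and underscore.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is exactly [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)
```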

nosy: +josh.r


[issue34172] multiprocessing.Pool and ThreadPool leak resources after being deleted

2019-10-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

It should probably be backported to all supported 3.x branches though, so people 
aren't required to move to 3.8 to benefit from it.



[issue34172] multiprocessing.Pool and ThreadPool leak resources after being deleted

2019-10-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Pablo's fix looks like a superset of the original fix applied here, so I'm 
assuming it fixes this issue as well.

nosy: +josh.r


[issue32856] Optimize the `for y in [x]` idiom in comprehensions

2019-10-23 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

OOC, rather than optimizing a fairly ugly use case, might another approach be 
to make walrus less leaky? Even if observable leakage is considered desirable, 
it strikes me that use cases for walrus in genexprs and comprehensions likely 
break up into:

1. 90%: Cases where variable is never used outside genexpr/comprehension 
(because functional programming constructs shouldn't have side-effects, gosh 
darn it!)
2. 5%: Cases where variable is used outside genexpr/comprehension and expects 
the leaked value
3. 5%: Cases where variable is used outside genexpr/comprehension, but never in 
a way that actually relies on the value set in the genexpr/comprehension (same 
name chosen by happenstance)

If the walrus behavior in genexpr/comprehensions were tweaked to say that it 
only leaks if:

1. It's running at global scope (unavoidable, since there's no way to tell if 
it's an intended part of the module's interface)


2. A global or nonlocal statement within the function made it clear the name 
was considered stateful (again, like running at global scope, there is no way 
to know for sure if the name will be used somewhere else)


3. At some point in the function, outside the genexpr/comprehension, the value 
of the walrus-assigned name was read.

Case #3 could be even more narrow if the Python AST optimizer was fancier, 
potentially something like "if the value was read *after* the 
genexpr/comprehension, but *before* any following *unconditional* writes to the 
same name" (so [leaked := x for x in it] wouldn't bother to leak "leaked" if 
the next line was "leaked = 1" even if "leaked" were read three lines later, or 
the only reads from leaked occurred before the genexpr/comprehension), but I 
don't think the optimizer is up to that; following simple rules similar to 
those the compiler already follows to identify local names should cover 90% of 
cases anyway.

Aside from the dict returned by locals, and the possibility of earlier 
finalizer invocation (which you couldn't rely on outside CPython anyway), 
there's not much difference in behavior between a leaking and non-leaking 
walrus when the value is never referred to again, and it seems like the 90% 
case for cases where unwanted leakage occurs would be covered by this. Sure, if 
my WAG on use case percentages is correct, 5% of use cases would continue to 
leak even though they didn't benefit from it, but it seems like optimizing the 
90% case would do a lot more good than optimizing what's already a 
micro-optimization that 99% of Python programmers would never use (and 
shouldn't really be encouraged, since it would rely on CPython implementation 
details, and produce uglier code).
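
For readers who haven't run into the leak, a small sketch of both constructs being compared (Python 3.8+; function and variable names are made up):

```python
def last_seen(data):
    # The walrus target binds in the enclosing function scope, so "last"
    # is still visible after the comprehension finishes.
    doubled = [(last := x) * 2 for x in data]
    return doubled, last


def nested_idiom(data):
    # The "for y in [expr]" idiom keeps y local to the comprehension,
    # at the cost of a one-element inner loop.
    return [y for x in data for y in [x * 2]]
```

The first form reads better, but it is the one that leaks; the second stays private to the comprehension.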

I was also inspired by this to look at replacing BUILD_LIST with BUILD_TUPLE 
when followed by GET_ITER (so "[y for x in it for y in [derived(x)]]" would at 
least get the performance benefit of looping over a one-element tuple rather 
than a one-element list), thinking it might reduce the overhead of [y for x in 
a for y in [x]] in your unpatched benchmark by making it equivalent to [y for x 
in a for y in (x,)] while reading more prettily, but it turns out you beat me 
to it with issue32925, so good show there! :-)

You should probably rerun your benchmarks though; with issue32925 committed (a 
month after you posted the benchmarks here), the performance discrepancy should 
be somewhat less (estimate based on local benchmarking says maybe 20% faster 
with BUILD_LIST being optimized to BUILD_TUPLE). Still much faster with the 
proposed optimization than without, but I suspect even optimized, few folks 
will think to write their comprehensions to take advantage of it, which is why 
I was suggesting tweaks to the more obvious walrus operator.

nosy: +josh.r


[issue38167] O_DIRECT read fails with 4K mmap buffer

2019-10-10 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Yeah, not a bug. The I/O subsystem was substantially rewritten between Python 2 
and Python 3, so you sometimes need to be more explicit about things like 
buffering, but as you note, once the buffering is correct, the code works; 
there's nothing to fix.

resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed


[issue38167] O_DIRECT read fails with 4K mmap buffer

2019-10-04 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

> I do not believe an unbuffered file uses O_DIRECT.  This is why I use 
> os.open(fpath, os.O_DIRECT).

Problem is you follow it with:

fo = os.fdopen(fd, 'rb+')

which introduces a Python level of buffering around the kernel unbuffered file 
descriptor. You'd need to pass buffering=0 to make os.fdopen avoid returning a 
buffered file object, making it:

fo = os.fdopen(fd, 'rb+', buffering=0)
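
A portable sketch of the difference. It deliberately leaves out O_DIRECT (which isn't available everywhere) and just shows which kind of object os.fdopen hands back; the function name is hypothetical.

```python
import io
import os


def open_both_ways(path):
    """Open the same file twice: default buffering and buffering=0."""
    fd1 = os.open(path, os.O_RDWR)
    buffered = os.fdopen(fd1, "rb+")             # io.BufferedRandom wrapper
    fd2 = os.open(path, os.O_RDWR)
    raw = os.fdopen(fd2, "rb+", buffering=0)     # bare io.FileIO, no wrapper
    return buffered, raw
```

Only the second object passes reads straight through to the descriptor, which is what an O_DIRECT descriptor needs.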



[issue36947] Fix Metaclasses Documentation

2019-09-24 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

The existing documentation is correct, just hard to understand if you don't 
already understand the point of metaclasses (metaclasses are hard, the language 
to describe them will be inherently a little klunky).

At some point, it might be nice to write a proper metaclass tutorial, even if 
it's only targeted at advanced users (the only people who should really be 
considering writing their own metaclasses or even directly using existing ones; 
everyone else should be using more targeted tools and/or inheriting from 
classes that already implement the desired metaclass).

The Data model docs aren't concerned with tutorials and examples though; 
they're just dry description, and they're doing their job here, so I think this 
issue can be closed.



[issue38255] Replace "method" with "attribute" in the description of super()

2019-09-24 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

I prefer rhettinger's PR to your proposed PR; while super() may be useful for 
things other than methods, the 99% use case is methods, and deemphasizing that 
is a bad idea. rhettinger's PR adds a note about other use cases without 
interfering with super()'s primary use case.

nosy: +josh.r


[issue38241] Pickle with protocol=0 in python 3 does not produce a 'human-readable' format

2019-09-20 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

I'll note, the same bug appears in Python 2, but only when pickling bytearray; 
since bytes in Python 2 is just a str alias, you don't see this misbehavior 
with it, only with bytearray (which is consistently incorrect/non-ASCII on both 
2 and 3).



[issue38241] Pickle with protocol=0 in python 3 does not produce a 'human-readable' format

2019-09-20 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This seems like a bug in pickle; protocol 0 is *defined* to be ASCII 
compatible. Nothing should encode to a byte above 0x7f. It's not actually 
supposed to be "human-readable" (since many ASCII bytes aren't printable), so 
the docs should be changed to describe protocol 0 as ASCII consistently; if 
this isn't fixed to make it ASCII consistently, "human-readable" is still 
meaningless and shouldn't be used.

I'm kind of surprised the output from Py3 works on Py2 to be honest.

nosy: +josh.r


[issue38167] O_DIRECT read fails with 4K mmap buffer

2019-09-13 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Works just fine for me on 3.7.3 on Ubuntu, reading 4096 bytes. How is it 
failing for you? Is an exception raised?

It does seem faintly dangerous to explicitly use O_DIRECT when you're wrapping 
it in a buffered reader that doesn't know it has to read in units matching the 
minimum block size (file system dependent on older kernels, 512 bytes in Linux 
kernel 2.6+); BufferedIOBase.readinto is explicitly documented to potentially 
issue multiple read calls (readinto1 guarantees it won't do that at least).

nosy: +josh.r


[issue33214] join method for list and tuple

2019-09-13 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Note that all of Serhiy's examples are for a known, fixed number of things to 
concatenate/union/merge. str.join's API can be used for that by wrapping the 
arguments in an anonymous tuple/list, but it's more naturally used for a 
variable number of things, and the unpacking generalizations haven't reached 
the point where:

[*seq for seq in allsequences]

is allowed.


handles that just fine, but I could definitely see it being convenient to be 
able to do:


That said, a big reason str provides .join is because it's not uncommon to want 
to join strings with a repeated separator, e.g.:

# For not-really-csv-but-people-do-it-anyway

# Separate words with spaces
' '.join(words)

# Separate lines with newlines

I'm not seeing even one motivating use case for list.join/tuple.join that would 
actually join on a non-empty list or tuple ([None, 'STOP', None] being rather 
contrived). If that's not needed, it might make more sense to do this with an 
alternate constructor (a classmethod), e.g.:


which would avoid the cost of creating an otherwise unused empty list (the 
empty tuple is a singleton, so no cost is avoided there). It would also work 
equally well with both tuple and list (where making list.extend take varargs 
wouldn't help tuple, though it's a perfectly worthy idea on its own).

Personally, I don't find using itertools.chain (or its from_iterable alternate 
constructor) all that problematic (though I almost always import it with from 
itertools import chain to reduce the verbosity, especially when using 
chain.from_iterable). I think promoting itertools more is a good idea; right 
now, the notes on concatenation for sequence types mention str.join, 
bytes.join, and replacing tuple concatenation with a list that you call extend 
on, but doesn't mention itertools.chain at all, which seems like a failure to 
make the best solution the discoverable/obvious solution.
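
A short sketch of the itertools.chain approach next to the str.join separator cases (the sample data and variable names are arbitrary):

```python
from itertools import chain

allsequences = [[1, 2], (3,), [4, 5]]

# Flatten a variable number of sequences without a join method on list/tuple.
flat_list = list(chain.from_iterable(allsequences))
flat_tuple = tuple(chain.from_iterable(allsequences))

# str.join earns its keep because a non-empty separator is common.
words = ["spam", "eggs", "ham"]
spaced = " ".join(words)
lined = "\n".join(words)
```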

nosy: +josh.r


[issue38116] Make select module PEP-384 compatible

2019-09-11 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Why do you describe these issues (this one, #38069, #38071-#38076, maybe more) 
as making the module PEP 384 compatible? There is no reason to make the 
built-in modules stick to the limited API, and it doesn't look like you're 
doing that in any event (among other things, pretty sure Argument Clinic 
generated code isn't limited API compatible yet, though that might be 

Seems like the main (only?) change you're making is to convert all static types 
to dynamic types. Which is fine, if it's necessary for PEP 554, but it seems 
only loosely related to PEP 384 (which defined mechanisms for "statically" 
defining dynamic heap types, but that wasn't the main thrust).

nosy: +josh.r


[issue38003] Incorrect "fixing" of isinstance tests for basestring

2019-09-06 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

basestring in Python 2 means "thing that is logically text", because in Python 
2, str can mean *either* logical text *or* binary data, and unicode is always 
logical text. str and unicode can kinda sorta interoperate on Python 2, so it 
can make sense to test for basestring if you're planning to use it as logical 
text; if you do 'foo' + u'bar', that's fine in Python 2. In Python 3, only str 
is logically text; b'foo' + 'bar' is completely illegal, so it doesn't make 
sense to convert it to recognize both bytes and str.

Your problem is that you're using basestring incorrectly in Python 2, and it 
happens to work only because Python 2 did a bad job of separating text and 
binary data. Your original example code should actually have been written in 
Python 2 as:

if isinstance(value, bytes):  # bytes is an alias of str, and only str, on 2.7
    value = value.decode(encoding)
elif not isinstance(value, unicode):
    some other code

which 2to3 would convert correctly (changing unicode to str, and leaving 
everything else untouched) because you actually tested what you meant to test 
to control the actions taken:

1. If it was binary data (which you interpret all Py2 strs to be), then it is 
decoded to text (Py2 unicode/Py3 str)
2. If it wasn't binary data and it wasn't text, you did something else

Point is, the converter is doing the right thing. You misunderstood the logical 
meaning of basestring, and wrote code that depended on your misinterpretation, 
that's all.
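
For comparison, a sketch of what that logic looks like once it is Python 3 code. The function name is hypothetical, and str(value) is only a placeholder for the "some other code" branch.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding binary data and converting the rest."""
    if isinstance(value, bytes):
        # Binary data: decode to text.
        return value.decode(encoding)
    if not isinstance(value, str):
        # Neither binary nor text: placeholder for "some other code".
        return str(value)
    # Already text.
    return value
```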

Your try/except to try to detect Python 3-ness was doomed from the start; you 
referenced basestring, and 2to3 (reasonably) converts that to str, which breaks 
your logic. You wrote cross-version code that can't be 2to3-ed because it's 
*already* Python 3 code; Python 3 code should never be subjected to 2to3, 
because it'll do dumb things (e.g. change print(1, 2) to print((1, 2))); it's 
2to3, not 2or3to3 after all.

nosy: +josh.r


[issue38046] Can't use sort_keys in json.dumps with mismatched types

2019-09-06 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This is an exact duplicate of #25457.

nosy: +josh.r
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> json dump fails for mixed-type keys when sort_keys is specified


[issue23670] Modifications to support iOS as a cross-compilation target

2019-09-03 Thread Josh Rosenberg

Change by Josh Rosenberg :

title: Restore -> Modifications to support iOS as a cross-compilation target


[issue37976] zip() shadows TypeError raised in __iter__() of source iterable

2019-08-29 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

Raymond: "Since there isn't much value in reporting which iterable number has 

Isn't there though? If the error just points to the line with the zip, and the 
zip is zipping multiple similar things (especially things which won't have a 
traceable line of Python code associated with them to narrow it down), knowing 
which argument was the cause of the TypeError seems rather useful. Without it, 
you just know *something* being zipped was wrong, but need to manually track 
down which of the arguments was the problem.

nosy: +josh.r


[issue37872] Move statics in Python/import.c to top of the file

2019-08-16 Thread Josh Rosenberg

Change by Josh Rosenberg :

title: Move statitics in Python/import.c  to top of the file -> Move statics in 
Python/import.c to top of the file


[issue33007] Objects referencing private-mangled names do not roundtrip properly under pickling.

2019-08-15 Thread Josh Rosenberg

Josh Rosenberg  added the comment:

This problem is specific to private methods AFAICT, since they're the only 
things which have an unmangled __name__ used to pickle them, but are stored as 
a mangled name.

More details on cause and solution on issue #37852, which I closed as a 
duplicate of this issue.

nosy: +josh.r
versions: +Python 3.6, Python 3.8, Python 3.9


[issue37852] Pickling doesn't work for name-mangled private methods

2019-08-15 Thread Josh Rosenberg

Change by Josh Rosenberg :

resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> Objects referencing private-mangled names do not roundtrip 
properly under pickling.


[issue37852] Pickling doesn't work for name-mangled private methods

2019-08-14 Thread Josh Rosenberg

New submission from Josh Rosenberg :

Inspired by this Stack Overflow question, where it prevented using 
multiprocessing.Pool.map with a private method: 

The __name__ of a private method remains the unmangled form, even though only 
the mangled form exists on the class dictionary for lookup. The __reduce__ for 
bound methods doesn't handle private names specially, so it will serialize 
it such that on the other end, it does getattr(method.__self__, 
method.__func__.__name__). On deserializing, it tries to perform that lookup, 
but of course, only the mangled name exists, so it dies with an AttributeError.
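
The name mismatch itself can be seen without involving pickle at all. A small sketch (it only inspects attributes, and makes no claim about how any particular pickle version behaves):

```python
class Spam:
    def __eggs(self):
        pass


# Only the mangled name is stored in the class dictionary...
stored_names = [name for name in vars(Spam) if name.endswith("eggs")]

# ...but the function object still reports the unmangled name.
reported_name = vars(Spam)["_Spam__eggs"].__name__

# A getattr with a plain string is never mangled, so a lookup by the
# reported name finds nothing.
lookup_by_reported_name = getattr(Spam(), reported_name, None)
```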

Minimal repro:

import pickle

class Spam:
    def __eggs(self):
        pass
    def eggs(self):
        return pickle.dumps(self.__eggs)

spam = Spam()
pkl = spam.eggs()   # Succeeds via implicit mangling (but pickles unmangled name)
pickle.loads(pkl)   # Fails (tried to load __eggs)

Explicitly mangling via pickle.dumps(spam._Spam__eggs) fails too, and in the 
same way.

A similar problem occurs (on the serializing end) when you do:

# Pickling function in Spam class, not bound method of Spam instance
pkl = pickle.dumps(Spam._Spam__eggs)

though that failure occurs at serialization time, because pickle itself tries 
to look up <module>.Spam.__eggs (which doesn't exist), instead of 
<module>.Spam._Spam__eggs (which does). That failure mode is at least the 
better of the two, because:

1. It fails at serialization time (so it doesn't silently produce pickles that 
can never be unpickled)
2. It's an explicit PicklingError, with a message that explains what it tried 
to do, and why it failed ("Can't pickle : 
attribute lookup Spam.__eggs on __main__ failed")

In the use case on Stack Overflow, it was the implicit case; a public method of 
a class created a multiprocessing.Pool, and tried to call Pool.map with a 
private method on the same class as the mapper function. While normally 
pickling methods seems odd, for multiprocessing, it's pretty standard.

I think the correct fix here is to make method_reduce in classobject.c (the 
__reduce__ implementation for bound methods) perform the mangling itself 
(meth_reduce in methodobject.c has the same bug, but it's less critical, since 
only private methods of built-in/extension types would be affected, and most of 
the time, such private methods aren't exposed to Python at all, they're just 
static methods for direct calling in C).

This would handle all bound methods, but for "unbound methods" (read: functions 
defined in a class), it might also be good to update 
save_global/get_deep_attribute in _pickle.c to make it recognize the case where 
a component of a dotted name begins with two underscores (and doesn't end with 
them), and the prior component is a class, so that pickling the private unbound 
method (e.g. plain function which happened to be defined on a class) also 
works, instead of dying with a lookup error.

The fix is most important, and least costly, for bound methods, but I think 
doing it for plain functions is still worthwhile, since I could easily see 
Pool.map operations using an @staticmethod utility function defined privately 
in the class for encapsulation purposes, and it seems silly to force them to 
make it more public and/or remove it from the class.

components: Interpreter Core, Library (Lib)
messages: 349716
nosy: josh.r
priority: normal
severity: normal
status: open
title: Pickling doesn't work for name-mangled private methods
versions: Python 3.9
