[issue42307] make install must not copy python.o into $prefix/lib/python3.10/config-3.10-x86_64-linux-gnu/

2020-11-10 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I think the comments are correct in that python.o is used to create a new
statically linked interpreter that includes a user-provided extension module.
We could include python.o inside the libpythonXX.a file, but then I think it
breaks if you are embedding (e.g. linking the .a against your own .o that has
a main() function).

It seems probable that no one uses python.o anymore, but my preference would
be to not remove it unless we are going to completely remove support for
static linking.  Nothing is really hurt by having that .o file in the install,
and the feature still works as originally intended.

It would be good to add a comment to the makefile describing the possible use
for that file.

--

Python tracker <https://bugs.python.org/issue42307>



[issue34106] Add --with-module-config= to 'configure' script

2020-10-31 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
resolution:  -> out of date
stage: patch review -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue34106>



[issue34704] Do not access ob_type directly, introduce Py_TP

2020-10-31 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

This has been resolved using Py_TYPE() and Py_SET_TYPE().
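
For illustration, a minimal before/after sketch of the macro usage (the
retype() helper and its arguments are hypothetical, just to show the calls):

    #include <Python.h>

    static void
    retype(PyObject *obj, PyTypeObject *new_type)
    {
        /* Old style, which breaks if PyObject becomes opaque:
           obj->ob_type = new_type; */
        if (Py_TYPE(obj) != new_type) {   /* read the type via the macro */
            Py_SET_TYPE(obj, new_type);   /* write the type via the macro */
        }
    }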

--
resolution:  -> out of date
stage: patch review -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue34704>



[issue27377] Add socket.fdtype()

2020-10-31 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I believe this is not needed anymore.  Closing.

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue27377>



[issue39573] [C API] Make PyObject an opaque structure in the limited C API

2020-10-26 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Correction: I think you *cannot* have it both ways.

--

Python tracker <https://bugs.python.org/issue39573>



[issue39573] [C API] Make PyObject an opaque structure in the limited C API

2020-10-26 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> I don't understand the rationale for this change in depth, but
> does the benefit outweigh (yet another) backwards incompatibility?

I think you can have it both ways.  Do you want a C API that is
stable over a long period of CPython releases, or do you want to
continue to be able to look deeply (i.e. through a non-opaque
PyObject*) into CPython implementation internals?

During the sprint last week, we talked about how to provide a
compatible API, similar to what PyPy's cpyext does.  It would be
possible to provide a (nearly) fully compatible API with that
approach.  However, it could get quite painful for CPython to
maintain such a thing.  E.g. cpyext has proxy objects (to maintain
CPython-compatible structure layouts) but keeping those proxies in
sync with the internal VM object structures is expensive and tricky.

Certainly making PyObject opaque is going to break some 3rd party
code.  Making it opaque for the non-limited API is not an option
IMHO because it breaks too much 3rd party code.  Is making it opaque
for the limited C API going to break too much code?  Maybe, I don't
know.  Thanks for pointing out pycurl and breezy.

--

Python tracker <https://bugs.python.org/issue39573>



[issue42099] Fix reference to ob_type in unionobject.c and ceval

2020-10-20 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
keywords: +patch
pull_requests: +21784
pull_request: https://github.com/python/cpython/pull/22829

Python tracker <https://bugs.python.org/issue42099>



[issue42099] Fix reference to ob_type in unionobject.c and ceval

2020-10-20 Thread Neil Schemenauer


New submission from Neil Schemenauer :

It is great that access to ob_type has been cleaned up to use an access macro.  
There are two spots that still need fixing.

I think we should do something to help avoid this kind of thing slipping into 
the code in the future.  E.g. a special build flag that renames the ob_type 
member would work.  I'll make a separate bug for that.
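
A hedged sketch of the build-flag idea (the flag, macro, and struct names
here are hypothetical, not an actual configure option):

    #include <Python.h>

    #ifdef Py_CHECK_OBTYPE_ACCESS             /* hypothetical build flag */
    #  define _Py_OB_TYPE _ob_type_use_Py_TYPE
    #else
    #  define _Py_OB_TYPE ob_type
    #endif

    typedef struct _object_sketch {
        Py_ssize_t ob_refcnt;
        PyTypeObject *_Py_OB_TYPE;            /* renamed under the flag */
    } PyObjectSketch;

    /* Py_TYPE() and Py_SET_TYPE() would be defined in terms of
       _Py_OB_TYPE, so direct obj->ob_type access fails to compile
       whenever the flag is enabled. */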

--
assignee: vstinner
components: Interpreter Core
messages: 379155
nosy: nascheme, vstinner
priority: normal
severity: normal
stage: patch review
status: open
title: Fix reference to ob_type in unionobject.c and ceval
type: enhancement

Python tracker <https://bugs.python.org/issue42099>



[issue40255] Fixing Copy on Writes from reference counting

2020-04-18 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I resurrected an old thread on Discourse that seems quite relevant to this PR:

https://discuss.python.org/t/switching-from-refcounting-to-libgc/1641/35?u=nas

--

Python tracker <https://bugs.python.org/issue40255>



[issue40255] Fixing Copy on Writes from reference counting

2020-04-16 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> immutability vs immutable object headers

Sorry, I meant to write "immortal vs immutable".

--

Python tracker <https://bugs.python.org/issue40255>



[issue40255] Fixing Copy on Writes from reference counting

2020-04-16 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

A few thoughts:

First, the PR is well done.  The changes are pretty minimal and are well
localized.  Thanks, Eddie, for writing clean code and responding to review
requests.

My feeling is that, in its current state, this still should not get merged, or
at least not get enabled by default.  The main consideration is the cost of
this new feature versus its benefit.

If the immortalize heap function is not called by default, all Python users pay 
a performance hit for this feature.  Even after recent PR improvements, if you 
don't immortalize the heap, Python is slower. 

Only a small fraction of Python users will benefit from this feature.  I think 
the "pre-fork, CoW exploiting" application architecture is more common than 
some people would realize (it is a successful way to achieve multi-processing 
with CPython, making good use of multiple CPU cores).  However, I'm pretty sure 
it is used by a very tiny fraction of all Python programs.

If we do immortalize by default (e.g. on Python startup), the semantics of the 
refcnt field are subtly changed.  I suspect some programs will be broken as a 
result of this change.  A clue about what might go wrong comes from the unicode 
object code.  E.g. the resize_inplace() function in unicodeobject.c.  It is 
possible that a non-trivial amount of other 3rd party code will make 
assumptions about refcnt that will be broken by this change.  That code could 
be fixed but it is a matter of cost vs benefit.

If it is disabled by default with a build flag, we have an ABI compatibility
problem.  I.e. incref/decref are inlined functions/macros.  Also, as mentioned 
above, these kinds of build options tend to make maintenance and testing harder.

The frozen GC generation did not have these kinds of issues.  If people don't 
use it, there is basically no cost.  I think it was a good idea to merge the 
frozen generation feature.  There is a small amount of ongoing maintenance cost 
but that's acceptable.

Regarding Mark's comment about immutability vs immutable object headers, I 
would frame the goal of the PR differently.  Like the GC frozen generation, 
this immutable instance change is about providing a way to opt-out of Python's 
default garbage collection mechanism.  The frozen generation was to opt-out of 
the mark-and-sweep style cyclic collection.  This PR is about opting-out of the 
default Python reference counting.  

Put that way, could we consider this PR as an incremental step in a process of 
making it possible to have a non-reference count based GC for CPython?  E.g. 
perhaps we could eventually have a mixture of mark-and-sweep and reference 
counted GC, in order to support the handle (HPy) API.  If we want to move 
towards that, we want CPython code to stop making assumptions about the refcnt 
field, like the unicodeobject.c file currently does.

I think pitching the PR in that way makes it more compelling.  Merge it not 
because it helps a tiny class of Python applications.  Instead, merge it 
because it helps find unwarranted assumptions about how the Python GC works.  
Once we get bad assumptions cleaned up in 3rd party code, we have more of a
chance of pursuing an alternative GC strategy for CPython.

--

Python tracker <https://bugs.python.org/issue40255>



[issue40255] Fixing Copy on Writes from reference counting

2020-04-15 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Eddie mentions in the PR the idea of using memory arenas to contain immortal
objects.  I think it could be worth investigating further.  With the current
PR, immortal status depends on the value of the refcnt field of the object.
Using immortal arenas might be faster because you can check if an object is
immortal based only on its address (no data dependency on the refcnt value).

The fastest approach would be to create an immortal block as part of the BSS
(uninitialized data).  Then, within incref/decref, you use the location and
size of the immortal arena (compile-time constants) to test if the object is
immortal.  Maybe we could just check if the high bits of an object's address
match a constant mask (if the immortal region is aligned and is a power of 2
in size).  Slightly slower but more flexible would be to make the immortal
arena size and location global variables.  That way, you can set the size of
the region on startup.  It would also be more flexible in terms of ABI
compatibility.  That would introduce one or two global loads within
incref/decref, but those values would almost certainly be in the L1 cache.

Even more flexible would be to use a memory map to mark which arenas are 
immortal.  See my radix tree implementation for obmalloc:

https://bugs.python.org/issue37448

I would guess the radix tree lookups are too expensive to put in
incref/decref.  We should probably test that, though.

I had started doing an experiment with the arena approach before I noticed 
Eddie's comment about it.  I would like to see his version.  Here is a sketch 
of mine (not working yet):

- change new_arena() to initially allocate from an "immortal memory" region.  
There are multiple ways to allocate that (BSS/uninitialized data, 
aligned_alloc(), etc).

- add a _PyMem_enable_immortal() call to switch obmalloc from using immortal 
arenas to regular ones.  Need to mess with some obmalloc data structures to 
switch arenas (usedpools, usable_arenas, unused_arena_objects).

- change incref/decref to check if immortal status has been enabled and if 
object address falls within immortal region.  If so, incref/decref don't do 
anything.
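
A rough sketch of what that check could look like (the _Py_immortal_*
globals and helper names are placeholders, not real CPython symbols):

    #include <Python.h>
    #include <stdint.h>

    /* Set by the hypothetical _PyMem_enable_immortal(); the size stays 0
       until immortalization is enabled, so the test below always fails
       before that point. */
    extern uintptr_t _Py_immortal_start;
    extern uintptr_t _Py_immortal_size;

    static inline int
    _Py_is_immortal(PyObject *op)
    {
        /* One subtract and one compare; both globals should stay in L1. */
        return (uintptr_t)op - _Py_immortal_start < _Py_immortal_size;
    }

    static inline void
    _Py_INCREF_sketch(PyObject *op)
    {
        if (!_Py_is_immortal(op))
            op->ob_refcnt++;
    }

If the region is aligned and power-of-2 sized, the subtract and compare
collapse to a single mask-and-compare on the high address bits, as noted
above.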

By default, the immortal arenas could be enabled on Python startup.  Call 
_PyMem_enable_immortal after startup but before running user code.  There could 
be a command-line option to disable the automatic call to 
_PyMem_enable_immortal() so that users like Instagram can do their pre-fork 
initialization before calling it.

Next step down the rabbit-hole could be to use something like Jeethu Rao's 
frozen module serializer and dump out all of the startup objects and put them 
into the BSS:

 https://bugs.python.org/issue34690

--

Python tracker <https://bugs.python.org/issue40255>



[issue39448] Add regen-frozen makefile target

2020-01-24 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
keywords: +patch
pull_requests: +17558
pull_request: https://github.com/python/cpython/pull/18174

Python tracker <https://bugs.python.org/issue39448>



[issue39448] Add regen-frozen makefile target

2020-01-24 Thread Neil Schemenauer


New submission from Neil Schemenauer :

Updating the frozen module "__hello__" code inside Python/frozen.c is
currently a manual process.  That's a bit tedious since it adds extra work
whenever bytecode changes are made.  I've created a small script and a
makefile target to automate the process.

--
messages: 360655
nosy: nascheme
priority: normal
severity: normal
stage: patch review
status: open
title: Add regen-frozen makefile target
type: enhancement

Python tracker <https://bugs.python.org/issue39448>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-10-15 Thread Neil Schemenauer


Neil Schemenauer  added the comment:


New changeset 392a13bb9346331b087bcd8bb1b37072c126abee by Neil Schemenauer in 
branch 'master':
bpo-38006: Add unit test for weakref clear bug (GH-16788)
https://github.com/python/cpython/commit/392a13bb9346331b087bcd8bb1b37072c126abee


--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-10-14 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> It's fundamentally insane to expect any gc to work perfectly when it may be 
> blind to what the object graph _is_.

I'm often amazed it works at all, let alone perfectly. ;-P

This did trigger me to think of another possible problem.  I set up my unit
test as you suggested:

#   Z <- Y <- A--+--> WZ -> C
# ^  |
# +--+
# where:
#   WZ is a weakref to Z with callback C
#   Y doesn't implement tp_traverse
#   A contains a reference to itself, Y and WZ

But what happens if the GC doesn't see that WZ is trash?  Then it will not be 
cleared.  Hang it off Y so the GC can't find it.  We can set things up so that 
Z is freed before WZ (e.g. WZ in a second and newer cycle).  Then, the callback 
might still run.

On further thought, this seems safe (but maybe broken) because of the 
handle_weakrefs() logic.  The GC will think WZ is going to outlive Z so it will 
call it before doing any tp_clear calls.

--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-10-14 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
pull_requests: +16348
pull_request: https://github.com/python/cpython/pull/16788

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-10-01 Thread Neil Schemenauer

Neil Schemenauer  added the comment:

Ɓukasz, is there some reason you removed old versions (2.7, 3.6, etc)?  The bug 
is present on those versions of Python and it should be trivial to backport the 
fix.  If those branches are maintained, I think we should fix it.

Attached is a small test program (gc_weakref_bug_demo2.py) that causes the same 
crash as in this bug report (SEGV inside _PyFunction_Vectorcall).  Setting 
things up is quite tricky and it depends on the order in which things are
cleared in delete_garbage().  Maybe it is still worth making a unit test for it?

Instead of using the 'dummy' function, you can also use 
types.SimpleNamespace().  Calling repr() on it after tp_clear will result in a 
crash or an assert failure.  Those two crashes are not needed to confirm the 
bug.  Just the fact that 'callback' runs is proof enough.  So, in a unit test, 
the callback need only set a global to indicate failure.

--
Added file: https://bugs.python.org/file48637/gc_weakref_bug_demo2.py

Python tracker <https://bugs.python.org/issue38006>



[issue33418] Memory leaks in functions

2019-10-01 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
pull_requests: +16095
pull_request: https://github.com/python/cpython/pull/16502

Python tracker <https://bugs.python.org/issue33418>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I did some work on creating a unit test this evening.  Attached is my best
result so far (gc_weakref_bug_demo.py).  It requires that you enable the 'xx'
module so we have a container object without tp_traverse.  We should probably
add such a type to _testcapi for the real unit test.

I can make the weakref callback run from within delete_garbage().  I haven't 
been able to make tp_clear for the function execute before the callback and I 
think that is needed to replicate the crash in this bug report.

--
Added file: https://bugs.python.org/file48633/gc_weakref_bug_demo.py

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
nosy: +benjamin.peterson, larry, ned.deily
versions: +Python 2.7, Python 3.5, Python 3.6, Python 3.7

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I created GH-16502.  I'm not exactly sure of the process for making a revert
on a branch like '3.8', so hopefully it is correct.  The code change is
exactly what was reverted in 3.8.  The net effect will be as if the revert
was never done.

--

Python tracker <https://bugs.python.org/issue38006>



[issue37725] "make clean" should remove PGO task data

2019-09-30 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue37725>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
pull_requests: +16092
pull_request: https://github.com/python/cpython/pull/16502

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> Is introducing tp_clear on functions a thing that has ABI consequences? In 
> other words, if we take our time on this, would it land in 3.8.1 or 3.9.0?

I think it should not have ABI consequences.  However, I see the addition of 
tp_clear as a new feature.  Leaking memory due to reference cycles is bad 
behavior but not a bug to be fixed in a maintenance release.  Unless we require 
full GC protocol on every container object, we can't promise not to leak.

Inada-san's change to add func tp_clear has been in all the 3.8.0
pre-releases (I think).  So, it should be pretty well exercised by now (at
least, as well as the pre-releases are tested).

--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> Would [func tp_clear] help with memory usage in functions or was BPO-33418 
> addressed in another way since?

Having a tp_clear for all container objects that can be involved in reference 
cycles will help the GC free memory.  BPO-33418 may be contrived but it does 
demonstrate a problem that could happen in "real" code.  If you have a 
long-running program, tracking down a memory leak due to reference cycles can 
be fiendishly difficult.  That's why Tim originally wrote cyclops and why I 
started working on cyclic GC in the first place.

It is a "quality of implementation" issue.  It is not wrong for CPython to leak 
memory if you create reference cycles (not all types are required to implement 
the GC protocol).  I expect users would be surprised if it happens since the 
majority have never used Python without the cyclic GC.

--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Oops, I linked to the wrong PR; my proposed fix is GH-16495.  It is the same
as what Tim suggests in his last comment.

--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> Why setting it back to release blocker? As far as I recall, this issue is not
> a Python 3.8 regression. The regression which triggered this old and existing
> bug has been fixed, see previous comments.

I leave it up to our glorious release manager to decide how to proceed.  IMHO,
GH-15826 (revert the addition of tp_clear for functions) and GH-15641 (Avoid
closure in weakref.WeakValueDictionary) are two cases of not fixing the real
bug, just hiding symptoms.  However, since this bug has existed for decades, I
wouldn't fault him for taking the conservative approach and not trying to
address the real bug in 3.8.

I've created a new PR (GH-16083) that I think is the correct fix.  If that PR
is applied, I think we should also restore tp_clear for functions (revert
GH-15826).  GH-15641 can probably stay.


Tim on my rough patch:
> It's semantically correct since we never wanted to execute a callback from a
> trash weakref to begin with

Thanks for your help Tim.  I think my PR is a cleaner fix that does essentially
the same thing.  Just call _PyWeakref_ClearRef() on weakrefs that are in
'unreachable'.

--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-30 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
pull_requests: +16083
pull_request: https://github.com/python/cpython/pull/16495

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-29 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> We can have finalizer code running during delete_garbage().  That
> code should not have access to non-valid objects.  Weakrefs seem to be
> a way to violate that.  handle_weakrefs() takes care of some of them
> but it seems there are other issues.

I see that handle_weakrefs() calls _PyWeakref_ClearRef() and that
will clear the weakref even if it doesn't have a callback.  So, I
think that takes care of the hole I was worried about.  I.e. a
__del__ method could have a weakref to a non-valid object.
However, because handle_weakrefs() will find that weakref, it will
have been cleared by the time the __del__ method executes.
--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-29 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> Fleshing out something I left implicit: if there's a trash object T
> with a finalizer but we don't KNOW it's trash, we won't force-run its
> finalizer before delete_garbage starts either.  Then, really the same
> thing: we may tp_clear some piece of trash T's finalizer needs before
> enough cycles are broken that T's finalizer gets run as a routine
> consequence of T's refcount falling to 0.

Definition: a 'valid' object is one that hasn't had tp_clear called on it.

I think the difference is that non-weakref finalizers have strong
references to objects that they can access when they run.  So, if we
haven't found them, they will keep all the objects that they refer
to alive as well (subtract_refs() cannot account for those refs).
So those objects will all be valid.

There seems to be a hole, though.  Non-weakref finalizers could have a
weakref (without callback) to something in the garbage set.  Then,
when the finalizer runs during delete_garbage(), that finalizer code
can see non-valid objects via the weakref.  I think this can only happen if 
there are missing/incomplete tp_traverse methods.

We can have finalizer code running during delete_garbage().  That
code should not have access to non-valid objects.  Weakrefs seem to be
a way to violate that.  handle_weakrefs() takes care of some of them
but it seems there are other issues.

--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-29 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Since W is in the unreachable set, we should not be executing its callback.  
Would the attached rough patch (gc_disable_wr_callback.txt) be a possible fix?  
When we find W inside handle_weakrefs(), we mark it as trash and will not 
execute the callback.

This is similar to Pablo's bpo-38009, but I think it attacks the problem in a
better way.  Having func_clear() run is only one possible bad outcome of
executing the callback.  We want to avoid executing *any* user-level Python
code if the weakref has access to objects found by the GC that have had
tp_clear called on them.

--
Added file: https://bugs.python.org/file48630/gc_disable_wr_callback.txt

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-29 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I hacked up my copy of CPython to add flags to PyObject to see the
GC state of objects.  That makes it easier to know if an object is
in the 'unreachable' list or not.

> We must know that F is trash, else we never would have called tp_clear(F).

Yup.  In the crashing test program, F is in 'unreachable'.

> We must NOT know CT is trash.  If we did know that, then we would
> have found W too, and then:

Yes again.  CT is not in 'unreachable'.

> So if CT _is_ trash, we didn't know it.  If/when delete_garbage
> broke enough cycles that CT's refcount fell to 0, other parts of
> Python would have invoked W's callback as a matter of course, with
> F possibly already tp_clear'ed.

I haven't verified this but it must be what's happening.  Note that my flags
show that W *is* in 'unreachable'.  It has to be, otherwise F would not have
had tp_clear called on it.

> Neil, at a high level it's not REALLY surprising that gc can screw
> up if it can't tell the difference between trash & treasure ;-)
> Long ago, though, it's possible the only real failure mode was
> leaking objects that were actually unreachable.

It has been so long I can't be sure the GC worked that way but I think
it did.  It would only call tp_clear on things it could really prove
were garbage.  If tp_clear didn't actually break the cycle, the only
harmful result was leaking memory.  That behavior is similar to a
conservative GC.

> Offhand not worried about get_objects().  That returns objects from
> generation lists, but gc moves possible trash among temporary
> internal (not generation) lists as it goes along.

But when delete_garbage() gets done, it moves the object to 'old'.  I
think that means gc.get_objects() called after the collection completes
can reveal objects that have had their tp_clear called (if tp_clear
didn't result in them being freed).

--

Python tracker <https://bugs.python.org/issue38006>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-29 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Victor:
> I'm surprised that the function would be seen as "unreachable" if
> its reference counter was equal to 135:

If all those references are contained within that garbage set, it is
possible.


Pablo:
> For the weakref to be handled correctly the ctypedescr needs to be
> identified correctly as part of the isolated cycle.  If it is not in
> the isolated cycle, something may be missing tp_traverse or the GC
> flags. 

I think this analysis is correct.  I did some poking around
using 'rr'.  It is very handy to run the program in reverse and use
conditional breakpoints based on pointer addresses.

The objects involved seem to be:

CF: cfield
CT: ctypedescr
F: function
W: weakref

They have the following structure:

CF -> CT -> W -> F

The first link in the chain is not found by the GC because CF does not
implement tp_traverse.  All those objects are part of garbage kept
alive by the reference cycle, and the GC calls delete_garbage() on
it.  CT is not part of 'unreachable', so handle_weakrefs() does not
find W.
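
For context, a minimal sketch of the kind of traverse method CF's type is
missing (CFieldLike is a hypothetical stand-in with one object field):

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        PyObject *proto;   /* e.g. the reference CF holds to CT */
    } CFieldLike;

    /* Without this, the collector cannot follow the CF -> CT edge, so CT
       (and hence W) stays invisible to the GC. */
    static int
    cfield_traverse(CFieldLike *self, visitproc visit, void *arg)
    {
        Py_VISIT(self->proto);
        return 0;
    }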

What I previously said about tp_clear was incorrect.  We take great
pains to avoid running non-trivial code after calling tp_clear (no
finalizers and no callbacks).  So, the func_clear method should be
safe.  Even if tp_clear doesn't succeed in breaking the cycle, there
should be no way for user Python code to access those objects that
have had tp_clear called on them.  Is gc.get_objects() a hole in
this?  Seems like you could cause crashes with it.

This bug is surprising to me in another way.  If the analysis is
correct, then fully implementing the GC protocols is no longer
optional (as it was when cyclic GC was first introduced).  If the GC
cannot find garbage weakrefs, handle_weakrefs() doesn't work and we
end up with crashes like this.

Maybe Pablo's fix to not call the weakref callback if we are in
delete_garbage() is a good enough band-aid.  The alternative would
seem to be ensuring that tp_traverse methods exist so that every
single reference can be followed by the GC.  I imagine that would be
a huge task and there are many extension types that don't have them.

--

Python tracker <https://bugs.python.org/issue38006>



[issue38009] Handle weakreference callbacks invoked indirectly in the middle of a gc collection

2019-09-27 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I think the problem with your logic is that the weakref to F is part of the 
garbage set.  So, handle_finalizers() should detect that and clear the 
finalizer rather than call it.  Once we get to delete_garbage() and start 
calling tp_clear(), we can't be running weakref callbacks or finalizers.  The 
GC logic goes through great pains to ensure that.

--
nosy: +nascheme

Python tracker <https://bugs.python.org/issue38009>



[issue38006] Crash in remove() weak reference callback of weakref.WeakValueDictionary at Python exit

2019-09-27 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

A few comments from the "peanut gallery".  Thanks to Victor and Pablo for doing 
the hard work of investigating this bug.

First, it has been a long time since Tim and I first developed gcmodule.c.  So, 
some of my understanding may be inaccurate due to code changes or old age 
memory loss. ;-P

If I understand Victor's test case correctly, the problem is caused if you have 
an extension type that implements tp_traverse but not tp_clear, and there 
is also a weakref involved in the trash cycle.  In my original design of the 
GC, not implementing tp_clear should have been okay.  It would mean that the GC 
might find garbage reference cycles that it couldn't cleanup.  Those would leak 
memory but would not crash Python.

I'm not sure what has changed since then to require that tp_clear actually 
successfully clears the cycle.  Tim was the origin of the code that handles 
weakrefs. The GC is (relatively) simple if not for handling weakrefs and 
finalizers.  Tim did almost all of the difficult and brain exploding work on 
that stuff.  There was a big re-org some years ago to handle "legacy 
finalizers" and PEP 442 finalizers.  That made things more complicated yet.  
Maybe the requirement for a working tp_clear came into existence then?  I added 
Tim to the nosy list since he might have insight.

To me, it seems problematic that we would require a type to have a tp_clear 
method that actually breaks cycles.  For mutable objects like dicts and lists, 
we can have tp_clear do its thing while leaving the object in a valid state.  
The tp_clear added for functions was not like that because other code expected 
structure fields to be non-NULL.  At least that's my understanding.

Is the behavior of tp_clear the key to this bug?  In the original GC design,
calling tp_clear was not guaranteed to cause objects in the cycle to be freed.
So, after tp_clear, the object still needs to be in some kind of valid state.
That is one of the reasons why the tuple type doesn't have a tp_clear (also
that tuples can't normally be used directly to create cycles).
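
To illustrate "still valid after tp_clear", a sketch for a hypothetical
container type following the usual Py_CLEAR idiom:

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        PyObject *item;   /* may participate in a reference cycle */
    } BoxObject;

    static int
    box_clear(BoxObject *self)
    {
        /* Py_CLEAR NULLs the field before decref'ing it, so the object is
           left in a consistent (empty) state even if more code runs. */
        Py_CLEAR(self->item);
        return 0;
    }

A tp_clear that leaves fields NULL while other code assumes they are
non-NULL, as described for functions above, does not have that property.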

One last observation: the reference cycle GC is now being relied upon in ways 
that were never originally envisioned.  Originally the cyclic GC was an 
optional feature and you could build Python without it.  Reference cycles were 
supposed to exist in rare cases and only be created by user code.  In that 
world, having tp_clear on the function type would be unneeded.  Now, CPython is 
creating reference cycles itself and the GC is being relied on much more to 
free memory.  I'm afraid we are evolving into a complicated and low performance 
GC (refcount + cycle collection).  I don't have any good suggestions on a 
better approach.  Maintaining the C-API ties us to reference counting.  I 
mention it because it might be helpful to have a high-level view of the 
evolution of this part of CPython.

--
nosy: +nascheme, tim.peters

Python tracker <https://bugs.python.org/issue38006>



[issue37725] "make clean" should remove PGO task data

2019-07-30 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
keywords: +patch
pull_requests: +14790
pull_request: https://github.com/python/cpython/pull/15033

Python tracker <https://bugs.python.org/issue37725>



[issue37725] "make clean" should remove PGO task data

2019-07-30 Thread Neil Schemenauer


New submission from Neil Schemenauer :

I find it annoying and surprising that "make clean" does not remove the PGO 
data.  If you change a source file, running "make clean" and "make" should 
result in a newly built executable, IMHO.  As it is now, you usually get a 
confusing build failure (PGO data is out of date).

The fix is fairly easy.  Make a new target that does what "clean" currently 
does.  Have the PGO build call that when it needs to preserve the PGO data.  
Introduce a new "clean" target that does what the old clean did and also 
removes the PGO data.

Changing the build system is fraught with danger but I think this is a fairly 
safe change.  The current behavior is quite annoying, IMHO.

--
components: Build
messages: 348773
nosy: nascheme
priority: normal
severity: normal
stage: patch review
status: open
title: "make clean" should remove PGO task data
type: enhancement

Python tracker <https://bugs.python.org/issue37725>



[issue37707] Skip individual unit tests that are expensive for the PGO task

2019-07-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:


New changeset 382cb85401bb010ead411c0532499ffe16c3cf27 by Neil Schemenauer 
(Miss Islington (bot)) in branch '3.8':
bpo-37707: Exclude expensive unit tests from PGO task (GH-15009) (#15024)
https://github.com/python/cpython/commit/382cb85401bb010ead411c0532499ffe16c3cf27


--

Python tracker <https://bugs.python.org/issue37707>



[issue37707] Skip individual unit tests that are expensive for the PGO task

2019-07-30 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue37707>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Closing as I think PR 14702 mostly resolves this.  The gc_collect() issue has 
not been resolved but based on some investigation, I think it is not worth the 
effort to try to reduce the time spent in that function.  Reducing the number 
of unit tests being run had a huge benefit and did not significantly impact the 
optimization.

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

Python tracker <https://bugs.python.org/issue36044>



[issue37707] Skip individual unit tests that are expensive for the PGO task

2019-07-30 Thread Neil Schemenauer


Neil Schemenauer  added the comment:


New changeset 52a48e62c6a94577152f9301bbe5f3bc806cfcf1 by Neil Schemenauer in 
branch 'master':
bpo-37707: Exclude expensive unit tests from PGO task (GH-15009)
https://github.com/python/cpython/commit/52a48e62c6a94577152f9301bbe5f3bc806cfcf1


--

Python tracker <https://bugs.python.org/issue37707>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-29 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I think expanding the list of tests used by --pgo is fine, as long as we put a 
little thought into each one we add.  The ones added by Steve look fine to me.

It seems useful to run a profiler and see if there are any unusually expensive 
tests being added.  I used cProfile and found a few candidates.  See bug 
#37707 and PR 15009.

--

Python tracker <https://bugs.python.org/issue36044>



[issue37707] Skip individual unit tests that are expensive for the PGO task

2019-07-29 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
keywords: +patch
pull_requests: +14772
pull_request: https://github.com/python/cpython/pull/15009

Python tracker <https://bugs.python.org/issue37707>



[issue37707] Skip individual unit tests that are expensive for the PGO task

2019-07-29 Thread Neil Schemenauer


New submission from Neil Schemenauer :

Add a new support decorator, @skip_if_pgo_task, and then use it to mark test
cases.  I suspect the PGO task works well if it can exercise common code
paths.  Running long tests likely has rapidly diminishing benefits.  The
instrumented PGO executable runs quite a bit slower than a normal build, so it
is useful to spend a bit of time excluding expensive tests.

--
components: Build
messages: 348677
nosy: nascheme
priority: normal
severity: normal
stage: patch review
status: open
title: Skip individual unit tests that are expensive for the PGO task
type: enhancement

Python tracker <https://bugs.python.org/issue37707>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-22 Thread Neil Schemenauer


Neil Schemenauer  added the comment:


New changeset 4e16a4a3112161a5c6981c0588142d4a4673a934 by Neil Schemenauer in 
branch 'master':
bpo-36044: Reduce number of unit tests run for PGO build (GH-14702)
https://github.com/python/cpython/commit/4e16a4a3112161a5c6981c0588142d4a4673a934


--

Python tracker <https://bugs.python.org/issue36044>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-19 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I changed configure.in to use AC_ARG_VAR instead.  So, you can override it as 
you suggest:

  ./configure [..] PROFILE_TASK="-m test --pgo-extended"

--

Python tracker <https://bugs.python.org/issue36044>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-15 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Thanks for the feedback.  I agree that putting the tests in regrtest is
better.  I've made the following changes:

- the --pgo option now uses the shorter list of tests by default

- I added --pgo-extended as a regrtest option that enables the old behavior of 
running most of the tests.  I considered being more fancy and having named sets 
of different tests (like the -u option) but I think just the two options (fast 
or slow) is good enough.  I used "pgo-extended" because I think "fast" or 
"slow" is confusing (i.e. build is slower but hopefully resulting binary is 
faster).

- You can still pass an explicit list of tests if you use the --pgo option.  
That should avoid breaking builds like the Docker one.

- You can now configure the PGO task using 'configure'.  I think that is
cleaner than trying to modify the Makefile after the fact, e.g.:

   ./configure [...] --with-profile-task='-m test --pgo-extended'

--

Python tracker <https://bugs.python.org/issue36044>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-11 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

+-+---+-+
| Benchmark   | task-all2 | task-short7 |
+=+===+=+
| 2to3| 304 ms| 305 ms: 1.00x slower (+0%)  |
+-+---+-+
| crypto_pyaes| 109 ms| 107 ms: 1.01x faster (-1%)  |
+-+---+-+
| django_template | 116 ms| 109 ms: 1.06x faster (-6%)  |
+-+---+-+
| dulwich_log | 76.8 ms   | 77.0 ms: 1.00x slower (+0%) |
+-+---+-+
| fannkuch| 454 ms| 449 ms: 1.01x faster (-1%)  |
+-+---+-+
| float   | 113 ms| 112 ms: 1.02x faster (-2%)  |
+-+---+-+
| hexiom  | 9.50 ms   | 9.56 ms: 1.01x slower (+1%) |
+-+---+-+
| html5lib| 93.4 ms   | 95.4 ms: 1.02x slower (+2%) |
+-+---+-+
| json_loads  | 24.7 us   | 26.5 us: 1.07x slower (+7%) |
+-+---+-+
| logging_format  | 9.55 us   | 9.63 us: 1.01x slower (+1%) |
+-+---+-+
| logging_simple  | 8.54 us   | 8.72 us: 1.02x slower (+2%) |
+-+---+-+
| nbody   | 131 ms| 130 ms: 1.01x faster (-1%)  |
+-+---+-+
| nqueens | 93.6 ms   | 92.8 ms: 1.01x faster (-1%) |
+-+---+-+
| pathlib | 19.0 ms   | 19.1 ms: 1.01x slower (+1%) |
+-+---+-+
| pickle_list | 2.72 us   | 2.66 us: 1.02x faster (-2%) |
+-+---+-+
| pidigits| 168 ms| 168 ms: 1.00x faster (-0%)  |
+-+---+-+
| python_startup  | 7.89 ms   | 7.89 ms: 1.00x slower (+0%) |
+-+---+-+
| python_startup_no_site  | 5.41 ms   | 5.41 ms: 1.00x slower (+0%) |
+-+---+-+
| raytrace| 470 ms| 481 ms: 1.02x slower (+2%)  |
+-+---+-+
| regex_compile   | 168 ms| 169 ms: 1.01x slower (+1%)  |
+-+---+-+
| regex_dna   | 168 ms| 173 ms: 1.03x slower (+3%)  |
+-+---+-+
| regex_effbot| 2.91 ms   | 2.72 ms: 1.07x faster (-7%) |
+-+---+-+
| regex_v8| 20.4 ms   | 21.1 ms: 1.03x slower (+3%) |
+-+---+-+
| richards| 67.6 ms   | 66.9 ms: 1.01x faster (-1%) |
+-+---+-+
| scimark_fft | 341 ms| 339 ms: 1.01x faster (-1%)  |
+-+---+-+
| scimark_monte_carlo | 101 ms| 98.2 ms: 1.03x faster (-3%) |
+-+---+-+
| scimark_sparse_mat_mult | 4.05 ms   | 3.97 ms: 1.02x faster (-2%) |
+-+---+-+
| spectral_norm   | 128 ms| 129 ms: 1.01x slower (+1%)  |
+-+---+-+
| sqlalchemy_imperative   | 30.0 ms   | 30.4 ms: 1.01x slower (+1%) |
+-+---+-+
| sympy_expand| 408 ms| 411 ms: 1.01x slower (+1%)  |
+-+---+-+
| sympy_sum   | 90.1 ms   | 89.6 ms: 1.01x faster (-1%) |
+-+---+-+
| sympy_str   | 182 ms| 180 ms: 1.01x faster (-1%)  |
+-+---+-+
| tornado_http| 175 ms| 179 ms: 1.03x slower (+3%)  |
+-+---+-+
| unpack_sequence | 50.0 ns   | 47.5 ns: 1.05x faster (-5%) |
+-+---+-+

[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-11 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I tweaked the list of unit tests a little more, trying to incorporate some from 
your Docker build settings.  Not sure what's going on with the pickle results.  
Below are new pyperformance runs, comparing my PR to the "master" version it is 
based on.

I added the --with-profile-task configure option.  If someone really wants to 
wait for a long build to gain a tiny bit of extra speed, that makes it pretty 
easy to do so.

BTW, I tried making --pgo disable most of the gc_collect() calls.  It
doesn't actually make much difference in profile generation time, so I
discarded those changes.

--

Python tracker <https://bugs.python.org/issue36044>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-11 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
keywords: +patch
pull_requests: +14501
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/14702

Python tracker <https://bugs.python.org/issue36044>



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-07-10 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> Decreasing the total wall time for a default --enable-optimizations build
> would be a good thing for everyone, provided the resulting interpreter
> remains "effectively similar" in speed.  If you somehow manage to find
> something that actually speeds up the resulting interpreter, amazing!

I spent quite a lot of time making different PGO builds and comparing with 
pyperformance.  The current PGO task is *really* slow.  Just running the 
PROFILE_TASK takes 24 minutes on my decently fast PC.

Using this set of tests seems to work pretty well:

PROFILE_TASK=-m test.regrtest --pgo \
test_collections \
test_dataclasses \
test_difflib \
test_embed \
test_float \
test_functools \
test_generators \
test_int \
test_itertools \
test_json \
test_logging \
test_long \
test_ordered_dict \
test_pickle \
test_pprint \
test_re \
test_set \
test_statistics \
test_struct \
test_tabnanny \
test_xml_etree

Instead of 24 minutes, the above task takes one and a half minutes.  
pyperformance results seem largely unchanged.  Comparison below.  Tuning the 
tests to get the best pyperformance result is a bit dangerous and perhaps 
running the whole test suite is safer (i.e. we are not optimizing for specific 
benchmarks).  I didn't tweak the list too much.  I added test_int, test_long, 
test_struct and test_itertools as a result of my pyperformance runs.  Not too 
surprising those are important modules.

I think the set of tests above should do a pretty good job of covering the hot 
code paths in most Python programs.  So, maybe it is good enough given the 
massive speedup in build time.



+-+--+--+
| Benchmark   | task-all | task-short   |
+=+==+==+
| 2to3| 311 ms   | 315 ms: 1.01x slower (+1%)   |
+-+--+--+
| chaos   | 111 ms   | 108 ms: 1.02x faster (-2%)   |
+-+--+--+
| crypto_pyaes| 114 ms   | 112 ms: 1.01x faster (-1%)   |
+-+--+--+
| dulwich_log | 78.0 ms  | 78.7 ms: 1.01x slower (+1%)  |
+-+--+--+
| fannkuch| 470 ms   | 452 ms: 1.04x faster (-4%)   |
+-+--+--+
| float   | 118 ms   | 117 ms: 1.01x faster (-1%)   |
+-+--+--+
| go  | 253 ms   | 255 ms: 1.01x slower (+1%)   |
+-+--+--+
| json_dumps  | 12.5 ms  | 11.8 ms: 1.06x faster (-6%)  |
+-+--+--+
| json_loads  | 26.3 us  | 25.4 us: 1.04x faster (-3%)  |
+-+--+--+
| logging_format  | 9.53 us  | 9.66 us: 1.01x slower (+1%)  |
+-+--+--+
| logging_silent  | 198 ns   | 196 ns: 1.01x faster (-1%)   |
+-+--+--+
| mako| 15.2 ms  | 15.8 ms: 1.04x slower (+4%)  |
+-+--+--+
| meteor_contest  | 98.2 ms  | 96.8 ms: 1.01x faster (-1%)  |
+-+--+--+
| nbody   | 135 ms   | 133 ms: 1.01x faster (-1%)   |
+-+--+--+
| nqueens | 97.2 ms  | 96.6 ms: 1.01x faster (-1%)  |
+-+--+--+
| pathlib | 19.4 ms  | 19.7 ms: 1.02x slower (+2%)  |
+-+--+--+
| pickle  | 8.10 us  | 9.07 us: 1.12x slower (+12%) |
+-+--+--+
| pickle_dict | 23.1 us  | 18.6 us: 1.25x faster (-20%) |
+-+--+--+
| pickle_list | 3.64 us  | 2.81 us: 1.30x faster (-23%) |
+-+--+--+
| pickle_pure_python  | 470 us   | 460 us: 1.02x faster (-2%)   |
+-+--+--+
| pidigits| 169 ms   | 173 ms: 1.02x slower (+2%)   |
+-+--+--+
| python_s

[issue37537] Compute allocated blocks in _Py_GetAllocatedBlocks()

2019-07-10 Thread Neil Schemenauer


Neil Schemenauer  added the comment:


New changeset 5d25f2b70351fc6a56ce5513ccf5f58556c18837 by Neil Schemenauer in 
branch 'master':
bpo-37537: Compute allocated blocks in _Py_GetAllocatedBlocks() (#14680)
https://github.com/python/cpython/commit/5d25f2b70351fc6a56ce5513ccf5f58556c18837


--

Python tracker <https://bugs.python.org/issue37537>



[issue37537] Compute allocated blocks in _Py_GetAllocatedBlocks()

2019-07-10 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

PyPerformance result, using --enable-optimizations

+-+-+--+
| Benchmark   | base| no-allocated |
+=+=+==+
| 2to3| 388 ms  | 385 ms: 1.01x faster (-1%)   |
+-+-+--+
| dulwich_log | 91.8 ms | 92.8 ms: 1.01x slower (+1%)  |
+-+-+--+
| go  | 302 ms  | 296 ms: 1.02x faster (-2%)   |
+-+-+--+
| hexiom  | 11.4 ms | 11.5 ms: 1.01x slower (+1%)  |
+-+-+--+
| json_dumps  | 14.0 ms | 13.7 ms: 1.02x faster (-2%)  |
+-+-+--+
| logging_format  | 12.1 us | 11.9 us: 1.02x faster (-2%)  |
+-+-+--+
| logging_simple  | 10.9 us | 10.6 us: 1.02x faster (-2%)  |
+-+-+--+
| pickle_dict | 23.6 us | 21.3 us: 1.11x faster (-10%) |
+-+-+--+
| pidigits| 208 ms  | 210 ms: 1.01x slower (+1%)   |
+-+-+--+
| python_startup  | 9.73 ms | 9.66 ms: 1.01x faster (-1%)  |
+-+-+--+
| python_startup_no_site  | 6.70 ms | 6.66 ms: 1.01x faster (-1%)  |
+-+-+--+
| regex_dna   | 199 ms  | 195 ms: 1.02x faster (-2%)   |
+-+-+--+
| regex_effbot| 3.18 ms | 3.14 ms: 1.01x faster (-1%)  |
+-+-+--+
| regex_v8| 25.0 ms | 24.4 ms: 1.02x faster (-2%)  |
+-+-+--+
| scimark_fft | 421 ms  | 426 ms: 1.01x slower (+1%)   |
+-+-+--+
| scimark_sor | 220 ms  | 223 ms: 1.01x slower (+1%)   |
+-+-+--+
| scimark_sparse_mat_mult | 5.18 ms | 5.33 ms: 1.03x slower (+3%)  |
+-+-+--+
| spectral_norm   | 159 ms  | 156 ms: 1.02x faster (-2%)   |
+-+-+--+
| unpickle| 14.6 us | 14.2 us: 1.02x faster (-2%)  |
+-+-+--+
| unpickle_list   | 4.46 us | 4.33 us: 1.03x faster (-3%)  |
+-+-+--+
| xml_etree_iterparse | 141 ms  | 139 ms: 1.01x faster (-1%)   |
+-+-+--+

Not significant (36): chaos; crypto_pyaes; deltablue; django_template; 
fannkuch; float; html5lib; json_loads; logging_silent; mako; meteor_contest; 
nbody; nqueens; pathlib; pickle; pickle_list; pickle_pure_python; raytrace; 
regex_compile; richards; scimark_lu; scimark_monte_carlo; 
sqlalchemy_declarative; sqlalchemy_imperative; sqlite_synth; sympy_expand; 
sympy_integrate; sympy_sum; sympy_str; telco; tornado_http; unpack_sequence; 
unpickle_pure_python; xml_etree_parse; xml_etree_generate; xml_etree_process

--
stage: patch review -> 

Python tracker <https://bugs.python.org/issue37537>



[issue37537] Compute allocated blocks in _Py_GetAllocatedBlocks()

2019-07-10 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
keywords: +patch
pull_requests: +14486
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/14680

___
Python tracker 
<https://bugs.python.org/issue37537>
___



[issue37537] Compute allocated blocks in _Py_GetAllocatedBlocks()

2019-07-10 Thread Neil Schemenauer


New submission from Neil Schemenauer :

This is a small but measurable optimization for _PyObject_Malloc and 
_PyObject_Free.  It slows down _Py_GetAllocatedBlocks a bit but I think that's 
a price worth paying.

--
components: Interpreter Core
messages: 347602
nosy: nascheme
priority: normal
severity: normal
status: open
title: Compute allocated blocks in _Py_GetAllocatedBlocks()

___
Python tracker 
<https://bugs.python.org/issue37537>
___



[issue37448] obmalloc: radix tree for tracking arena address ranges

2019-06-29 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
keywords: +patch
pull_requests: +14291
pull_request: https://github.com/python/cpython/pull/14474

___
Python tracker 
<https://bugs.python.org/issue37448>
___



[issue37448] obmalloc: radix tree for tracking arena address ranges

2019-06-29 Thread Neil Schemenauer


Change by Neil Schemenauer :


Added file: https://bugs.python.org/file48446/obmalloc_overhead.py

___
Python tracker 
<https://bugs.python.org/issue37448>
___



[issue37448] obmalloc: radix tree for tracking arena address ranges

2019-06-29 Thread Neil Schemenauer


Change by Neil Schemenauer :


Added file: https://bugs.python.org/file48445/pyperf_radix_compare.txt

___
Python tracker 
<https://bugs.python.org/issue37448>
___



[issue37448] obmalloc: radix tree for tracking arena address ranges

2019-06-29 Thread Neil Schemenauer


New submission from Neil Schemenauer :

This patch implements an alternative version of obmalloc's address_in_range().  
It uses a radix tree to map the areas of memory covered by obmalloc arenas.  
pymalloc_free() uses address_in_range() to determine if a block of memory is 
controlled by obmalloc or has been allocated by the system allocator.  The 
address_in_range() function must be fast.

The current version of address_in_range() uses a slightly memory unsanitary 
scheme.  I.e. it reads memory that has possibly not been initialized.  In 
theory that is not allowed and could cause a crash.  In practice, it has worked 
reliably for many years.  There is some ugly logic in obmalloc.c to disable 
sanity checking (e.g. ASAN, TSAN, MSAN).  One advantage of this radix tree 
approach is that it doesn't have this unsanitary behavior.

Another small advantage of the radix tree approach is that it is independent 
from the OS page size.  With the current address_in_range() scheme, the size of 
obmalloc pools is limited to the OS page size.  If larger, a 
segmentation fault could occur.  Bug #37211 (obmalloc: eliminate limit on pool 
size) allows for larger pools but at the cost of some additional code 
complexity and some additional memory overhead.

This patch has been discussed quite a bit on the python-dev list.  Thread 
subjects are:
obmalloc (was Have a big machine and spare time? Here's a possible Python 
bug.)
radix tree arena map for obmalloc

That discussion focuses quite a bit on the value of increasing the obmalloc 
arena and pool sizes.  This proposed patch keeps the sizes the same.  We can 
evaluate changing the sizes as a different issue.

I have run the pyperformance benchmark suite to compare performance of the 
radix tree with the status quo.  I think there is no significant difference for 
the programs that are part of the suite.  The pyperformance comparison report 
is attached.

If we do decide to go with larger pool and arena sizes, the radix tree approach 
actually uses less memory than the "big pools PR" (Bug #37211).  Attached is 
Tim's program to compare memory overheads of the different approaches.

I think it is worth considering using the radix tree by default on 64-bit 
platforms.  It seems to be not any slower than the status quo, does not have 
the memory unsanitary behavior and gives us the ability to easily increase pool 
sizes if we decide we want to.
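
To make the lookup concrete, here is a toy Python model of the idea (a sketch 
only: a dict stands in for the fixed-depth radix tree, and the 1 MiB arena 
size is an assumption for the sketch, not the real constant):

    ARENA_BITS = 20   # assume 1 MiB arenas for this sketch

    class ArenaMap:
        def __init__(self):
            self._map = {}   # arena prefix -> present

        def mark_arena(self, base):
            # record that [base, base + 2**ARENA_BITS) belongs to obmalloc
            self._map[base >> ARENA_BITS] = True

        def address_in_range(self, addr):
            # a pure table lookup: no reads of possibly-uninitialized
            # memory, unlike the current scheme
            return (addr >> ARENA_BITS) in self._map

    amap = ArenaMap()
    amap.mark_arena(0x7f0000000000)
    print(amap.address_in_range(0x7f0000000400))   # True
    print(amap.address_in_range(0x500000))         # False

The real tree uses a few fixed-size levels instead of a hash table, but the 
lookup cost is similarly just a handful of loads.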

--
components: Interpreter Core
messages: 346903
nosy: nascheme
priority: normal
severity: normal
stage: patch review
status: open
title: obmalloc: radix tree for tracking arena address ranges
type: enhancement

___
Python tracker 
<https://bugs.python.org/issue37448>
___



[issue37221] PyCode_New API change breaks backwards compatibility policy

2019-06-11 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I suggest we change PyCode_New() back in the next beta, despite that change 
breaking Cython again.  We could work with them so that they have a new release 
with a "PY_VERSION_HEX > 0x030800b1" test in it. Try to get that Cython version 
released before 3.8b2 hits.  The window of a Cython version that doesn't 
support CPython >= 3.8b1 can be made pretty small.  It would be nice to find 
these issues in alpha releases but finding them in beta is still early enough 
to fix it.

Having to regenerate all of the Cython emitted code embedded in different 
packages is painful.  I experienced having to update numpy so that I could test 
my software with 3.8b1.  Avoiding that is why we should change PyCode_New back. 
 Introducing a new API like PyCode_NewWithPosArgs() is going to work better.

If we do have to make a similar incompatible change in the future, we should 
try hard to make it so there is an overlapping period of compatibility.  It's 
bad to have an API change from one release to the next without giving time for 
projects like Cython time to catch up.

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue37221>
___



[issue37200] PyType_GenericAlloc might over-allocate memory

2019-06-07 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Updated patch is attached.  It appears that the extra item is only needed if 
Py_TPFLAGS_TYPE_SUBCLASS is set.  In all other cases, it seems we don't need the 
extra space for the sentinel.  At least, the unit tests pass with this change.  
It could be that some extension module depends on this extra allocated space.

The number of types affected by this change seem relatively small.  The 'list' 
type is not affected because tp_itemsize is 0.  I did a little test by running 
some unit tests, here are some type names that have a smaller amount of memory 
allocated for them:

   _NamedIntConstant
   CodecInfo
   Point
   TestResults
   madtuple

--
priority: normal -> low
stage:  -> patch review
Added file: https://bugs.python.org/file48404/generic_alloc_sentinel_v2.txt

___
Python tracker 
<https://bugs.python.org/issue37200>
___



[issue37200] PyType_GenericAlloc might over-allocate memory

2019-06-07 Thread Neil Schemenauer


New submission from Neil Schemenauer :

In the process of working on some garbage collector/obmalloc experiments, I 
noticed what seems to be a quirk about PyType_GenericAlloc().  It calls:

size = _PyObject_VAR_SIZE(type, nitems+1);

Note the "+1" which is documented as "for the sentinel".  That code dates back 
to change "e5c691abe3946ddbaa00730b92f3b96f96903f7d" when Guido added support 
for heap types.  This extra item is not added by _PyObject_GC_NewVar().  Also, 
the documentation for tp_alloc says that the size of the allocated block should 
be:

  tp_basicsize + nitems*tp_itemsize, rounded up to a multiple of sizeof(void*);

The "+1" for the sentinel is definitely needed in certain cases.  I think it 
might only be needed if 'type' is a subtype of 'type'.  I.e. if 
Py_TPFLAGS_TYPE_SUBCLASS is set on 'type'.

I haven't done enough analysis to fully understand this quirk yet but I think 
we are allocating extra memory quite regularly.  Quite a lot of types use 
tp_alloc = PyType_GenericAlloc.  E.g. the 'list' type or a subclass of the 
tuple type.

It seems with the attached patch, unit tests still pass.  Perhaps the +1 could 
be removed on the non-GC branch of the code as well.
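
To spell out the arithmetic (Python used only to model the C macro; the 
64-byte basicsize and 8-byte itemsize are illustrative numbers, not taken from 
a real type):

    import struct

    P = struct.calcsize("P")   # sizeof(void*), 8 on 64-bit

    def var_size(basicsize, itemsize, nitems):
        # documented tp_alloc contract: basicsize + nitems*itemsize,
        # rounded up to a multiple of sizeof(void*)
        size = basicsize + nitems * itemsize
        return (size + P - 1) & ~(P - 1)

    # what PyType_GenericAlloc requests today vs. the documented size:
    print(var_size(64, 8, 3 + 1), "vs", var_size(64, 8, 3))   # 96 vs 88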

--
components: Interpreter Core
files: generic_alloc_sentinel.txt
messages: 345002
nosy: nascheme
priority: normal
severity: normal
status: open
title: PyType_GenericAlloc might over-allocate memory
type: performance
Added file: https://bugs.python.org/file48403/generic_alloc_sentinel.txt

___
Python tracker 
<https://bugs.python.org/issue37200>
___



[issue27987] obmalloc's 8-byte alignment causes undefined behavior

2019-05-22 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

We now have a concrete use case. ;-) 
 
My idea was that we can introduce a new, CPython internal API that
aligns on 8-byte boundaries (or takes alignment as a parameter).  The
API would be a stop-gap measure. We can use the API to reduce
the overhead for specific types. 
 
E.g. for non-subclasses of float, we know the PyObject structure does
not need 16-byte alignment. We don't need a version of "alignof" to know
this. Inside floatobject.c, we could call the new objmalloc API that
gives new memory with 8-byte alignment. That would save the 33%
overhead. 
 
E.g. in PyFloat_FromDouble, rather than: 
 
 PyObject_MALLOC(sizeof(PyFloatObject))  
 
we could call something like: 
 
 _PyObject_MALLOC_ALIGNED(sizeof(PyFloatObject), 8)  
 
This internal API would not be a permanent solution. Having to manually
fix each place that PyObjects are allocated and hard-coding the required
alignment is not the best solution. We can only fix specific types and
extension modules would always get the 16-byte alignment. Still, by
tweaking some of the most common types, we avoid much of the overhead
for the alignment change, at least for the average Python program. 
 
In the long term, we would need a better solution. E.g. an API that can
take alignment requirements as a parameter. Or, a solution I like
better, have types use PyObject_New().  Then, add an alignment
specifier to type object (e.g. tp_align to go along with tp_basicsize).
Then there does not have to be a new public API that takes alignment.

--

___
Python tracker 
<https://bugs.python.org/issue27987>
___



[issue27987] obmalloc's 8-byte alignment causes undefined behavior

2019-05-21 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> sys.getsizeof(3.14) is 24.  And it becomes 32 byte in 16byte aligned 
> pymalloc. (+33%)

I've been doing some reading and trying to understand this issue.  My 
understanding is that malloc() needs to return pointers that are 16-byte 
aligned on AMD64 but, in general, pointers don't have to be aligned that way.  
If you have a structure that contains a "long double" then that member also has 
to be 16-byte aligned.

It seems to me that we don't need to have the PyObject structure containing a 
Python float to be 16-byte aligned.  If so, could we introduce a new obmalloc 
API that returns memory with 8-byte alignment, for use by objects that know 
they don't require 16-byte alignment?  floatobject.c could use this API to 
avoid the 33% overhead.

The new obmalloc API could initially be internal use only until we can come up 
with a design we know we can live with long term.
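
The overhead is easy to check from Python (a quick sketch; the exact getsizeof 
value depends on the build):

    import sys

    def rounded(size, align):
        # bytes an allocator block consumes at a given alignment
        return -(-size // align) * align

    sz = sys.getsizeof(3.14)                 # 24 on a typical 64-bit build
    print(rounded(sz, 8), rounded(sz, 16))   # 24 vs 32: the +33% quoted above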

--

___
Python tracker 
<https://bugs.python.org/issue27987>
___



[issue36710] Pass _PyRuntimeState as an argument rather than using the _PyRuntime global variable

2019-05-02 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I think there are two questions to answer.  First, do we want to support 
multiple runtimes per process?  Second, if we do, what is the best way to do 
that?  Some people would argue that multiple runtimes are not needed or are too 
hard to do.  Maybe they are correct, I'm not sure.  We should try to get a 
consensus on that first question.

If we do decide to do it, then we need to answer the second question.  Passing 
a "context" argument around seems the best solution.  That is how the Java JNI 
does it.  It sounds like that's how Javascript VMs do it too.  We don't need to 
get creative.  Look at what other VMs do and copy the best idea.

If we do decide to do it, evolving the codebase and all extension modules is 
going to be a massive task.  I would imagine that we can have a backwards 
compatible API layer that uses TSS.  The layer that passes context explicitly 
would still have to maintain the TSS.  There could be a build option that turns 
that backwards compatibility on or off.  If off, you would gain some 
performance advantage because TSS does not have to be kept up-to-date.

My feeling right now that even though this is a massive job, it is the correct 
thing to do.  CPUs continue to gain cores.  Improving CPython's ability to do 
multi-threading and multi-processing should be a priority for CPython core 
developers.

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue36710>
___



[issue35810] Object Initialization does not incref Heap-allocated Types

2019-02-21 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Sorry, morning coffee didn't kick in yet I guess. ;-)  My actual wish is to 
make all types heap allocated and eliminate the statically allocated ones.  So, 
Py_TPFLAGS_HEAPTYPE would be set on all types in that world.  That is a 
gigantic task, affecting near every Python extension type.  Too huge for even a 
nutty person like me to imagine doing in the near term.  So, sorry for 
potentially derailing discussion here.

I agree with comments made by Stefan Behnel and Petr Viktorin.  There is a 
small risk to cause problems (i.e. serious memory leaks in a previously working 
program).  However, as Petr says, the extension in that case is broken and it 
is not hard to fix.  Eddie has provided examples for what changes are needed.

I think if we properly communicate the change then it is okay to merge the PR.

--

___
Python tracker 
<https://bugs.python.org/issue35810>
___



[issue35810] Object Initialization does not incref Heap-allocated Types

2019-02-21 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Hello Eddie,
Thank you for putting what looks to be significant effort into this PR.  It 
would be great if we can get this fixed.  There is a real issue about breaking 
3rd party extensions.  So, we want to proceed with care.

I wonder, if we are going to break extensions already, could we just remove the 
whole concept of heap allocated types?  If you look through the CPython source 
code, I think you will find a lot of tricky code that deals with the 
Py_TPFLAGS_HEAPTYPE case.  If we could remove heap types, we could remove all 
those cases.  That would give some performance improvement but more importantly 
would simplify the implementation.

If PyType_FromSpec() is now working correctly, could we just move everything 
that is currently a heap type to use that?  Obviously we have to give 3rd 
party extensions a lot of time to get themselves updated.  Maybe give a 
deprecation warning if Py_TPFLAGS_HEAPTYPE is used.  You could have a 
configuration option for Python that enables or disables the 
Py_TPFLAGS_HEAPTYPE support.  Once we think extensions have been given enough 
time to update themselves, we can remove Py_TPFLAGS_HEAPTYPE.

Some other possible advantages of getting rid of heap types:

- GC objects will always have the GC header allocated (because CPython controls 
the allocation of the chunk of memory for the type)

- might be possible to eliminate GC headers and use bitmaps.  I have been 
experimenting with the idea but it seems to require that we don't use heap 
types.  Initially I was interested in the bitmap idea because of memory 
savings.  After more tinkering, I think the big win will be in eliminating 
linked-list traversals.  On modern CPUs, that's slow and iterating over a 
bitmap should be much faster.

- I suspect heap types are difficult to support for PyPy.  I haven't looked 
into that but it seems tricky when you have a non-refcounting GC.

- type_is_gc() is ugly and would go away.  Based on my profiling, 
PyObject_IS_GC() is pretty expensive.  A lot of types have the tp_is_gc slot 
set (more than you would expect).

- In the very long term, using PyType_FromSpec() could give us the freedom to 
change the structure layout of types.  I don't have any specific ideas about 
that but it seems like a better design.

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue35810>
___



[issue36044] PROFILE_TASK for PGO build is not a good workload

2019-02-19 Thread Neil Schemenauer


New submission from Neil Schemenauer :

I was doing some 'perf' profiling runs of Python.  I decided to try running 
PROFILE_TASK to see what the profile looks like.  I was surprised that 
gc_collect dominated runtime:

  Children  Self  Symbol
+   93.93% 6.00%  [.] _PyEval_EvalFrameDefault
+   76.19% 0.12%  [.] function_code_fastcall  
+   70.65% 0.31%  [.] _PyMethodDef_RawFastCallKeywords
+   63.24% 0.13%  [.] _PyCFunction_FastCallKeywords   
+   58.67% 0.36%  [.] _PyEval_EvalCodeWithName
+   57.45%23.84%  [.] collect 
+   52.89% 0.00%  [.] gc_collect  
+   52.10% 0.08%  [.] _PyFunction_FastCallDict
+   41.99% 0.02%  [.] _PyObject_Call_Prepend  
+   36.37% 0.18%  [.] _PyFunction_FastCallKeywords
+   20.94% 0.07%  [.] _PyObject_FastCallDict  
+   19.64% 0.00%  [.] PyObject_Call   
+   17.74% 0.11%  [.] _PyObject_FastCallKeywords  
+   12.45% 0.00%  [.] slot_tp_call
+   12.27% 4.05%  [.] dict_traverse   
+   11.45%11.04%  [.] visit_reachable 
+   11.18%10.76%  [.] visit_decref
+9.65% 0.11%  [.] type_call   
+8.80% 0.83%  [.] func_traverse   
+7.78% 0.08%  [.] _PyMethodDescr_FastCallKeywords 

Part of the problem is that we run full cyclic GC for every test.  I.e. 
cleanup_test_droppings() calls gc.collect().  Maybe we could make these calls 
conditional on the --pgo flag of regrtest.  Or, maybe we need to re-evaluate if 
running the unit test suite is the best way to generate PGO trace data.

Based on a tiny bit of further investigation, it looks like gc_collect() is 
getting called quite a lot of times, in addition to cleanup_test_droppings().  
Maybe there is some low-hanging fruit here for optimization.  Full GC is pretty 
expensive.
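
A rough way to get a feel for that cost (a sketch; numbers will vary a lot by 
machine and heap size):

    import gc, timeit

    junk = [[i] for i in range(10**6)]   # give the collector something to trace
    print(timeit.timeit(gc.collect, number=10), "s for 10 full collections")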

--
messages: 336018
nosy: nascheme
priority: normal
severity: normal
status: open
title: PROFILE_TASK for PGO build is not a good workload
type: performance

___
Python tracker 
<https://bugs.python.org/issue36044>
___



[issue36012] Investigate slow writes to class variables

2019-02-19 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

BTW, 'perf report [...]' has a really neat annotated assembly view.  Scroll to 
the function you are interested in and press 'a'.  Press 't' to toggle the time 
units (left side numbers).  I'm attaching a screenshot of the disassembly of 
the lookdict function.  The time units are sample counts.  Each count is a 
point where the profiler woke up on that specific instruction.

--
Added file: https://bugs.python.org/file48157/perf-screenshot.png

___
Python tracker 
<https://bugs.python.org/issue36012>
___



[issue36012] Investigate slow writes to class variables

2019-02-19 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Some profiling using 'perf'.  This is for cpython 
63fa1cfece4912110ce3a0ff11fb3ade3ff5e756.

  children  self
  [...]
+   97.27% 0.00%  run_mod (inlined)
+   88.53% 6.33%  PyObject_SetAttr
+   79.34% 6.80%  type_setattro
+   43.92%43.92%  update_slot
+   26.87% 5.84%  _PyObject_GenericSetAttrWithDict
+   14.62% 6.52%  insertdict
+8.80% 8.80%  lookdict_unicode_nodummy
+6.57% 0.00%  _Py_DECREF (inlined)
+5.19% 5.19%  PyUnicode_InternInPlace
+3.57% 3.57%  _PyObjectDict_SetItem
+3.38% 3.38%  _PyType_Lookup
+1.71% 0.00%  _Py_INCREF (inlined)
+1.42% 0.00%  _Py_XDECREF (inlined)
  [...]

After applying PR 11907:

  children  self
  [...]
+   94.76% 0.00%  run_mod (inlined)
+   75.22% 6.77%  PyObject_SetAttr
+   64.65%13.08%  type_setattro
+   47.51%11.96%  _PyObject_GenericSetAttrWithDict
+   22.99%13.95%  insertdict
+   10.10%10.10%  lookdict_unicode_nodummy
+9.47% 9.47%  PyUnicode_InternInPlace
+7.10% 7.10%  _PyObjectDict_SetItem
+7.02% 7.02%  _PyType_Lookup
+6.52% 0.00%  _Py_DECREF (inlined)
+3.17% 0.00%  _Py_INCREF (inlined)
+3.16% 0.00%  _Py_XDECREF (inlined)
+2.59% 0.00%  PyDict_SetItem (inlined)
+1.50% 0.00%  is_dunder_name (inlined)
  [...]

The PyUnicode_InternInPlace() can mostly be eliminated by testing 
PyUnicode_CHECK_INTERNED() first (we already have called PyUnicode_Check() on 
it).  That only gives a 7% speedup on my machine though.  The is_dunder_name() 
is a much bigger optimization.

--

___
Python tracker 
<https://bugs.python.org/issue36012>
___



[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-17 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Some thoughts on this API.

I think we need the "create with exclusive behavior" option, even though we 
don't know how to implement it on Windows right now.  To me, there are two 
cases when calling SharedMemory:

1) You want to create a new shared memory object.  In this case, if your chosen 
name conflicts with an existing name, you should get an error.  If you don't, 
you could be sharing data between unrelated processes.  Chaos will ensue.

2) You want to attach to an existing shared memory object.  You have a name, 
passed to you by some means.  In this case, you don't want to create a new 
object if it doesn't exist.  You should get an error.

I thought there might be a 3rd case where you have "co-equal" processes and any 
one of them could create and the others would attach.  That doesn't seem safe 
though, if you can't ensure a unique name is used.  You might attach to some 
unrelated shared memory object due to a name conflict.  So, I think the API 
should not support this case.

To support 1 & 2, we could just have 'create'.  When true, it would act like 
O_CREX.  When false, you would get an error if the name doesn't already exist.

Regarding 'size', I think it is a bit weird how it currently works.  Maybe 
'size' should only be valid if you are creating a new shared memory object.  If 
you are attaching to an existing one, size would be found from the existing 
object (if that's possible on all platforms).  SharedMemory could grow a 
"resize()" method that lets you enlarge the underlying memory object.  I don't 
know if that's useful in practice so maybe better to leave it out and keep the 
API simple.

Should 'size' be a property that always does fstat() to find the size of the 
underlying file?  Or, should it be the size passed in (for create=True) or the 
size of the file when the mmap is created?  I'm not sure but maybe it is better 
to not always do the fstat().

The 'read_only' keyword doesn't make sense if you are creating a new object 
and need to call ftruncate() on it.  I think the OS will not allow that.  Maybe 
'read_only' should not be allowed if you are creating.  Or, you should have a 
method that takes a read-write object and makes it read-only.

On Linux at least, shm_open() is just a thin layer over open().  So the shared 
memory file is really just a regular file that is stored in /var/run/shm.  If 
you realize that, it seems SharedMemory() is much like 
tempfile.NamedTemporaryFile().  E.g. if you create a NamedTemporaryFile under 
/var/run/shm it will behave like the file descriptor created by SharedMemory(). 
 SharedMemory() objects have the difference that they don't clean themselves 
up.  Also, they don't set the close-on-fork flag on the file descriptor.  Maybe 
they should?  It seems unclear to me how you should avoid cluttering 
/var/run/shm with shared memory objects that people forget to cleanup.  I guess 
the plan is that people need to call unlink() at the right time.  That seems 
error prone though.
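
For reference, the two cases described above map onto the API like this (a 
sketch, using the SharedMemory signature that shipped in 3.8's 
multiprocessing.shared_memory; the segment name is arbitrary):

    from multiprocessing import shared_memory

    # case 1: create -- should fail (O_CREX style) if the name is taken
    shm = shared_memory.SharedMemory(name="demo_seg", create=True, size=1024)

    # case 2: attach -- should fail if the name does not already exist
    view = shared_memory.SharedMemory(name="demo_seg")
    view.buf[:5] = b"hello"

    view.close()
    shm.close()
    shm.unlink()   # explicit cleanup, or the segment lingers in /var/run/shm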

--

___
Python tracker 
<https://bugs.python.org/issue35813>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-08 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-08 Thread Neil Schemenauer


Neil Schemenauer  added the comment:


New changeset 5741c45acf9b0ce22ff0dbf56322fe0ff16cfcfc by Neil Schemenauer in 
branch 'master':
bpo-35903: Use autoconfig to probe for shm_open() and shm_unlink(). (#11765)
https://github.com/python/cpython/commit/5741c45acf9b0ce22ff0dbf56322fe0ff16cfcfc


--

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue35813] shared memory construct to avoid need for serialization between processes

2019-02-07 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I didn't finish reviewing completely yet but here are some comments.

I think the random filename generation can be pulled out of
_posixshmem.  Make it require that a filename is passed into it.
That moves a fair bit of complexity out of C and into Python.  In
the Python module, I suggest that you could use secrets.token_hex()
to build the filename.  Put the loop that retries because of name
collisions in the Python module as well.

If you like, I can try to make a patch that does the above.
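
A minimal sketch of that Python-side loop (shm_open_excl below is a 
hypothetical stand-in for the C-level open with O_CREAT | O_EXCL):

    import secrets

    def create_unique_segment(shm_open_excl, tries=100):
        # generate the name in Python and retry on collision,
        # instead of doing both inside _posixshmem
        for _ in range(tries):
            name = "/psm_" + secrets.token_hex(8)
            try:
                return name, shm_open_excl(name)
            except FileExistsError:
                continue
        raise FileExistsError("could not find an unused segment name")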

When looking at how the Python code would handle a name collision,
I see this code:


+switch (errno) {
+case EACCES:
+PyErr_Format(pPermissionsException,
+"No permission to %s this segment",
+(flags & O_TRUNC) ? "truncate" : "access"
+);
+break;
+
+case EEXIST:
+PyErr_SetString(pExistentialException,
+"Shared memory with the specified name already 
exists");
+break;
+
+case ENOENT:
+PyErr_SetString(pExistentialException,
+"No shared memory exists with the specified 
name");
+break;
+
+case EINVAL:
+PyErr_SetString(PyExc_ValueError, "Invalid parameter(s)");
+break;
+
+case EMFILE:
+PyErr_SetString(PyExc_OSError,
+ "This process already has the maximum number 
of files open");
+break;
+
+case ENFILE:
+PyErr_SetString(PyExc_OSError,
+ "The system limit on the total number of open 
files has been reached");
+break;
+
+case ENAMETOOLONG:
+PyErr_SetString(PyExc_ValueError,
+ "The name is too long");
+break;
+
+default:
+PyErr_SetFromErrno(PyExc_OSError);
+break;
+}


I think it might be cleaner just to make it:

PyErr_SetFromErrno(PyExc_OSError);

Then, if you need a customized exception or message, you can
re-raise inside the Python code.  To me, it seems simpler and more
direct to just preserve the errno and always raise OSError.  Changing things
to ValueError means that callers need to look at the message text to 
differentiate between some errno values.

Is it the case that _posixshmem started life as a module that would
be used directly and not something hidden by another layer of
abstraction?  If so, having these customized exceptions and having
it do the filename generation itself makes sense.  However, since it is an
internal implementation detail of shared_memory.py, I think it
should be simplified to do only what is needed, e.g. a thin layer
over the system calls.
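
A sketch of the re-raise idea (shm_open below is a stand-in for that thin C
wrapper; ExistentialException is the patch's exception name):

    import errno

    class ExistentialException(OSError):
        pass

    def attach_segment(shm_open, name):
        # the C layer raises plain OSError with errno preserved;
        # the Python layer re-raises with a friendlier type and message
        try:
            return shm_open(name)
        except OSError as e:
            if e.errno == errno.ENOENT:
                raise ExistentialException(
                    "no shared memory exists with name %r" % name) from e
            raise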

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue35813>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-06 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Hi Ronald,
I tested with my MacBook but I'm not sure what version of the OS.  Probably the 
latest but I can check when I get home.  Maybe my comment was not clear. On my 
MacBook, you don't need librt for shm_open, etc.  That sounds like it is the 
same as your 10.14.3.

AC_SEARCH_LIBS is the macro I'm using to find if I need to link with librt.  It 
is a bit hard to use though.  It will execute the true clause if the function 
doesn't require any of the specified libraries.  That seems to be the case for 
MacOS.  It also adds -lrt to LIBS and so I have to undo that.

I'm using $ac_cv_search_shm_open to test if -lrt is actually needed.

--

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-05 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I didn't understand autoconf well enough but I think I fixed it now.  
Apparently MacOS is a platform that has shm_open but where you don't need to link 
with -lrt.  The revised 'configure' should handle that correctly now.

--

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-05 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
pull_requests: +11719, 11720

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-05 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
pull_requests: +11719, 11720, 11721

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-05 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
pull_requests: +11719

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue35903] Build of posixshmem.c should probe for required OS functions

2019-02-05 Thread Neil Schemenauer


New submission from Neil Schemenauer :

The logic in setup.py that determines if _multiprocessing/posixshmem.c should 
get built is not very robust.  I think it is better to use autoconf to probe 
for the required functions and libraries.  My autoconf brain cells are a bit 
fuzzy but I think my patch is correct.

I look for shm_open and shm_unlink.  I also check if librt is required for 
these functions.

--
assignee: davin
components: Build
keywords: patch
messages: 334888
nosy: davin, nascheme
priority: normal
severity: normal
stage: patch review
status: open
title: Build of posixshmem.c should probe for required OS functions
type: enhancement
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue35903>
___



[issue23903] Generate PC/python3.def by scraping headers

2019-01-23 Thread Neil Schemenauer


Change by Neil Schemenauer :


--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue23903>
___



[issue22232] str.splitlines splitting on non-\r\n characters

2018-10-07 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I too would prefer a new method name rather than overloading splitlines() with 
more keyword args (passed as hardcoded constants, usually).  Again, I think we 
want:

list(open(..).read().iterlines()) == list(open(..))

readlines() returns a list but I think this method should return an iterator 
(seems more Python 3 like to me, call list if you want a list).  In that case, 
iterlines() seems like the right name to me.  I think it should take a 
'newline' keyword that behaves the same as the open() version of the keyword.
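
Concretely, the mismatch the identity would fix (iterlines() is the proposed 
method and does not exist today; assume some data.txt on disk):

    with open("data.txt") as fp:
        a = list(fp)                    # io's universal newlines: \n, \r, \r\n

    with open("data.txt") as fp:
        b = fp.read().splitlines(True)  # also splits on \v, \f, \x1c-\x1e,
                                        # \x85, \u2028 and \u2029

    # today a != b whenever those extra characters appear in the data;
    # the proposal is that list(fp.read().iterlines(True)) == list(fp)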

--

___
Python tracker 
<https://bugs.python.org/issue22232>
___



[issue22232] str.splitlines splitting on non-\r\n characters

2018-10-05 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> Why not simply add a new parameter, to make people who want ASCII linebreaks 
> continue to use .splitlines() ?

That could work but I think in nearly every case you don't want to use 
splitlines() without supplying the parameter.  So, it seems like a bit of a trap 
for new users.  Worse, because in Python 2, str.splitlines() does what they 
want, they will do the simple thing which is likely wrong.

If we do stick with just splitlines(), perhaps it should get a 'newline' 
parameter that mostly matches io.open (i.e. it controls universal newline 
behavior).  So if you don't want to change behavior, 
str.splitlines(newline=None) would split as it currently does.  To make it 
split like io files do, you would have to do newline='\n'.

To me, it seems attractive that:
fp.readlines() == fp.read().iterlines()

You suggestion would make it something like:
fp.readlines() == fp.read().splitlines(newline='\n')

I guess I could live with that but it seems unnecessarily ugly and verbose for 
what is the most common usage.

--

___
Python tracker 
<https://bugs.python.org/issue22232>
___



[issue22232] str.splitlines splitting on non-\r\n characters

2018-10-05 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I've created a topic on this inside the "Ideas" area of discuss.python.org.  
Sorry if that wasn't appropriate, not sure if I should have keep the discussion 
here.

Inada Naoki suggests creating a new method str.iterlines([keepends]).  Given 
that people are -1 on changing str.splitlines, I think that's a good solution.  
Better yet, the new method would only split on '\n'; that way 
fp.read().iterlines() matches fp.readlines().  It is what people seem to expect 
and is the most handy behaviour.  So, str and bytes would both get the new 
method and they would both split on only '\n'.

If we do that, I think nearly every use of splitlines() should get changed to 
iterlines().

--

___
Python tracker 
<https://bugs.python.org/issue22232>
___



[issue22232] str.splitlines splitting on non-\r\n characters

2018-10-05 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

If we introduce a keyword parameter, I think the default of str.splitlines() 
should be changed to match bytes.splitlines (and match Python 2 
str.splitlines()).  I.e. split on \r and \n by default.  I looked through the 
stdlib and I can't find any calls that should actually be splitting on the 
extra characters.  I will check it again though.

Does anyone have an example of where the current behaviour is actually wanted?
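
For the record, the current str/bytes mismatch in an interactive session:

    >>> "a\x1cb\x85c".splitlines()    # str: FS and NEL end lines
    ['a', 'b', 'c']
    >>> b"a\x1cb\x85c".splitlines()   # bytes: only \r and \n do
    [b'a\x1cb\x85c']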

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue22232>
___



[issue18291] codecs.open interprets FS, RS, GS as line ends

2018-10-05 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I just found bug #22232 myself but thanks for pointing it out.

> changing the behavior unconditionally is not an option

At this point, I disagree.  If I do a search on the web, lots of pages 
referring to str.splitlines() seem it imply that is splits only on \r and \n.  
For Python 2 that was correct.  I think most people would be surprised by the 
Python 3 behaviour.

I looked through the Python stdlib and marked any place str.splitlines() was 
used.  I have more research to do yet but I think nearly all of these cases 
will work better (or perhaps correctly) if str.splitlines is changed.

--

___
Python tracker 
<https://bugs.python.org/issue18291>
___



[issue18291] codecs.open interprets FS, RS, GS as line ends

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

New patch that changes str.splitlines to work like Python 2 str.splitlines and 
like Python 3 bytes.splitlines.  Surprisingly, only a few cases in the unit 
test suite fail.  I've fixed them in my patch.

--
Added file: https://bugs.python.org/file47852/str_splitlines.txt

___
Python tracker 
<https://bugs.python.org/issue18291>
___



[issue18291] codecs.open interprets FS, RS, GS as line ends

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Some further progress on this.  My patch slows down reading files with the 
codecs module very significantly.  So, I think it could never be merged as is.  
Maybe we would need to implement an alternative str.splitlines that behaves as 
we want, implemented in C.

Looking at the uses of str.splitlines in the stdlib, I can't help but think 
there are many places where this (IMHO bad) behaviour of splitting on all these 
extra control characters has made it so that splitlines should not be used in 
most cases.  Or, we should change splitlines to work the same as the file 
readlines splitting.

For example, RobotFileParser uses str.splitlines().  I suspect it should only 
be splitting on \n characters.

--

___
Python tracker 
<https://bugs.python.org/issue18291>
___



[issue18291] codecs.open interprets FS, RS, GS as line ends

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Attached is a rough patch that tries to fix this problem.  I changed the 
behavior in that unicode char 0x2028 is no longer treated as a line separator.  
It would be trivial to change the regex to support that too, if we want to 
preserve backwards compatibility.  Personally, I think readlines() on a codecs 
reader should do that same line splitting as an 'io' file.

If we want to use the patch, the following must yet be done: write tests that 
check the splitting on FS, RS, and GS characters.  Write a news entry.  I 
didn't do any profiling to see what the performance effect of my change is so 
that should be checked too.
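
For readers following along, a regex along these lines does the io-style 
splitting (a sketch; the attached patch's actual regex may differ in details):

    import re

    _LINE_RE = re.compile(r'[^\r\n]*(?:\r\n|\r|\n|$)')

    def split_io_lines(text):
        # split like io files do, keeping the line endings
        return [m.group() for m in _LINE_RE.finditer(text) if m.group()]

    assert split_io_lines("a\nb\x0bc\n") == ["a\n", "b\x0bc\n"]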

--
Added file: https://bugs.python.org/file47851/codecs_splitlines.txt

___
Python tracker 
<https://bugs.python.org/issue18291>
___



[issue18291] codecs.open interprets FS, RS, GS as line ends

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I think one bug here is that codecs readers use str.splitlines() internally.  
The splitlines method treats a bunch of different characters as line 
separators, unlike readlines() on io file objects.  So, you end up with different 
behavior between doing iter(codecs.getreader(...)) and iter(io.open(...)).

We can argue if str.splitlines() is doing the correct thing, see the table here:
https://docs.python.org/3.8/library/stdtypes.html#str.splitlines

However, it seems clearer to me that readlines() on a codecs reader and on a 
file object should really be splitting lines on the same characters.

--
nosy: +nascheme
versions:  -Python 2.7, Python 3.3, Python 3.4

___
Python tracker 
<https://bugs.python.org/issue18291>
___



[issue30825] csv.Sniffer does not detect lineterminator

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

There is another issue related to this.  If you use codecs to get a reader, it 
uses str.splitlines() internally, which treats a bunch of different characters 
as line terminators.  See issue #18291 and:

https://docs.python.org/3.8/library/stdtypes.html#str.splitlines

I was thinking about different ways to fix this.  First, the csv module 
suggests you pass newline='' to the file object.  I suspect most people don't 
know to do that.  So, I thought maybe the csv module should inspect the file 
object that gets passed in and then warn if newline='' has not been used or if 
the file is a codecs reader object.

However, that seems fairly complicated.  Would it be better if we changed the 
'csv' module to do its own line splitting?  I think that would be better 
although I'm not sure about backwards compatibility.  Currently, the reader 
expects to call iter() on the input file.  Would it be okay if it used the 
'read' method in preference to iter()?  It could still fall back to 
iter() if there was no read method.
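
A sketch of that preference (names here are hypothetical, not from a patch):

    def _csv_chunks(f, size=8192):
        # prefer read() so the csv parser can do its own line splitting;
        # fall back to line iteration for legacy iterables
        read = getattr(f, "read", None)
        if read is None:
            yield from f
            return
        while True:
            chunk = read(size)
            if not chunk:
                break
            yield chunk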

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue30825>
___



[issue34801] codecs.getreader() splits lines containing control characters

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Perhaps the 'csv' module should do some sanity checking on the file passed to 
the reader.  The docs recommend that newline='' be used to open the file.  
Maybe 'csv' could check that and warn if it's not the case.  I poked around but 
it seems like io files don't have a handy property to check for that.  Further, 
maybe 'csv' could check if the file is a codecs.StreamReader object.  In that 
case, there is no way to turn off the extra newline characters and so that's 
probably a bug.

--

___
Python tracker 
<https://bugs.python.org/issue34801>
___



[issue34801] codecs.getreader() splits lines containing control characters

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Thank you for the research. The problem is indeed that \v is getting treated as 
a line separator.  That is an intentional design choice, see:

https://bugs.python.org/issue12855

It would seem to have some surprising implications for CSV parsing.  E.g. if 
someone embeds a \v character in a quoted field, parsing the file using 
codecs.getreader() will cause the field to be split across two rows.

Someone else has run into the same issue:

https://www.enigma.com/blog/the-secret-world-of-newline-characters

I'm not sure anything should be done.  Perhaps we should do something to reduce 
the chances that people trip over this issue.  E.g. if I want to parse a file 
containing Unicode text with the CSV module, how do I do it while allowing \v 
characters (or other new-line like characters other than \n) within fields?
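
The tearing is easy to demonstrate even without csv in the picture:

    import codecs, io

    raw = 'a,"x\x0by"\n'.encode("utf-8")
    reader = codecs.getreader("utf-8")(io.BytesIO(raw))
    print(reader.readlines())   # ['a,"x\x0b', 'y"\n'] -- the quoted field is torn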

--

___
Python tracker 
<https://bugs.python.org/issue34801>
___



[issue34850] Emit a syntax warning for "is" with a literal

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

> The problem with a SyntaxWarning is that the wrong people will see it. It 
> gets in the way of users of applications that happen to be written in Python.

Turn the check on only when PYTHONDEVMODE is set?  Seems like it solves the 
issue with the wrong people seeing the warning.

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue34850>
___



[issue34867] Add mode to disable small integer and interned string caches

2018-10-04 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Wouldn't turning these off hurt performance a lot?  If so, I don't know if 
people would actually use such a mode.  Then it becomes pretty useless.  Could 
we combine this idea with the PYTHONDEVMODE flag?  If PYTHONDEVMODE is turned 
on, we could do a check like Serhiy suggests for inappropriate 'is' 
comparisons.  That seems more useful to me.
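
For context, the pitfall such a check would catch looks like this in the REPL 
(the results are CPython implementation details, not guarantees):

    >>> a = 257
    >>> b = 257
    >>> a is b   # outside the small-int cache: False here
    False
    >>> a = 256
    >>> b = 256
    >>> a is b   # True only because of the cache
    True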

--
nosy: +nascheme

___
Python tracker 
<https://bugs.python.org/issue34867>
___



[issue34822] Simplify AST for slices

2018-10-03 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Hello Serhiy,

I've not reviewed the patch but I trust that if you say it simplifies things, 
that's so.  Perhaps the important question is if it is okay to change the AST 
in backwards incompatible ways within 3.x releases.  As a library author who 
could be affected, I think it is okay. The AST is mostly an implementation 
detail and should not be required to maintain compatibility.  We expose it via the 
'ast' module for low level tools that want to manipulate it.  It is up to those 
users to handle backwards incompatible changes.

That said, we shouldn't just change it for trivial reasons.  That just causes 
work for 3rd party libraries.  Removing 400 lines of code seems like sufficient 
reason.

--

___
Python tracker 
<https://bugs.python.org/issue34822>
___



[issue34880] About the "assert" bytecode

2018-10-03 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

Assuming it is not crazy complicated to fix, I would like it to be changed.  It 
should work like the TypeError example.  The fact that it has worked the 
current way since Python 1.5 isn't a strong argument.  I think no one should be 
surprised if it changes.  Also, expecting other Python implementations to match 
the LOAD_GLOBAL behavior seems to put an unnecessary burden on them.  If someone 
writes code that relies on that behavior, they should not be surprised if it 
gets broken, IMHO.

--
nosy: +nascheme
status: pending -> open

___
Python tracker 
<https://bugs.python.org/issue34880>
___



[issue34801] codecs.getreader() splits lines containing control characters

2018-09-25 Thread Neil Schemenauer


New submission from Neil Schemenauer :

This seems to be a bug in codecs.getreader().  io.TextIOWrapper(fp, encoding) 
works correctly.
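
The attachment isn't inlined here, but a reproduction along the same lines:

    import codecs, io

    data = "one\x0btwo\n".encode("utf-8")
    print(codecs.getreader("utf-8")(io.BytesIO(data)).readlines())
    # ['one\x0b', 'two\n']  -- \v treated as a line break
    print(io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").readlines())
    # ['one\x0btwo\n']      -- splits only on \n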

--
files: codecs_bug.py
messages: 326382
nosy: nascheme
priority: low
severity: normal
status: open
title: codecs.getreader() splits lines containing control characters
type: behavior
Added file: https://bugs.python.org/file47823/codecs_bug.py

___
Python tracker 
<https://bugs.python.org/issue34801>
___



[issue32892] Remove specific constant AST types in favor of ast.Constant

2018-09-24 Thread Neil Schemenauer


Neil Schemenauer  added the comment:

I've checked Quixote PTL module support.  I will have to make some changes to 
support the removal of Str and other node types.  E.g. I have to change 
visit_Str to visit_Constant.  The fixes are pretty easy though and I think it 
is reasonable that this kind of AST manipulation is dependent on Python version 
and may get broken between releases.

--

___
Python tracker 
<https://bugs.python.org/issue32892>
___



  1   2   3   >