Re: [Python-Dev] PEP 557: Data Classes

2017-09-18 Thread Glenn Linderman

On 9/18/2017 11:37 AM, Ethan Furman wrote:

On 09/11/2017 03:28 PM, Guido van Rossum wrote:

Oddly I don't like the enum (flag names get too long that way), but I 
do agree with everything else Barry said (it

should be a trivalue flag and please don't name it cmp).


Hmmm, named constants are one of the motivating factors for having an 
Enum type.  It's easy to keep the name a reasonable length, however: 
export them into the module-level namespace.  re is an excellent 
example; the existing flags were moved into a FlagEnum, and then (for 
backwards compatibility) aliased back to the module level:

snip

Likewise, if we had:

    class Compare(enum.Enum):
    NONE = 'each instance is an island'
    EQUAL = 'instances can be equal to each other'
    ORDERED = 'instances can be ordered and/or equal'
    globals().update(Compare.__members__)

then we can still use, for example, EQUAL, but get the more 
informative repr and str when we need to.


I tend to like the Enum also. But maybe this is a case where there 
should be multiple different decorators:


@dataclass
@equality_dataclass
@ordered_datatclass
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Antoine Pitrou
On Mon, 18 Sep 2017 13:25:47 -0700
Nathaniel Smith  wrote:
> >> If it is then it might make sense to look at the cycle collection
> >> heuristics; IIRC they're based on a fairly naive count of how many
> >> allocations have been made, without regard to their size.  
> >
> > Yes... But just because a lot of memory has been allocated isn't a good
> > enough heuristic to launch a GC collection.  
> 
> I'm not an expert on GC at all, but intuitively it sure seems like
> allocation size might be a useful piece of information to feed into a
> heuristic. Our current heuristic is just, run a small collection after
> every 700 allocations, run a larger collection after 10 smaller
> collections.

Ok, so there are two useful things a heuristic may try to guess:
1. how much CPU time a (full) collection will cost
2. how much memory it might help release

While #1 is quite realistically predictable (because a full collection
is basically O(number of allocated objects)), #2 is entirely
unpredictable and you can really only make out an upper bound
(trivially, the GC cannot release any more memory than has been
allocated :-)).

Our current heuristic is geared at estimating #1, so that the
proportional time taken by GC runs in any long-running program is
roughly bounded.

> Every heuristic has problematic cases, that's why we call it a
> heuristic :-). But somehow every other GC language manages to do
> well-enough without refcounting...

I don't know, define "well enough"? :-) For example, Java is well-known
for requiring manual tuning of GC parameters for heavy workloads (or
at least it was).

I would be curious how well Java deals when someone does numerical
crunching on large chunks of data, then throws that data away, for
example.

Apparently R also uses a garbage collector, and ironically you can find
places on the Internet where people recommend you call gc() to avoid
thrashing memory.
https://stackoverflow.com/questions/8813753/what-is-the-difference-between-gc-and-rm

On CPython at least, once you lose the last reference to your
multi-gigabyte Numpy array, memory is returned immediately...

> I think they mostly have more
> sophisticated heuristics than CPython, though.

I'm sure some of them do.  They also tend to have many more man-hours
available than we do to optimize the hell out of their GC.

> > Perhaps we could special-case tracebacks somehow, flag when a traceback
> > remains alive after the implicit "del" clause at the end of an "except"
> > block, then maintain some kind of linked list of the flagged tracebacks
> > and launch specialized GC runs to find cycles accross that collection.
> > That sounds quite involved, though.  
> 
> We already keep a list of recently allocated objects and have a
> specialized GC that runs across just that collection. That's what
> generational GC is :-).

Well, you can cross your fingers that the traceback is still in the
youngest generations when it finally becomes collectable.  But if the
traceback stays alive for too long, it ends up in the oldest generation
and a long time can pass before it gets collected.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 10:59 AM, Antoine Pitrou  wrote:
> Le 18/09/2017 à 19:53, Nathaniel Smith a écrit :
>>>
 Why are reference cycles a problem that needs solving?
>>>
>>> Because sometimes they are holding up costly resources in memory when
>>> people don't expect them to.  Such as large Numpy arrays :-)
>>
>> Do we have any reason to believe that this is actually happening on a
>> regular basis though?
>
> Define "regular" :-)  We did get some reports on dask/distributed about it.

Caused by uncollected cycles involving tracebacks? I looked here:

  
https://github.com/dask/distributed/issues?utf8=%E2%9C%93&q=is%3Aissue%20memory%20leak

and saw some issues with cycles causing delayed collection (e.g. #956)
or the classic memory leak problem of explicitly holding onto data you
don't need any more (e.g. #1209, bpo-29861), but nothing involving
traceback cycles. It was just a quick skim though.

>> If it is then it might make sense to look at the cycle collection
>> heuristics; IIRC they're based on a fairly naive count of how many
>> allocations have been made, without regard to their size.
>
> Yes... But just because a lot of memory has been allocated isn't a good
> enough heuristic to launch a GC collection.

I'm not an expert on GC at all, but intuitively it sure seems like
allocation size might be a useful piece of information to feed into a
heuristic. Our current heuristic is just, run a small collection after
every 700 allocations, run a larger collection after 10 smaller
collections.

> What if that memory is
> gonna stay allocated for a long time?  Then you're frequently launching
> GC runs for no tangible result except more CPU consumption and frequent
> pauses.

Every heuristic has problematic cases, that's why we call it a
heuristic :-). But somehow every other GC language manages to do
well-enough without refcounting... I think they mostly have more
sophisticated heuristics than CPython, though. Off the top of my head,
I know PyPy's heuristic involves the ratio of the size of nursery
objects versus the size of the heap, and JVMs do much cleverer things
like auto-tuning nursery size to make empirical pause times match some
target.

> Perhaps we could special-case tracebacks somehow, flag when a traceback
> remains alive after the implicit "del" clause at the end of an "except"
> block, then maintain some kind of linked list of the flagged tracebacks
> and launch specialized GC runs to find cycles accross that collection.
> That sounds quite involved, though.

We already keep a list of recently allocated objects and have a
specialized GC that runs across just that collection. That's what
generational GC is :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Martin (gzlist) via Python-Dev
Thanks for working on this and writing up the details Victor.

For those confused about why this matters, routinely having every
object in your application participating in one (or more) giant
reference cycles makes reasoning about and fixing resource issues very
difficult.

For instance, a while back I made some changes in Bazaar so that each
TestCase instance was dereferenced after being run:



Which mattered when running ~25000 tests, to keep peak memory usage
sane. Also there were various other object lifetime issues, and
tracking down which specific test failed to join a thread, or close a
file on Windows was much simpler after making sure cleanup actually
happened at the time a test was deleted.

Keeping more stuff alive for longer periods also makes peak memory
requirements higher, and the gc has to do more work each time.

On 18/09/2017, Victor Stinner  wrote:
>
> Ideally, CPython 3.x should never create reference cycles. Removing
> Exception.__traceback__ is the obvious "fix" for the issue. But I
> expect that slowly, a lot of code started to rely on the attribute,
> maybe even for good reasons :-)
>
> A more practical solution would be to log a warning. Maybe the garbage
> collector can emit a warning if it detects an exception part of a
> reference cycle? Or maybe detect frames?

Logging a warning is unlikely to be practical. I had an optional test
flag that complained about reference cycles, and it produced a lot of
output.

Martin
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 557: Data Classes

2017-09-18 Thread Ethan Furman

On 09/11/2017 03:28 PM, Guido van Rossum wrote:


Oddly I don't like the enum (flag names get too long that way), but I do agree 
with everything else Barry said (it
should be a trivalue flag and please don't name it cmp).


Hmmm, named constants are one of the motivating factors for having an Enum type.  It's easy to keep the name a 
reasonable length, however: export them into the module-level namespace.  re is an excellent example; the existing flags 
were moved into a FlagEnum, and then (for backwards compatibility) aliased back to the module level:


class RegexFlag(enum.IntFlag):
ASCII = sre_compile.SRE_FLAG_ASCII # assume ascii "locale"
IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode "locale"
MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for 
newline
DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
A = ASCII
I = IGNORECASE
L = LOCALE
U = UNICODE
M = MULTILINE
S = DOTALL
X = VERBOSE
# sre extensions (experimental, don't rely on these)
TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
T = TEMPLATE
DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation
globals().update(RegexFlag.__members__)

So we can still do re.I instead of re.RegexFlag.I.

Likewise, if we had:

class Compare(enum.Enum):
NONE = 'each instance is an island'
EQUAL = 'instances can be equal to each other'
ORDERED = 'instances can be ordered and/or equal'
globals().update(Compare.__members__)

then we can still use, for example, EQUAL, but get the more informative repr 
and str when we need to.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Antoine Pitrou
Le 18/09/2017 à 19:53, Nathaniel Smith a écrit :
>>
>>> Why are reference cycles a problem that needs solving?
>>
>> Because sometimes they are holding up costly resources in memory when
>> people don't expect them to.  Such as large Numpy arrays :-)
> 
> Do we have any reason to believe that this is actually happening on a
> regular basis though?

Define "regular" :-)  We did get some reports on dask/distributed about it.

> If it is then it might make sense to look at the cycle collection
> heuristics; IIRC they're based on a fairly naive count of how many
> allocations have been made, without regard to their size.

Yes... But just because a lot of memory has been allocated isn't a good
enough heuristic to launch a GC collection.  What if that memory is
gonna stay allocated for a long time?  Then you're frequently launching
GC runs for no tangible result except more CPU consumption and frequent
pauses.

Perhaps we could special-case tracebacks somehow, flag when a traceback
remains alive after the implicit "del" clause at the end of an "except"
block, then maintain some kind of linked list of the flagged tracebacks
and launch specialized GC runs to find cycles accross that collection.
That sounds quite involved, though.

> The issue that Victor ran into with
> socket.create_connection is a special case where that function saves
> off the caught exception to use later.

That's true... but you can find such special cases in an assorted bunch
of library functions (stdlib or third-party), not just
socket.create_connection().  Fixing them one by one is always possible,
but it's a never-ending battle.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Mon, Sep 18, 2017 at 9:50 AM, Antoine Pitrou  wrote:
> On Mon, 18 Sep 2017 09:42:45 -0700
> Nathaniel Smith  wrote:
>>
>> Obviously it's nice when the refcount system is able to implicitly clean
>> things up in a prompt and deterministic way, but there are already tools to
>> handle the cases where it doesn't (ResourceWarning, context managers, ...),
>> and the more we encourage people to implicitly rely on refcounting, [...]
>
> The thing is, we don't need to encourage them.  Having objects disposed
> of when the last visible reference vanishes is a pretty common
> expectation people have when using CPython.
>
>> Why are reference cycles a problem that needs solving?
>
> Because sometimes they are holding up costly resources in memory when
> people don't expect them to.  Such as large Numpy arrays :-)

Do we have any reason to believe that this is actually happening on a
regular basis though?

If it is then it might make sense to look at the cycle collection
heuristics; IIRC they're based on a fairly naive count of how many
allocations have been made, without regard to their size.

> And, no, there are no obvious ways to fix for users.  gc.collect()
> is much too costly to be invoked on a regular basis.
>
>> Because if so then that seems
>> like a bug in the warnings mechanism; there's no harm in a dead Thread
>> hanging around until collected, and Victor may have wasted a day debugging
>> an issue that wasn't a problem in the first place...
>
> Yes, I think Victor is getting a bit overboard with the so-called
> "dangling thread" issue.  But the underlying issue (that heavyweight
> resources can be inadvertently held up in memory up just because some
> unrelated exception was caught and silenced along the way) is a real
> one.

Simply catching and silencing exceptions doesn't create any loops -- if you do

try:
raise ValueError
except ValueError as exc:
raise exc

then there's no loop, because the 'exc' local gets cleared as soon as
you exit the except: block. The issue that Victor ran into with
socket.create_connection is a special case where that function saves
off the caught exception to use later.

If someone wanted to replace socket.create_connection's 'raise err' with

try:
raise err
finally:
del err

then I guess that would be pretty harmless...

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Antoine Pitrou
On Mon, 18 Sep 2017 09:42:45 -0700
Nathaniel Smith  wrote:
> 
> Obviously it's nice when the refcount system is able to implicitly clean
> things up in a prompt and deterministic way, but there are already tools to
> handle the cases where it doesn't (ResourceWarning, context managers, ...),
> and the more we encourage people to implicitly rely on refcounting, [...]

The thing is, we don't need to encourage them.  Having objects disposed
of when the last visible reference vanishes is a pretty common
expectation people have when using CPython.

> Why are reference cycles a problem that needs solving?

Because sometimes they are holding up costly resources in memory when
people don't expect them to.  Such as large Numpy arrays :-)

And, no, there are no obvious ways to fix for users.  gc.collect()
is much too costly to be invoked on a regular basis.

> Because if so then that seems
> like a bug in the warnings mechanism; there's no harm in a dead Thread
> hanging around until collected, and Victor may have wasted a day debugging
> an issue that wasn't a problem in the first place...

Yes, I think Victor is getting a bit overboard with the so-called
"dangling thread" issue.  But the underlying issue (that heavyweight
resources can be inadvertently held up in memory up just because some
unrelated exception was caught and silenced along the way) is a real
one.

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nathaniel Smith
On Sep 18, 2017 07:58, "Antoine Pitrou"  wrote:


Le 18/09/2017 à 16:52, Guido van Rossum a écrit :
>
> In Python 2 the traceback was not part of the exception object because
> there was (originally) no cycle GC. In Python GC we changed the awkward
> interface to something more useful, because we could depend on GC. Why
> are we now trying to roll back this feature? We should just improve GC.
> (Or perhaps you shouldn't be raising so many exceptions. :-)

Improving the GC is obviously a good thing, but what heuristic would you
have in mind that may solve the issue at hand?


I read the whole thread and I'm not sure what the issue at hand is :-).
Obviously it's nice when the refcount system is able to implicitly clean
things up in a prompt and deterministic way, but there are already tools to
handle the cases where it doesn't (ResourceWarning, context managers, ...),
and the more we encourage people to implicitly rely on refcounting, the
harder it is to optimize the interpreter or use alternative language
implementations. Why are reference cycles a problem that needs solving?

Actually the bit that I find most confusing about Victor's story is, how
can a traceback frame keep a thread alive? Is the problem that "dangling
thread" warnings are being triggered by threads that are finished and dead
but their Thread objects are still allocated? Because if so then that seems
like a bug in the warnings mechanism; there's no harm in a dead Thread
hanging around until collected, and Victor may have wasted a day debugging
an issue that wasn't a problem in the first place...

-n
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Why aren't decorators just expressions?

2017-09-18 Thread Éric Araujo
  Hello,

Le 2017-09-16 à 07:22, Serhiy Storchaka a écrit :
> 16.09.17 12:39, Larry Hastings пише:
>> So why don't decorators allow arbitrary expressions?  [...]
> 
> Actually I remember somebody raised this question a year or two ago,> but 
> don't remember details.

  The discussion I remember was https://bugs.python.org/issue19660

  Guido said: «Nobody has posted a real use case. All the examples are
toys. What are the real use cases that are blocked by this? Readability
counts!»

  Cheers
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Antoine Pitrou

Le 18/09/2017 à 16:52, Guido van Rossum a écrit :
> 
> In Python 2 the traceback was not part of the exception object because
> there was (originally) no cycle GC. In Python GC we changed the awkward
> interface to something more useful, because we could depend on GC. Why
> are we now trying to roll back this feature? We should just improve GC.
> (Or perhaps you shouldn't be raising so many exceptions. :-)

Improving the GC is obviously a good thing, but what heuristic would you
have in mind that may solve the issue at hand?

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Guido van Rossum
On Mon, Sep 18, 2017 at 2:48 AM, Antoine Pitrou  wrote:

> On Mon, 18 Sep 2017 11:31:12 +0200
> Victor Stinner  wrote:
> >
> > Ideally, CPython 3.x should never create reference cycles. Removing
> > Exception.__traceback__ is the obvious "fix" for the issue. But I
> > expect that slowly, a lot of code started to rely on the attribute,
> > maybe even for good reasons :-)
>
> The real issue is not reference cycles, but the fact that a traceback
> frame keeps all locals alive.  When the frame's execution is finished,
> that information is practically almost always useless, but is kept for
> the rare cases of debugging (and perhaps dubious features such as
> some of py.test's magic).  So we're constantly paying the price of this
> minor, albeit probably useful (*), debugging-related feature with
> annoying object lifetime issues that are sometimes quite difficult to
> track down and solve.
>
> (*) (I've hardly ever used pdb so I don't know if people often examine
> the locals of finished frames, but I suppose some do)
>

Yes, that's what `pdb.pm()` is for.


> Perhaps it would be useful to have a way to tell the interpreter to
> automatically clear all locals on frames when they have finished
> executing.  It could be a thread-local setting (or, better, a PEP 550
> context variable setting).
>

In Python 2 the traceback was not part of the exception object because
there was (originally) no cycle GC. In Python GC we changed the awkward
interface to something more useful, because we could depend on GC. Why are
we now trying to roll back this feature? We should just improve GC. (Or
perhaps you shouldn't be raising so many exceptions. :-)

-- 
--Guido van Rossum (python.org/~guido)
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Martin Panter
On 18 September 2017 at 09:31, Victor Stinner  wrote:
> Last years, I fixed many reference cycles in various parts of the
> Python 3 standard library. Sometimes, it takes years to become aware
> of the reference cycle and finally fix it.
>
> For example, recently, I worked on fixing all "dangling threads"
> leaked by tests of the Python test suite, and I found and fixed many
> reference cycles which probably existed since Python 3 was created
> (forked from Python 2)

My instinct is to suggest to make the “dangling threads” test tolerate
Thread objects that are no longer running. Last time I looked, it
tests a list of weak references to Thread objects. But maybe it should
check “Thread.is_alive” or use “threading.enumerate”.

> . . .
>
> Ideally, CPython 3.x should never create reference cycles. Removing
> Exception.__traceback__ is the obvious "fix" for the issue. But I
> expect that slowly, a lot of code started to rely on the attribute,
> maybe even for good reasons :-)
>
> A more practical solution would be to log a warning. Maybe the garbage
> collector can emit a warning if it detects an exception part of a
> reference cycle? Or maybe detect frames?

The “gc” module can do some of this already. In the past (when __del__
used to prevent garbage collection) I used DEBUG_SAVEALL to and
graphed the result to figure out where code was creating reference
cycles. Or maybe if you turned on DEBUG_COLLECTABLE and filtered the
objects printed out, or played with the new “gc.callbacks” API.

> If the GC cannot do it, maybe we might use a debug thread (enabled
> manually) which checks manually if an exception is part of a reference
> cycle using gc.get_objects(): check if an exception remains alive
> longer than X seconds? I had the same idea for asyncio, to detect
> reference cycles or if a task is never "awaited", but I never
> implemented the idea.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nick Coghlan
On 18 September 2017 at 20:52, Antoine Pitrou  wrote:
> On Mon, 18 Sep 2017 20:35:02 +1000
> Nick Coghlan  wrote:
>> Rather than being thread local or context local state, whether or not
>> to keep the full frames in the traceback could be a yet another
>> setting on the *exception* object, whereby we tweaked the logic that
>> drops the reference at the end of an except clause as follows:
>
> That doesn't solve the problem, since the issue is that exceptions can
> be raised (and then silenced) in many places, and you don't want such
> exception-raising code (such as socket.create_connection) to start
> having to set an option on the exceptions it raises.

Bleh, in trying to explain why my proposal would be sufficient to
break the problematic cycles, I realised I was wrong: if we restrict
the frame clearing to already terminated frames (as would be necessary
to avoid breaking any still executing functions), then that means we
won't clear the frame running the exception handler, and that's the
frame that creates the problematic cyclic reference.

However, I still think it makes more sense to focus on the semantics
of preserving an exception beyond the life of the stack being unwound,
rather than on the semantics of raising the exception in the first
place.

In the usual case, the traceback does keep the whole stack alive while
the stack is being unwound, but then the exception gets thrown away at
the end when sys.exc_info() gets reset back to (None, None, None), and
then all the frames still get cleaned up fairly promptly (this is also
the case in Python 2).

We only get problems when one of the exception handlers in the stack
grabs an additional reference to either the traceback or the exception
and hence creates a persistent cyclic reference from one of the frames
back to itself. The difference in Python 3 is that saving the
exception is reasonably common, while explicitly saving the traceback
is relatively rare, so the "exc.__traceback__" is keeping tracebacks
alive that would otherwise have been cleaned up more
deterministically.

Putting the problem that way gave me an idea, though: what if, when
the interpreter was setting "sys.exc_info()" back to (None, None,
None) (or otherwise dropping an exception instance from being the
"currently active exception") it automatically set exc.__traceback__
to None?

That way, if you wanted the *traceback* (rather than just the
exception) to live beyond the stack being unwound, you'd have to
preserve the entire sys.exc_info() triple (or at least save
"exc.__traceback__" separately from "exc"). By doing that, we'd have
the opportunity to encourage folks that are considering preserving the
entire traceback to extract a TracebackException instead and some
themselves from some potentially nasty reference management issues:
https://docs.python.org/3/library/traceback.html#tracebackexception-objects

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Antoine Pitrou
On Mon, 18 Sep 2017 20:35:02 +1000
Nick Coghlan  wrote:
> On 18 September 2017 at 20:18, Victor Stinner  
> wrote:
> > 2017-09-18 12:07 GMT+02:00 Nick Coghlan :  
> >> I wonder if it might be reasonable to have tracebacks only hold a weak
> >> reference to their frame objects when "__debug__ == False".  
> >
> > Please don't change the Python behaviour depending on __debug__, or it
> > will be a nightmare to debug it :-( ("Why does it work on my
> > computer?")  
> 
> Yeah, that's always a challenge :)
> 
> Rather than being thread local or context local state, whether or not
> to keep the full frames in the traceback could be a yet another
> setting on the *exception* object, whereby we tweaked the logic that
> drops the reference at the end of an except clause as follows:

That doesn't solve the problem, since the issue is that exceptions can
be raised (and then silenced) in many places, and you don't want such
exception-raising code (such as socket.create_connection) to start
having to set an option on the exceptions it raises.

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Antoine Pitrou
On Mon, 18 Sep 2017 20:07:42 +1000
Nick Coghlan  wrote:

> On 18 September 2017 at 19:48, Antoine Pitrou  wrote:
> > On Mon, 18 Sep 2017 11:31:12 +0200
> > Victor Stinner  wrote:  
> >>
> >> Ideally, CPython 3.x should never create reference cycles. Removing
> >> Exception.__traceback__ is the obvious "fix" for the issue. But I
> >> expect that slowly, a lot of code started to rely on the attribute,
> >> maybe even for good reasons :-)  
> >
> > The real issue is not reference cycles, but the fact that a traceback
> > frame keeps all locals alive.  When the frame's execution is finished,
> > that information is practically almost always useless, but is kept for
> > the rare cases of debugging (and perhaps dubious features such as
> > some of py.test's magic).  
> 
> As another use case, IPython tracebacks will include details of
> referenced local variables, and
> https://docs.python.org/3/library/traceback.html#tracebackexception-objects
> offers the ability to readily capture the repr() of locals referenced
> from a traceback for the same reason.

True... but interactive use has different concerns than production
use (hence my proposition of a setting to change frame cleanup
behaviour). Besides, IPython also stores the history of displayed
values, which is another cause of keeping objects alive :-)

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 554 v3 (new interpreters module)

2017-09-18 Thread Antoine Pitrou

Hi,

First my high-level opinion about the PEP: the CSP model can probably
be already implemented using Queues.  To me, the interesting promise of
subinterpreters is if they allow to remove the GIL while sharing memory
for big objects (such as Numpy arrays).  This means the PEP should
probably focus on potential concurrency improvements rather than try to
faithfully follow the CSP model.

Other than that, a bunch of detailed comments follow:

On Wed, 13 Sep 2017 18:44:31 -0700
Eric Snow  wrote:
> 
> API for interpreters
> 
> 
> The module provides the following functions:
> 
> ``list_all()``::
> 
>Return a list of all existing interpreters.

See my naming proposal in the previous thread.

> 
>run(source_str, /, **shared):
> 
>   Run the provided Python source code in the interpreter.  Any
>   keyword arguments are added to the interpreter's execution
>   namespace.

"Execution namespace" specifically means the __main__ module in the
target interpreter, right?

>  If any of the values are not supported for sharing
>   between interpreters then RuntimeError gets raised.  Currently
>   only channels (see "create_channel()" below) are supported.
> 
>   This may not be called on an already running interpreter.  Doing
>   so results in a RuntimeError.

I would distinguish between both error cases: RuntimeError for calling
run() on an already running interpreter, ValueError for values which
are not supported for sharing.

>   Likewise, if there is any uncaught
>   exception, it propagates into the code where "run()" was called.

That makes it a bit harder to differentiate with errors raised by run()
itself (see above), though how much of an annoyance this is remains
unclear.  The more litigious implication, though, is that it forces the
interpreter to support migration of arbitrary objects from one
interpreter to another (since a traceback keeps all local variables
alive).

> API for sharing data
> 
> 
> The mechanism for passing objects between interpreters is through
> channels.  A channel is a simplex FIFO similar to a pipe.  The main
> difference is that channels can be associated with zero or more
> interpreters on either end.

So it seems channels have become more complicated now?  Is it important
to support multi-producer multi-consumer channels?

>  Unlike queues, which are also many-to-many,
> channels have no buffer.

How does it work?  Does send() block until someone else calls recv()?
That does not sound like a good idea to me.  I don't think it's a
coincidence that the most varied kinds of I/O (from socket or file IO
to threading Queues to multiprocessing Pipes) have non-blocking send().

send() blocking until someone else calls recv() is not only bad for
performance, it also increases the likelihood of deadlocks.

>recv_nowait(default=None):
> 
>   Return the next object from the channel.  If none have been sent
>   then return the default.  If the channel has been closed
>   then EOFError is raised.
> 
>close():
> 
>   No longer associate the current interpreter with the channel (on
>   the receiving end).  This is a noop if the interpreter isn't
>   already associated.  Once an interpreter is no longer associated
>   with the channel, subsequent (or current) send() and recv() calls
>   from that interpreter will raise EOFError.

EOFError normally means the *other* (sending) side has closed the
channel (but it becomes complicated with a multi-producer multi-consumer
setup...). When *this* side has closed the channel, we should raise
ValueError.

>  The Python runtime
>   will garbage collect all closed channels.  Note that "close()" is
>   automatically called when it is no longer used in the current
>   interpreter.

"No longer used" meaning it loses all references in this interpreter?

>send(obj):
> 
>Send the object to the receiving end of the channel.  Wait until
>the object is received.  If the channel does not support the
>object then TypeError is raised.  Currently only bytes are
>supported.  If the channel has been closed then EOFError is
>raised.

Similar remark as above (EOFError vs. ValueError).
More generally, send() raising EOFError sounds unheard of.

A sidenote: context manager support (__enter__ / __exit__) on channels
would sound more useful to me than iteration support.

> Initial support for buffers in channels
> ---
> 
> An alternative to support for bytes in channels in support for
> read-only buffers (the PEP 3119 kind).

Probably you mean PEP 3118.

> Then ``recv()`` would return
> a memoryview to expose the buffer in a zero-copy way.

It will probably not do much if you only can pass buffers and not
structured objects, because unserializing (e.g. unpickling) from a
buffer will still copy memory around.

To pass a Numpy array, for example, you not only need to pa

Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nick Coghlan
On 18 September 2017 at 20:18, Victor Stinner  wrote:
> 2017-09-18 12:07 GMT+02:00 Nick Coghlan :
>> I wonder if it might be reasonable to have tracebacks only hold a weak
>> reference to their frame objects when "__debug__ == False".
>
> Please don't change the Python behaviour depending on __debug__, or it
> will be a nightmare to debug it :-( ("Why does it work on my
> computer?")

Yeah, that's always a challenge :)

Rather than being thread local or context local state, whether or not
to keep the full frames in the traceback could be a yet another
setting on the *exception* object, whereby we tweaked the logic that
drops the reference at the end of an except clause as follows:

1. If the exception ref count == 1, just drop the reference and let
the traceback die with it
2. If the exception ref count > 1, check for a new "__keep_locals__" attribute
3. If it's true, just drop the reference and otherwise leave the exception alone
4. If it's false, recursively clear all the already terminated frames
in __traceback__, __context__, and __cause__

For now, we'd default to "__keep_locals__ = True", but we'd attempt to
figure out some way to migrate to the default being "__keep_locals__ =
False".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Victor Stinner
2017-09-18 12:07 GMT+02:00 Nick Coghlan :
> I wonder if it might be reasonable to have tracebacks only hold a weak
> reference to their frame objects when "__debug__ == False".

Please don't change the Python behaviour depending on __debug__, or it
will be a nightmare to debug it :-( ("Why does it work on my
computer?")

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Nick Coghlan
On 18 September 2017 at 19:48, Antoine Pitrou  wrote:
> On Mon, 18 Sep 2017 11:31:12 +0200
> Victor Stinner  wrote:
>>
>> Ideally, CPython 3.x should never create reference cycles. Removing
>> Exception.__traceback__ is the obvious "fix" for the issue. But I
>> expect that slowly, a lot of code started to rely on the attribute,
>> maybe even for good reasons :-)
>
> The real issue is not reference cycles, but the fact that a traceback
> frame keeps all locals alive.  When the frame's execution is finished,
> that information is practically almost always useless, but is kept for
> the rare cases of debugging (and perhaps dubious features such as
> some of py.test's magic).

As another use case, IPython tracebacks will include details of
referenced local variables, and
https://docs.python.org/3/library/traceback.html#tracebackexception-objects
offers the ability to readily capture the repr() of locals referenced
from a traceback for the same reason.

>  So we're constantly paying the price of this
> minor, albeit probably useful (*), debugging-related feature with
> annoying object lifetime issues that are sometimes quite difficult to
> track down and solve.
>
> (*) (I've hardly ever used pdb so I don't know if people often examine
> the locals of finished frames, but I suppose some do)
>
> Perhaps it would be useful to have a way to tell the interpreter to
> automatically clear all locals on frames when they have finished
> executing.  It could be a thread-local setting (or, better, a PEP 550
> context variable setting).

I wonder if it might be reasonable to have tracebacks only hold a weak
reference to their frame objects when "__debug__ == False".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Victor Stinner
Hum, my email is long. Maybe an short example is simpler to understand:
---
import gc

class VerboseDestructor:
# Imagine an object using many limited resources like memory,
# sockets and/or files
def __del__(self):
print("VerboseDestructor")

def create_ref_cycle():
obj = VerboseDestructor()
# Create a reference cycle using an exception
# to keep the frame alive
try:
raise ValueError
except ValueError as exc:
err = exc

print("create_ref_cycle")
create_ref_cycle()
print("gc.collect")
gc.collect()
print("exit")
---

Output:
---
create_ref_cycle
gc.collect
VerboseDestructor
exit
---

create_ref_cycle() function creates a reference cycle causing the
"obj" variable to be kept alive after create_ref_cycle() exit.

Let's say that the "leaked" object uses a lot of memory. The bug can
now be called a memory leak.

Let's say that the leaked object uses a file. Now the risk is that the
file cannot be removed anymore on Windows (you cannot remove an open
file), or that data is not properly flushed on disk, or that the same
file is re-opened by a different function in a different thread and so
you might corrupt data, etc.

It's not hard to imagine many various issues caused by an object kept
alive longer than expected.

Well, you *can* modify your code to add a VerboseDestructor.close()
method and explicitly releases resources as soon as possible in
create_ref_cycle(). Maybe by adding support for the context manager
protocol to VerboseDestructor and use "with obj:" in
create_ref_cycle().

But here you expect that you are already aware of the reference cycle.

Moreover, Python 3 haters can complain that the code worked perfectly
fine in Python 2...

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Antoine Pitrou
On Mon, 18 Sep 2017 11:31:12 +0200
Victor Stinner  wrote:
> 
> Ideally, CPython 3.x should never create reference cycles. Removing
> Exception.__traceback__ is the obvious "fix" for the issue. But I
> expect that slowly, a lot of code started to rely on the attribute,
> maybe even for good reasons :-)

The real issue is not reference cycles, but the fact that a traceback
frame keeps all locals alive.  When the frame's execution is finished,
that information is practically almost always useless, but is kept for
the rare cases of debugging (and perhaps dubious features such as
some of py.test's magic).  So we're constantly paying the price of this
minor, albeit probably useful (*), debugging-related feature with
annoying object lifetime issues that are sometimes quite difficult to
track down and solve.

(*) (I've hardly ever used pdb so I don't know if people often examine
the locals of finished frames, but I suppose some do)

Perhaps it would be useful to have a way to tell the interpreter to
automatically clear all locals on frames when they have finished
executing.  It could be a thread-local setting (or, better, a PEP 550
context variable setting).

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Evil reference cycles caused Exception.__traceback__

2017-09-18 Thread Victor Stinner
Hi,

Python 3 added a __traceback__ attribute to exception objects. I guess
that it was added to be able to get the original traceback when an
exception is re-raised. Artifical example (real code is more complex
with subfunctions and conditional code):

try:
   ...
except Exception as exc:
  ...
  raise exc   # keep original traceback


The problem is that Exception.__traceback__ creates reference cycles.
If you store an exception in a local variable, suddently, all local
variables of all frames (of the current traceback) become part of a
giant reference cycle. Sometimes all these variables are kept alive
very long (until you exit Python?), at least, much longer than the
"expected" lifetime (destroyed "magically" local variables when you
exit the function).

The reference cycle is:

1) frame -> local variable -> variable which stores the exception
2) exception -> traceback -> frame

exception -> ... -> frame -> ... -> same exception


Breaking manually the reference cycle is complex. First, you must be
aware of the reference cycle! Second, you have to identify which
functions of your large application create the reference cycle: this
task is long and painful even with good tooling. Finally, you have to
explicitly clear variables or attributes to break the reference cycle
manually.


asyncio.Future.set_exception() keeps an exception object and its
traceback object alive: asyncio creates reference cycles *by design*.
Enjoy! asyncio tries hard to reduce the consequence of reference
cycles, or even try to break cycles, using hacks like "self = None" in
methods... Setting self to None is really surprising and requires a
comment explainaing the hack.


Last years, I fixed many reference cycles in various parts of the
Python 3 standard library. Sometimes, it takes years to become aware
of the reference cycle and finally fix it.

For example, recently, I worked on fixing all "dangling threads"
leaked by tests of the Python test suite, and I found and fixed many
reference cycles which probably existed since Python 3 was created
(forked from Python 2):

* socket.create_connection(): commit
acb9fa79fa6453c2bbe3ccfc9cad2837feb90093, bpo-31234

* concurrent.futures.ThreadPoolExecutor: commit
bc61315377056fe362b744d9c44e17cd3178ce54, bpo-31249

* pydoc: commit 4cab2cd0c05fcda5fcb128c9eb230253fff88c21, bpo-31238

* xmlrpc.server: commit 84524454d0ba77d299741c47bd0f5841ac3f66ce, bpo-31247

Other examples:

* test_ssl: commit 868710158910fa38e285ce0e6d50026e1d0b2a8c, bpo-31323

* test_threading: commit 3d284c081fc3042036adfe1bf2ce92c34d743b0b, bpo-31234

Another example of a recent change fixing a reference cycle, by Antoine Pitrou:

* multiprocessing: commit 79d37ae979a65ada0b2ac820279ccc3b1cd41ba6, bpo-30775


For socket.create_connection(), I discovered the reference cycle
because a test started to log a warning about dangling thred. The
warning was introduced indirectly by a change which modified
support.HOST value from '127.0.0.1' to 'localhost'... It's hard to see
to link between support.HOST value and a reference cycle. Full story:

https://bugs.python.org/issue29639#msg302087


Again, it's just yet another random example of a very tricky reference
cycle bug caused by Exception.__traceback__.


Ideally, CPython 3.x should never create reference cycles. Removing
Exception.__traceback__ is the obvious "fix" for the issue. But I
expect that slowly, a lot of code started to rely on the attribute,
maybe even for good reasons :-)

A more practical solution would be to log a warning. Maybe the garbage
collector can emit a warning if it detects an exception part of a
reference cycle? Or maybe detect frames?

If the GC cannot do it, maybe we might use a debug thread (enabled
manually) which checks manually if an exception is part of a reference
cycle using gc.get_objects(): check if an exception remains alive
longer than X seconds? I had the same idea for asyncio, to detect
reference cycles or if a task is never "awaited", but I never
implemented the idea.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com