Re: [Python-ideas] Deterministic iterator cleanup

2016-10-29 Thread Neil Girdhar


On Tuesday, October 25, 2016 at 6:26:17 PM UTC-4, Nathaniel Smith wrote:
>
> On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan  > wrote: 
> > On 20 October 2016 at 07:02, Nathaniel Smith  > wrote: 
> >> The first change is to replace the outer for loop with a while/pop 
> >> loop, so that if an exception occurs we'll know which iterables remain 
> >> to be processed: 
> >> 
> >> def chain(*iterables):
> >>     try:
> >>         while iterables:
> >>             for element in iterables.pop(0):
> >>                 yield element
> >>     ...
> >> 
> >> Now, what do we do if an exception does occur? We need to call 
> >> iterclose on all of the remaining iterables, but the tricky bit is 
> >> that this might itself raise new exceptions. If this happens, we don't 
> >> want to abort early; instead, we want to continue until we've closed 
> >> all the iterables, and then raise a chained exception. Basically what 
> >> we want is: 
> >> 
> >> def chain(*iterables):
> >>     try:
> >>         while iterables:
> >>             for element in iterables.pop(0):
> >>                 yield element
> >>     finally:
> >>         try:
> >>             operators.iterclose(iter(iterables[0]))
> >>         finally:
> >>             try:
> >>                 operators.iterclose(iter(iterables[1]))
> >>             finally:
> >>                 try:
> >>                     operators.iterclose(iter(iterables[2]))
> >>                 finally:
> >>                     ...
> >> 
> >> but of course that's not valid syntax. Fortunately, it's not too hard 
> >> to rewrite that into real Python -- but it's a little dense: 
> >> 
> >> def chain(*iterables):
> >>     try:
> >>         while iterables:
> >>             for element in iterables.pop(0):
> >>                 yield element
> >>     # This is equivalent to the nested-finally chain above:
> >>     except BaseException as last_exc:
> >>         for iterable in iterables:
> >>             try:
> >>                 operators.iterclose(iter(iterable))
> >>             except BaseException as new_exc:
> >>                 if new_exc.__context__ is None:
> >>                     new_exc.__context__ = last_exc
> >>                 last_exc = new_exc
> >>         raise last_exc
> >> 
> >> It's probably worth wrapping that bottom part into an iterclose_all() 
> >> helper, since the pattern probably occurs in other cases as well. 
> >> (Actually, now that I think about it, the map() example in the text 
> >> should be doing this instead of what it's currently doing... I'll fix 
> >> that.) 
> > 
> > At this point your code is starting to look a whole lot like the code 
> > in contextlib.ExitStack.__exit__ :) 
>
> One of the versions I tried but didn't include in my email used 
> ExitStack :-). It turns out not to work here: the problem is that we 
> effectively need to enter *all* the contexts before unwinding, even if 
> trying to enter one of them fails. ExitStack is nested like (try (try 
> (try ... finally) finally) finally), and we need (try finally (try 
> finally (try finally ...))) But this is just a small side-point 
> anyway, since most code is not implementing complicated 
> meta-iterators; I'll address your real proposal below. 
>
> > Accordingly, I'm going to suggest that while I agree the problem you 
> > describe is one that genuinely emerges in large production 
> > applications and other complex systems, this particular solution is 
> > simply far too intrusive to be accepted as a language change for 
> > Python - you're talking a fundamental change to the meaning of 
> > iteration for the sake of the relatively small portion of the 
> > community that either work on such complex services, or insist on 
> > writing their code as if it might become part of such a service, even 
> > when it currently isn't. Given that simple applications vastly 
> > outnumber complex ones, and always will, I think making such a change 
> > would be a bad trade-off that didn't come close to justifying the 
> > costs imposed on the rest of the ecosystem to adjust to it. 
> > 
> > A potentially more fruitful direction of research to pursue for 3.7 
> > would be the notion of "frame local resources", where each Python 
> > level execution frame implicitly provided a lazily instantiated 
> > ExitStack instance (or an equivalent) for resource management. 
> > Assuming that it offered an "enter_frame_context" function that mapped 
> > to "contextlib.ExitStack.enter_context", such a system would let us do 
> > things like: 
>
> So basically a 'with expression', that gives up the block syntax -- 
> taking its scope from the current function instead -- in return for 
> being usable in expression context? That's a really interesting idea, and I
> see the intuition that it might be less disruptive if our implicit 
> iterclose calls are scoped to the function rather than the 'for' loop. 
>
> But having thought about it and 

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-26 Thread Nick Coghlan
On 26 October 2016 at 08:48, Nathaniel Smith  wrote:
> If it takes a strong reference, then suddenly we're pinning all
> iterators in memory until the end of the enclosing function, which
> will often look like a memory leak. I think this would break a *lot*
> more existing code than the for-scoped-iterclose proposal does, and in
> more obscure ways that are harder to detect and warn about ahead of
> time.

It would take a strong reference, which is another reason why
close_resources() would be an essential part of the explicit API
(since it would drop the references in addition to calling the
__exit__() and close() methods of the declared resources), and also
yet another reason why you've convinced me that the only implicit API
that would ever make sense is one that was scoped specifically to the
iteration process.

However, I still think the explicit-API-only suggestion is a much
better path to pursue than any implicit proposal - it will give folks
that see it for the first time something to Google, and it's a general
purpose technique rather than being restricted specifically to the
cases where the resource to be managed and the iterator being iterated
over are one and the same object.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-26 Thread Nick Coghlan
On 26 October 2016 at 08:25, Nathaniel Smith  wrote:
> On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan  wrote:
>> At this point your code is starting to look a whole lot like the code
>> in contextlib.ExitStack.__exit__ :)
>
> One of the versions I tried but didn't include in my email used
> ExitStack :-). It turns out not to work here: the problem is that we
> effectively need to enter *all* the contexts before unwinding, even if
> trying to enter one of them fails. ExitStack is nested like (try (try
> (try ... finally) finally) finally), and we need (try finally (try
> finally (try finally ...)))

Regardless of any other outcome from this thread, it may be useful to
have a "contextlib.ResourceSet" as an abstraction for collective
management of resources, regardless of whatever else happens. As you
say, the main difference is that the invocation of the cleanup
functions wouldn't be nested at all and could be called in an
arbitrary order (if that's not sufficient for a particular use case,
then you'd need to define an ExitStack for the items where the order
of cleanup matters, and then register *that* with the ResourceSet).
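
For concreteness, a rough sketch of what such a contextlib.ResourceSet might
look like (purely hypothetical -- no such class exists today, and the method
names are made up):

    class ResourceSet:
        """Collective resource manager: cleanup is flat, not nested."""

        def __init__(self):
            self._cleanups = []               # no nesting, just a collection

        def add(self, resource):
            if hasattr(resource, "__exit__"):     # context manager: enter it now
                value = resource.__enter__()
                self._cleanups.append(resource.__exit__)
                return value
            if hasattr(resource, "close"):        # plain closeable object
                self._cleanups.append(lambda *args: resource.close())
            return resource

        def close(self):
            # run every cleanup (order is arbitrary), chaining any exceptions
            last_exc = None
            for cleanup in self._cleanups:
                try:
                    cleanup(None, None, None)
                except BaseException as new_exc:
                    if new_exc.__context__ is None:
                        new_exc.__context__ = last_exc
                    last_exc = new_exc
            self._cleanups.clear()
            if last_exc is not None:
                raise last_exc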

>> A potentially more fruitful direction of research to pursue for 3.7
>> would be the notion of "frame local resources", where each Python
>> level execution frame implicitly provided a lazily instantiated
>> ExitStack instance (or an equivalent) for resource management.
>> Assuming that it offered an "enter_frame_context" function that mapped
>> to "contextlib.ExitStack.enter_context", such a system would let us do
>> things like:
>
> So basically a 'with expression', that gives up the block syntax --
> taking its scope from the current function instead -- in return for
> being usable in expression context? That's a really interesting idea, and I
> see the intuition that it might be less disruptive if our implicit
> iterclose calls are scoped to the function rather than the 'for' loop.
>
> But having thought about it and investigated some... I don't think
> function-scoping addresses my problem, and I don't see evidence that
> it's meaningfully less disruptive to existing code.
>
> First, "my problem":
>
> Obviously, Python's a language that should be usable for folks doing
> one-off scripts, and for paranoid folks trying to write robust complex
> systems, and for everyone in between -- these are all really important
> constituencies. And unfortunately, there is a trade-off here, where
> the changes we're discussing effect these constituencies differently.
> But it's not just a matter of shifting around a fixed amount of pain;
> the *quality* of the pain really changes under the different
> proposals.
>
> In the status quo:
> - for one-off scripts: you can just let the GC worry about generator
> and file handle cleanup, re-use iterators, whatever, it's cool
> - for robust systems: because it's the *caller's* responsibility to
> ensure that iterators are cleaned up, you... kinda can't really use
> generators without -- pick one -- (a) draconian style guides (like
> forbidding 'with' inside generators or forbidding bare 'for' loops
> entirely), (b) lots of auditing (every time you write a 'for' loop, go
> read the source to the generator you're iterating over -- no
> modularity for you and let's hope the answer doesn't change!), or (c)
> introducing really subtle bugs.

(Note: I've changed my preferred API name from "function_resource" +
"frame_resource" to the general purpose "scoped_resource" - while it's
somewhat jargony, which I consider unfortunate, the goal is to make
the runtime scope of the resource match the lexical scope of the
reference as closely as is feasible, and if folks are going to
understand how Python manages references and resources, they're going
to need to learn the basics of Python's scope management at some
point)

Given your points below, the defensive coding recommendation here would be to

- always wrap your iterators in scoped_resource() to tell Python to
clean them up when the function is done
- explicitly call close_resources() after the affected for loops to
clean the resources up early

You'd still be vulnerable to resource leaks in libraries you didn't
write, but would have decent control over your own code without having
to make overly draconian changes to your style guide - you'd only need
one new rule, which is "Whenever you're iterating over something, pass
it through scoped_resource first".
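
As a concrete illustration of that rule, the defensive pattern would look
something like this (scoped_resource() and close_resources() are the
hypothetical APIs being discussed here, and the module name is made up):

    from local_resources import scoped_resource, close_resources  # hypothetical

    def count_lines(filenames):
        total = 0
        for name in filenames:
            # rule: whenever you're iterating over something, wrap it first
            for line in scoped_resource(open(name)):
                total += 1
            # optionally release everything registered so far, rather than
            # waiting for the function to return
            close_resources()
        return total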

To simplify this from a forwards compatibility perspective (i.e. so it
can implicitly adjust when an existing type gains a cleanup method),
we'd make scoped_resource() quite permissive, accepting arbitrary
objects with the following behaviours:

- if it's a context manager, enter it, and register the exit callback
- if it's not a context manager, but has a close() method, register
the close method
- otherwise, pass it straight through without taking any other action

This would allow folks to always declare something as a scoped
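
A rough sketch of that permissive dispatch (again hypothetical; the per-frame
ExitStack the proposal assumes is passed in explicitly here, just to keep the
sketch self-contained and runnable):

    from contextlib import ExitStack

    def scoped_resource(obj, stack):
        if hasattr(type(obj), "__enter__") and hasattr(type(obj), "__exit__"):
            # context manager: enter it, and register the exit callback
            return stack.enter_context(obj)
        if hasattr(obj, "close"):
            # not a context manager, but has close(): register the close method
            stack.callback(obj.close)
            return obj
        # otherwise, pass it straight through without taking any other action
        return obj

    # usage sketch (assumes example.txt exists)
    with ExitStack() as stack:
        f = scoped_resource(open("example.txt"), stack)   # registered for cleanup
        nums = scoped_resource([1, 2, 3], stack)          # passed straight through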

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-25 Thread Nathaniel Smith
...Doh. I spent all that time evaluating the function-scoped-cleanup
proposal from the high-level design perspective, and then immediately
after hitting send, I suddenly realized that I'd missed a much more
straightforward technical problem.

One thing that 'with' blocks / for-scoped-iterclose do is that they
put an upper bound on the lifetime of generator objects. That's
important if you're using a non-refcounting-GC, or if there might be
reference cycles. But it's not all they do: they also arrange to make
sure that any cleanup code is executed in the context of the code
that's using the generator. This is *also* really important: if you
have an exception in your cleanup code, and the GC runs your cleanup
code, then that exception will just disappear into nothingness (well,
it'll get printed to the console, but that's hardly better). So you
don't want to let the GC run your cleanup code. If you have an async
generator, you want to run the cleanup code under supervision of the
calling function's coroutine runner, and ideally block the running
coroutine while you do it; doing this from the GC is
difficult-to-impossible (depending on how picky you are -- PEP 525
does part of it, but not all). Again, letting the GC get involved is
bad.

So for the function-scoped-iterclose proposal: does this implicit
ExitStack-like object take a strong reference to iterators, or just a
weak one?

If it takes a strong reference, then suddenly we're pinning all
iterators in memory until the end of the enclosing function, which
will often look like a memory leak. I think this would break a *lot*
more existing code than the for-scoped-iterclose proposal does, and in
more obscure ways that are harder to detect and warn about ahead of
time. So that's out.

If it takes a weak reference, ... then there's a good chance that
iterators will get garbage collected before the ExitStack has a chance
to clean them up properly. So we still have no guarantee that the
cleanup will happen in the right context, that exceptions will not be
lost, and so forth. In fact, it becomes literally non-deterministic:
you might see an exception propagate properly on one run, and not on
the next, depending on exactly when the garbage collector happened to
run.
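
A quick way to see the weak-reference half of this on CPython (just an
illustration of the timing problem, not part of any proposal):

    import weakref

    def gen():
        try:
            yield 1
        finally:
            print("cleanup")

    g = gen()
    r = weakref.ref(g)      # a weak reference doesn't keep the generator alive
    next(g)                 # suspend inside the try block
    del g                   # refcount hits zero: "cleanup" runs here, in GC
                            # context, not under any ExitStack's control
    print(r() is None)      # True -- there's nothing left for the stack to close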

IMHO that's *way* too spooky to be allowed, but I can't see any way to
fix it within the function-scoping framework :-(

-n

On Tue, Oct 25, 2016 at 3:25 PM, Nathaniel Smith  wrote:
> On Sat, Oct 22, 2016 at 9:02 AM, Nick Coghlan  wrote:
>> On 20 October 2016 at 07:02, Nathaniel Smith  wrote:
>>> The first change is to replace the outer for loop with a while/pop
>>> loop, so that if an exception occurs we'll know which iterables remain
>>> to be processed:
>>>
>>> def chain(*iterables):
>>>     try:
>>>         while iterables:
>>>             for element in iterables.pop(0):
>>>                 yield element
>>>     ...
>>>
>>> Now, what do we do if an exception does occur? We need to call
>>> iterclose on all of the remaining iterables, but the tricky bit is
>>> that this might itself raise new exceptions. If this happens, we don't
>>> want to abort early; instead, we want to continue until we've closed
>>> all the iterables, and then raise a chained exception. Basically what
>>> we want is:
>>>
>>> def chain(*iterables):
>>>     try:
>>>         while iterables:
>>>             for element in iterables.pop(0):
>>>                 yield element
>>>     finally:
>>>         try:
>>>             operators.iterclose(iter(iterables[0]))
>>>         finally:
>>>             try:
>>>                 operators.iterclose(iter(iterables[1]))
>>>             finally:
>>>                 try:
>>>                     operators.iterclose(iter(iterables[2]))
>>>                 finally:
>>>                     ...
>>>
>>> but of course that's not valid syntax. Fortunately, it's not too hard
>>> to rewrite that into real Python -- but it's a little dense:
>>>
>>> def chain(*iterables):
>>>     try:
>>>         while iterables:
>>>             for element in iterables.pop(0):
>>>                 yield element
>>>     # This is equivalent to the nested-finally chain above:
>>>     except BaseException as last_exc:
>>>         for iterable in iterables:
>>>             try:
>>>                 operators.iterclose(iter(iterable))
>>>             except BaseException as new_exc:
>>>                 if new_exc.__context__ is None:
>>>                     new_exc.__context__ = last_exc
>>>                 last_exc = new_exc
>>>         raise last_exc
>>>
>>> It's probably worth wrapping that bottom part into an iterclose_all()
>>> helper, since the pattern probably occurs in other cases as well.
>>> (Actually, now that I think about it, the map() example in the text
>>> should be doing this instead of what it's currently doing... I'll fix
>>> that.)
>>
>> At this point your code is starting to look a whole lot like the code
>> in contextlib.ExitStack.__exit__ :)
>
> 
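
For reference, one possible shape for the iterclose_all() helper mentioned
above (a sketch only -- __iterclose__ is the proposed hook, not something
iterators have today):

    def iterclose_all(iterables, last_exc=None):
        """Close every iterable, chaining exceptions, then re-raise if needed."""
        for iterable in iterables:
            try:
                close = getattr(iter(iterable), "__iterclose__", None)
                if close is not None:
                    close()
            except BaseException as new_exc:
                if new_exc.__context__ is None:
                    new_exc.__context__ = last_exc
                last_exc = new_exc
        if last_exc is not None:
            raise last_exc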

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-24 Thread Chris Barker
On Sat, Oct 22, 2016 at 9:17 AM, Nick Coghlan  wrote:


> This is actually a case where style guidelines would ideally differ
> between scripting use cases ... and
> library(/framework/application) development use cases
>

Hmm -- interesting idea -- and I recall Guido bringing something like this
up on one of these lists not too long ago -- "scripting" use cases really
are different from "systems programming"

> However, that script/library distinction isn't well-defined in
> computing instruction in general,


No, it's not -- except in the case of "scripting languages" vs. "systems
languages" -- you can go back to the classic Ousterhout paper:

https://www.tcl.tk/doc/scripting.html

But Python really is suitable for both use cases, so it's tricky to know how to
teach.

And my classes, at least, have folks with a broad range of use-cases in
mind, so I can't choose one way or another. And, indeed, there is no small
amount of code (and coders) that starts out as a quick script, but ends up
embedded in a larger system down the road.

And (another and?) one of the great things ABOUT Python is that it IS
suitable for such a broad range of use-cases.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-22 Thread Nick Coghlan
On 23 October 2016 at 02:17, Nick Coghlan  wrote:
> On 22 October 2016 at 06:59, Chris Barker  wrote:
>> And then context managers were introduced. And it seems to be there is a
>> consensus in the Python community that we all should be using them when
>> working on files, and I myself have finally started routinely using them,
>> and teaching newbies to use them -- which is kind of a pain, 'cause I want
>> to have them do basic file reading stuff before I explain what a "context
>> manager" is.
>
> This is actually a case where style guidelines would ideally differ
> between scripting use cases (let the GC handle it whenever,
> since your process will be terminating soon anyway) and
> library(/framework/application) development use cases (promptly clean
> up after yourself, since you don't necessarily know your context of
> use).
>
> However, that script/library distinction isn't well-defined in
> computing instruction in general, and most published style guides are
> written by library/framework/application developers, so students and
> folks doing ad hoc scripting tend to be the recipients of a lot of
> well-meaning advice that isn't actually appropriate for them :(

Pondering this overnight, I realised there's a case where folks using
Python primarily as a scripting language can still run into many of
the resource management problems that arise in larger applications:
IPython notebooks, where the persistent kernel can keep resources
alive for a surprisingly long time in the absence of a reference
counting GC. Yes, they have the option of just restarting the kernel
(which many applications don't have), but it's still a nicer user
experience if we can help them avoid having those problems arise in
the first place.

This is likely mitigated in practice *today* by IPython users mostly
being on CPython for access to the Scientific Python stack, but we can
easily foresee a future where the PyPy community have worked out
enough of their NumPy compatibility and runtime redistribution
challenges that it becomes significantly more common to be using
notebooks against Python kernels that don't use automatic reference
counting.

I'm significantly more amenable to that as a rationale for pursuing
non-syntactic approaches to local resource management than I am the
notion of pursuing it for the sake of high performance application
development code.

Chris, would you be open to trying a thought experiment with some of
your students looking at ways to introduce function-scoped
deterministic resource management *before* introducing with
statements? Specifically, I'm thinking of a progression along the
following lines:

# Cleaned up whenever the interpreter gets around to cleaning up
# the function locals
def readlines_with_default_resource_management(fname):
    return open(fname).readlines()

# Cleaned up on function exit, even if the locals are still
# referenced from an exception traceback
# or the interpreter implementation doesn't use a reference counting GC
from local_resources import function_resource

def readlines_with_declarative_cleanup(fname):
    return function_resource(open(fname)).readlines()

# Cleaned up at the end of the with statement
def readlines_with_imperative_cleanup(fname):
    with open(fname) as f:
        return f.readlines()

The idea here is to change the requirement for new developers from
"telling the interpreter what to *do*" (which is the situation we have
for context managers) to "telling the interpreter what we *want*"
(which is for it to link a managed resource with the lifecycle of the
currently running function call, regardless of interpreter
implementation details)

Under that model, Inada-san's recent buffer snapshotting proposal
would effectively be an optimised version of the one liner:

def snapshot(data, limit, offset=0):
    return bytes(function_resource(memoryview(data))[offset:limit])

The big refactoring benefit that this feature would offer over with
statements is that it doesn't require a structural change to the code
- it's just wrapping an existing expression in a new function call
that says "clean this up promptly when the function terminates, even
if it's still part of a reference cycle, or we're not using a
reference counting GC".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-22 Thread Nick Coghlan
On 22 October 2016 at 06:59, Chris Barker  wrote:
> And then context managers were introduced. And it seems to be there is a
> consensus in the Python community that we all should be using them when
> working on files, and I myself have finally started routinely using them,
> and teaching newbies to use them -- which is kind of a pain, 'cause I want
> to have them do basic file reading stuff before I explain what a "context
> manager" is.

This is actually a case where style guidelines would ideally differ
between scripting use cases (let the GC handle it whenever,
since your process will be terminating soon anyway) and
library(/framework/application) development use cases (promptly clean
up after yourself, since you don't necessarily know your context of
use).

However, that script/library distinction isn't well-defined in
computing instruction in general, and most published style guides are
written by library/framework/application developers, so students and
folks doing ad hoc scripting tend to be the recipients of a lot of
well-meaning advice that isn't actually appropriate for them :(

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Nathaniel Smith
On Fri, Oct 21, 2016 at 3:29 AM, Steven D'Aprano  wrote:
> As for the amount of good, this proposal originally came from PyPy.

Just to be clear, I'm not a PyPy dev, and the PyPy devs' contribution
here was mostly to look over a draft I circulated and to agree that it
seemed like something that'd be useful to them.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Nathaniel Smith
On Fri, Oct 21, 2016 at 3:48 PM, Amit Green  wrote:
> NOTE: This is my first post to this mailing list, I'm not really sure
>   how to post a message, so I'm attempting a reply-all.
>
> I like Nathaniel's idea for __iterclose__.
>
> I suggest the following changes to deal with a few of the complex issues
> he discussed.
>
> 1.  Missing __iterclose__, or a value of None, works as before,
> no changes.
>
> 2.  An iterator can be used in one of three ways:
>
> A. 'for' loop, which will call __iterclose__ when it exits
>
> B.  User controlled, in which case the user is responsible to use the
> iterator inside a with statement.
>
> C.  Old style.  The user is responsible for calling __iterclose__
>
> 3.  An iterator keeps track of __iter__ calls, this allows it to know
> when to cleanup.
>
>
> The two key additions, above, are:
>
> #2B. User can use iterator with __enter__ & __exit__ cleanly.
>
> #3.  By tracking __iter__ calls, it makes complex user cases easier
>  to handle.

These are interesting ideas! A few general comments:

- I don't think we want the "don't bother to call __iterclose__ on
exhaustion" functionality --it's actually useful to be able to
distinguish between

# closes file_handle
for line in file_handle:
...

and

# leaves file_handle open
for line in preserve(file_handle):
...

To be able to distinguish these cases, it's important that the 'for'
loop always call __iterclose__ (which preserve() might then cancel
out).
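
For readers following along, a rough sketch of what preserve() might look
like under the proposal (hypothetical -- it just wraps an iterator and turns
the __iterclose__ call into a no-op):

    class preserve:
        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._it)

        def __iterclose__(self):
            pass  # deliberately do nothing: the underlying iterator stays open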

- I think it'd be practically difficult and maybe too much magic to
add __enter__/__exit__/nesting-depth counts to every iterator
implementation. But, the idea of using a context manager for repeated
partial iteration is a great idea :-). How's this for a simplified
version that still covers the main use cases?

@contextmanager
def reuse_then_close(it):  # TODO: come up with a better name
    it = iter(it)
    try:
        yield preserve(it)
    finally:
        iterclose(it)

with itertools.reuse_then_close(some_generator(...)) as it:
    for obj in it:
        ...
    # still open here, because our reference to the iterator is
    # wrapped in preserve(...)
    for obj in it:
        ...
# but then closed here, by the 'with' block

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Ethan Furman

On 10/21/2016 03:48 PM, Amit Green wrote:


NOTE: This is my first post to this mailing list, I'm not really sure
   how to post a message, so I'm attempting a reply-all.


Seems to have worked! :)


I like Nathaniel's idea for __iterclose__.

I suggest the following changes to deal with a few of the complex issues
he discussed.


Your examples are interesting, but they don't seem to address the issue of 
closing down for loops that are using generators when those loops exit early:

-
def some_work():
    with some_resource() as resource:
        for widget in resource:
            yield widget


for pane in some_work():
    break

# what happens here?
-

How does your solution deal with that situation?  Or are you saying that this 
would be closed with your modifications, and if I didn't want the generator to 
be closed I would have to do:

-
with some_work() as temp_gen:
    for pane in temp_gen:
        break

for another_pane in temp_gen:
    # temp_gen is still alive here
    ...
-

In other words, instead of using the preserve() function, we would use a with
statement?

--
~Ethan~


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Amit Green
NOTE: This is my first post to this mailing list, I'm not really sure
  how to post a message, so I'm attempting a reply-all.

I like Nathaniel's idea for __iterclose__.

I suggest the following changes to deal with a few of the complex issues
he discussed.

1.  Missing __iterclose__, or a value of None, works as before,
no changes.

2.  An iterator can be used in one of three ways:

A. 'for' loop, which will call __iterclose__ when it exits

B.  User controlled, in which case the user is responsible to use the
iterator inside a with statement.

C.  Old style.  The user is responsible for calling __iterclose__

3.  An iterator keeps track of __iter__ calls, this allows it to know
when to cleanup.


The two key additions, above, are:

#2B. User can use iterator with __enter__ & __exit__ cleanly.

#3.  By tracking __iter__ calls, it makes complex user cases easier
 to handle.

Specification
=============

An iterator may implement the following method: __iterclose__.  A missing
method, or a value of None is allowed.

When the user wants to control the iterator, the user is expected to
use the iterator with a with clause.

The core proposal is the change in behavior of ``for`` loops. Given this
Python code:

  for VAR in ITERABLE:
  LOOP-BODY
  else:
  ELSE-BODY

we desugar to the equivalent of:

  _iter = iter(ITERABLE)
  _iterclose = getattr(_iter, '__iterclose__', None)

  if _iterclose is None:
      traditional-for VAR in _iter:
          LOOP-BODY
      else:
          ELSE-BODY
  else:
      _stop_exception_seen = False
      try:
          traditional-for VAR in _iter:
              LOOP-BODY
          else:
              _stop_exception_seen = True
              ELSE-BODY
      finally:
          if not _stop_exception_seen:
              _iterclose(_iter)

The test for 'None' allows us to skip the setup of a try/finally clause.

Also, we don't bother to call __iterclose__ if the iterator threw
StopIteration at us.

Modifications to basic iterator types
=====================================

An iterator will implement something like the following:

  _cleanup   - Private function, does the following:

_enter_count = _iter_count = -1

Do any necessary cleanup, release resources, etc.

   NOTE: Is also called internally by the iterator,
   before throwing StopIteration

  _iter_count- Private value, starts at 0.

  _enter_count   - Private value, starts at 0.

  __iter__   - if _iter_count >= 0:
   _iter_count += 1

   return self

  __iterclose__  - if _iter_count is 0:
   if _enter_count is 0:
   _cleanup()
   elif _iter_count > 0:
   _iter_count -= 1

  __enter__  - if _enter_count >= 0:
   _enter_count += 1

   Return itself.

  __exit__   - if _enter_count is > 0
   _enter_count -= 1

   if _enter_count is _iter_count is 0:
_cleanup()

The suggestions on _iter_count & _enter_count are just examples; internal
details can differ (and include better error handling).


Examples:
=========

NOTE: Examples are given using xrange() or [1, 2, 3, 4, 5, 6, 7] for
  simplicity.  For real use, the iterator would have resources such
  as open files it needs to close on cleanup.


1.  Simple example:

for v in xrange(7):
print v

Creates an iterator with an _iter_count of 0.  The iterator exits
normally (by throwing StopIteration), so we don't bother to call
__iterclose__


2.  Break example:

for v in [1, 2, 3, 4, 5, 6, 7]:
print v

if v == 3:
break

Creates an iterator with an _iter_count of 0.

The iterator exits after generating 3 values; we then call
__iterclose__ & the iterator does any necessary cleanup.

3.  Convert example #2 to print the next value:

with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
for v in seven:
print v

if v == 3:
break

print 'Next value is: ', seven.next()

This will print:

1
2
3
Next value is: 4

How this works:

1.  We create an iterator named seven (by calling list.__iter__).

2.  We call seven.__enter__

3.  The for loop calls: seven.next() 3 times, and then calls:
seven.__iterclose__

Since the _enter_count is 1, the iterator does not do
cleanup yet.

4.  We call seven.next()

5.  We call seven.__exit__.  The iterator does its cleanup now.

4.  More complicated example:

with iter([1, 2, 3, 4, 5, 6, 7]) as seven:
for v in seven:
print v

if v == 1:
for v in seven:
   

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Paul Moore
On 21 October 2016 at 21:59, Chris Barker  wrote:
>> So (it seems to
>> me) that you're talking about changing the behaviour of for-loops to
>> suit only a small proportion of cases: maybe 10% of 10%.
>
>
> I don't see what the big overhead is here. for loops would get a new
> feature, but it would only be used by the objects that chose to implement
> it. So no huge change.

But the point is that the feature *would* affect people who don't need
it. That's what I'm struggling to understand. I keep hearing "most
code won't be affected", but then discussions about how we ensure that
people are warned of where they need to add preserve() to their
existing code to get the behaviour they already have. (And, of course,
they need to add an "if we're on older pythons, define a no-op version
of preserve() backward compatibility wrapper if they want their code
to work cross version). I genuinely expect preserve() to pretty much
instantly appear on people's lists of "python warts", and that bothers
me.

But I'm reaching the point where I'm just saying the same things over
and over, so I'll bow out of this discussion now. I remain confused,
but I'm going to have to trust that the people who have got a handle
on the issue have understood the point I'm making, and have it
covered.

Paul


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Chris Barker
On Fri, Oct 21, 2016 at 12:12 AM, Steven D'Aprano 
wrote:

> Portability across Pythons... if all Pythons performed exactly the same,
> why would we need multiple implementations? The way I see it,
> non-deterministic cleanup is the cost you pay for a non-reference
> counting implementation, for those who care about the garbage collection
> implementation. (And yes, ref counting is garbage collection.)
>

Hmm -- and yet "with" was added, and I can't imagine that its largest
use-case is with ( ;-) ) open:

with open(filename, mode) as my_file:



And yet for years I happily counted on reference counting to close my
files, and was particularly happy with:

data = open(filename, mode).read()

I really liked that that file got opened, read, and closed and cleaned up
right off the bat.

And then context managers were introduced. And it seems to be there is a
consensus in the Python community that we all should be using them when
working on files, and I myself have finally started routinely using them,
and teaching newbies to use them -- which is kind of a pain, 'cause I want
to have them do basic file reading stuff before I explain what a "context
manager" is.

Anyway, my point is that the broader Python community really has been
pretty consistent about making it easy to write code that will work the
same way (maybe not with the same performance) across Python
implementations. And specifically with deterministic resource management.

On my system, I can open 1000+ files as a regular user. I can't even
> comprehend opening a tenth of that as an ordinary application, although
> I can imagine that if I were writing a server application things would
> be different.


well, what you can imagine isn't really the point -- I've bumped into that
darn open file limit in my work, which was not a server application (though
it was some pretty serious number crunching...). And I'm sure I'm not
alone. OK, to be fair that was a poorly designed library, not an issue with
determinism of resource management (though designing the lib well WOULD
depend on that)

But then I don't expect to write server applications in
> quite the same way as I do quick scripts or regular user applications.
>

Though data analysts DO write "quick scripts" that might need to do things
like access 100s of files...


> So it seems to me that a leaked file handle or two normally shouldn't
> be a problem in practice. They'll be freed when the script or
> application closes, and in the meantime, you have hundreds more
> available. 90% of the time, using `with file` does exactly what we want,
> and the times it doesn't (because we're writing a generator that isn't
> closed promptly) 90% of those times it doesn't matter.


that was the case with "with file" from the beginning -- particularly on
cPython. And yet we all thought it was a great idea.


> So (it seems to
> me) that you're talking about changing the behaviour of for-loops to
> suit only a small proportion of cases: maybe 10% of 10%.
>

I don't see what the big overhead is here. for loops would get a new
feature, but it would only be used by the objects that chose to implement
it. So no huge change.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Yury Selivanov



On 2016-10-21 11:19 AM, Gustavo Carneiro wrote:

Personally, I hadn't realised we had this problem in asyncio until now.

Does this problem happen in asyncio at all?  Or does asyncio somehow work
around it by making sure to always explicitly destroy the frames of all
coroutine objects, as long as someone waits on each task?


No, I think asyncio code is free of the problem this proposal
is trying to address.

We might have some "problem" in 3.6 when people start using
async generators more often.  But I think it's important for us
to teach people to manage the associated resources from the
outside of the generator (i.e. don't put 'async with' or 'with'
inside the generator's body; instead, wrap the code that uses
the generator with 'async with' or 'with').
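
For example, the pattern being recommended looks roughly like this (a minimal
sketch with made-up names -- Resource and read_rows() are placeholders, not
real APIs):

    import asyncio

    class Resource:
        # stand-in for something with real async cleanup (a DB connection, say)
        async def __aenter__(self):
            print("open")
            return self

        async def __aexit__(self, *exc):
            print("close")

        def rows(self):
            return [1, 2, 3]

    async def read_rows(res):
        # the async generator only *uses* the resource -- it never owns it,
        # so there's no cleanup buried inside it for the GC to worry about
        for row in res.rows():
            yield row

    async def main():
        async with Resource() as res:       # the caller owns the resource
            async for row in read_rows(res):
                print("got", row)
        # "close" has already been printed here, under the caller's control

    asyncio.get_event_loop().run_until_complete(main())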

Yury



On 21 October 2016 at 16:08, Yury Selivanov  wrote:



On 2016-10-21 6:29 AM, Steven D'Aprano wrote:


On Wed, Oct 19, 2016 at 05:52:34PM -0400, Yury Selivanov wrote:


[..]


With your proposal, to achieve the same (and make the code compatible

with new for-loop semantics), users will have to implement both
__iterclose__ and __del__.


As I ask above, couldn't we just inherit a default __(a)iterclose__ from
object that looks like this?

  def __iterclose__(self):
      finalizer = getattr(type(self), '__del__', None)
      if finalizer:
          finalizer(self)


I know it looks a bit funny for non-iterables to have an iterclose
method, but they'll never actually be called.


No, we can't call __del__ from __iterclose__.  Otherwise we'd
break even more code than this proposal already breaks:


   for i in iter:
       ...
   iter.something()  # <- this would be called after iter.__del__()

[..]


As for the amount of good, this proposal originally came from PyPy. I
expect that CPython users won't appreciate it as much as PyPy users, and
Jython/IronPython users when they eventually support Python 3.x.


AFAIK the proposal came "for" PyPy, not "from".  And the
issues Nathaniel tries to solve do also exist in CPython.  It's
only a question if changing 'for' statement and iteration protocol
is worth the trouble.

Yury



Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Ronan Lamy

On 21/10/16 at 14:35, Paul Moore wrote:


[1] As I understand it. CPython's refcounting GC makes this a
non-issue, correct?


Wrong. Any guarantee that you think the CPython GC provides goes out of 
the window as soon as you have a reference cycle. Refcounting does not 
actually make GC deterministic, it merely hides the problem away from view.


For instance, on CPython 3.5, running this code:

#%

class some_resource:
def __enter__(self):
print("Open resource")
return 42

def __exit__(self, *args):
print("Close resource")

def some_iterator():
with some_resource() as s:
yield s

def main():
it = some_iterator()
for i in it:
if i == 42:
print("The answer is", i)
break
print("End loop")

# later ...
try:
1/0
except ZeroDivisionError as e:
exc = e

main()
print("Exit")

#%%

produces:

Open resource
The answer is 42
End loop
Exit
Close resource

What happens is that 'exc' holds a cyclic reference back to the main() 
frame, which prevents it from being destroyed when the function exits, 
and that frame, in turn, holds a reference to the iterator, via the 
local variable 'it'. And so, the iterator remains alive, and the 
resource unclosed, until the next garbage collection.
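
For what it's worth, the cleanup in this example can be made deterministic
today, without relying on the GC at all, by giving the caller explicit
ownership of the iterator -- e.g. with contextlib.closing (generators already
have a close() method):

    from contextlib import closing

    def main():
        with closing(some_iterator()) as it:   # some_iterator() from above
            for i in it:
                if i == 42:
                    print("The answer is", i)
                    break
        # closing() has called it.close() here, so "Close resource" prints
        # before "End loop", regardless of any reference cycles
        print("End loop")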



Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Paul Moore
On 21 October 2016 at 12:23, Steven D'Aprano  wrote:
> On Fri, Oct 21, 2016 at 11:03:51AM +0100, Paul Moore wrote:
>
>> At the moment, the take home message for such users feels like it's
>> "you might need to scatter preserve() around your code, to avoid the
>> behaviour change described above, which you glazed over because it
>> talked about all that coroutiney stuff you don't understand" :-)
>
> I now believe that's not necessarily the case. I think that the message
> should be:
>
> - If your iterator class has a __del__ or close method, then you need
>   to read up on __(a)iterclose__.
>
> - If you iterate over open files twice, then all you need to remember is
>   that the file will be closed when you exit the first loop. To avoid
>   that auto-closing behaviour, use itertools.preserve().
>
> - Iterating over lists, strings, tuples, dicts, etc. won't change, since
>   they don't have __del__ or close() methods.
>
>
> I think that covers all the cases the average Python code will care
> about.

OK, that's certainly a lot less scary.

Some thoughts, remain, though:

1. You mention files. Presumably (otherwise what would be the point of
the change?) there will be other iterables that change similarly.
There's no easy way to know in advance.
2. Cleanup protocols for iterators are pretty messy now - __del__,
close, __iterclose__, __aiterclose__. What's the chance 3rd party
implementers get something wrong?
3. What about generators? If you write your own generator, you don't
control the cleanup code. The example:

def mygen(name):
with open(name) as f:
for line in f:
yield line

is a good example - don't users of this generator need to use
preserve() in order to be able to do partial iteration? And yet how
would the writer of the generator know to document this? And if it
isn't documented, how does the user of the generator know preserve is
needed?

My feeling is that this proposal is a relatively significant amount of
language churn, to solve a relatively niche problem, and furthermore
one that is actually only a problem for non-CPython implementations[1].
My instincts are that we need to back off on the level of such change,
to give users a chance to catch their breath. We're not at the level
of where we need something like the language change moratorium (PEP
3003) but I don't think it would do any harm to give users a chance to
catch their breath after the wave of recent big changes (async,
typing, path protocol, f-strings, funky unpacking, Windows build and
installer changes, ...).

To put this change in perspective - we've lived without it for many
years now, can we not wait a little while longer?

From another message:
> Bottom line is: at first I thought this was a scary change that would
> break too much code. But now I think it won't break much, and we can
> ease into it really slowly over two or three releases. So I think that
> the cost is probably low. I'm still not sure on how great the benefit
> will be, but I'm leaning towards a +1 on this.

And yet, it still seems to me that it's going to force me to change
(maybe not much, but some of) my existing code, for absolutely zero
direct benefit, as I don't personally use or support PyPy or any other
non-CPython implementations. Don't forget that PyPy still doesn't even
implement Python 3.5 - so no-one benefits from this change until PyPy
supports Python 3.8, or whatever version this becomes the default in.
It's very easy to misuse an argument like this to block *any* sort of
change, and that's not my intention here - but I am trying to
understand what the real-world issue is here, and how (and when!) this
proposal would allow people to write code to fix that problem. At the
moment, it feels like:

   * The problem is file handle leaks in code running under PyPy
   * The ability to fix this will come in around 4 years (random guess
as to when PyPy implements Python 3.8, plus an assumption that the
code needing to be fixed can immediately abandon support for all
earlier versions of PyPy).

Any other cases seem to me to be theoretical at the moment. Am I being
unfair in this assessment? (It feels like I might be, but I can't be
sure how).

Paul

[1] As I understand it. CPython's refcounting GC makes this a
non-issue, correct?


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Steven D'Aprano
On Fri, Oct 21, 2016 at 11:07:46AM +0100, Paul Moore wrote:
> On 21 October 2016 at 10:53, Steven D'Aprano  wrote:
> > On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
> >
> >> I should also say, regarding your specific example, I guess it's an
> >> open question whether we would want list_iterator.__iterclose__ to
> >> actually do anything. It could flip the iterator to a state where it
> >> always raises StopIteration,
> >
> > That seems like the most obvious.

I've changed my mind -- I think maybe it should do nothing, and preserve 
the current behaviour of lists.

I'm now more concerned with keeping current behaviour as much as 
possible than creating some sort of consistent error condition for all 
iterators. Consistency is over-rated, and we already have inconsistency 
here: file iterators behave differently from list iterators, because 
they can be closed:


py> f = open('/proc/mdstat', 'r')
py> a = list(f)
py> b = list(f)
py> len(a), len(b)
(20, 0)
py> f.close()
py> c = list(f)
Traceback (most recent call last):
  File "", line 1, in 
ValueError: I/O operation on closed file.

We don't need to add a close() to list iterators just so they are 
consistent with files. Just let __iterclose__ be a no-op.

 
> So - does this mean "unless you understand what preserve() does,
> you're OK to not use it and your code will continue to work as
> before"? If so, then I'd be happy with this.

Almost.

Code like this will behave exactly the same as it currently does:

for x in it:
    process(x)

y = list(it)

If it is a file object, the second call to list() will raise ValueError; 
if it is a list_iterator, or generator, etc., y will be an empty list.
That part (I think) shouldn't change.


What *will* change is code that partially processes the iterator in two 
different places. A simple example:

py> it = iter([1, 2, 3, 4, 5, 6])
py> for x in it:
...     if x == 4: break
...
py> for x in it:
...     print(x)
...
5
6


This *may* change. With this proposal, the first loop will "close" the 
iterator when you exit from the loop. For a list, there's no finaliser, 
no __del__ to call, so we can keep the current behaviour and nobody will 
notice any difference.

But if `it` is a file iterator instead of a list iterator, the file will 
be closed when you exit the first for-loop, and the second loop will 
raise ValueError. That will be different.

The fix here is simple: protect the first call from closing:

for x in itertools.preserve(it):  # preserve, protect, whatever
...


Or, if `it` is your own class, give it a __iterclose__ method that does 
nothing.
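
For instance, something along these lines (a sketch of code under the
proposal -- __iterclose__ is the proposed hook, so this does nothing special
on today's Python):

    class ReusableRange:
        def __init__(self, n):
            self._i = 0
            self._n = n

        def __iter__(self):
            return self

        def __next__(self):
            if self._i >= self._n:
                raise StopIteration
            self._i += 1
            return self._i

        def __iterclose__(self):
            pass  # opt out of for-loop cleanup: the iterator survives the loop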


This is a backwards-incompatible change, so I think we would need to do 
this:

(1) In Python 3.7, we introduce a __future__ directive:

from __future__ import iterclose

to enable the new behaviour. (Remember, future directives apply on a 
module-by-module basis.)

(2) Without the directive, we keep the old behaviour, except that 
warnings are raised if something will change.

(3) Then in 3.8 iterclose becomes the default, the warnings go away, and 
the new behaviour just happens.


If that's too fast for people, we could slow it down: 

(1) Add the future directive to Python 3.7;

(2) but no warnings by default (you have to opt-in to the 
warnings with an environment variable, or command-line switch).

(3) Then in 3.8 the warnings are on by default;

(4) And the iterclose behaviour doesn't become standard until 3.9.


That means if this change worries you, you can ignore it until you 
migrate to 3.8 (which won't be production-ready until about 2020 or so), 
and don't have to migrate your code until 3.9, which will be a year or 
two later. But early adopters can start targetting the new functionality 
from 3.7 if they like.

I don't think there's any need for a __future__ directive for 
aiterclose, since there's not enough backwards-incompatibility to care 
about. (I think, but don't mind if people disagree.) That can happen 
starting in 3.7, and when people complain that their synchronous 
generators don't have deterministic garbage collection like their 
asynchronous ones do, we can point them at the future directive.

Bottom line is: at first I thought this was a scary change that would 
break too much code. But now I think it won't break much, and we can 
ease into it really slowly over two or three releases. So I think that 
the cost is probably low. I'm still not sure on how great the benefit 
will be, but I'm leaning towards a +1 on this.



-- 
Steve


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Steven D'Aprano
On Fri, Oct 21, 2016 at 11:03:51AM +0100, Paul Moore wrote:

> At the moment, the take home message for such users feels like it's
> "you might need to scatter preserve() around your code, to avoid the
> behaviour change described above, which you glazed over because it
> talked about all that coroutiney stuff you don't understand" :-)

I now believe that's not necessarily the case. I think that the message 
should be:

- If your iterator class has a __del__ or close method, then you need
  to read up on __(a)iterclose__.

- If you iterate over open files twice, then all you need to remember is 
  that the file will be closed when you exit the first loop. To avoid 
  that auto-closing behaviour, use itertools.preserve().

- Iterating over lists, strings, tuples, dicts, etc. won't change, since 
  they don't have __del__ or close() methods.


I think that covers all the cases the average Python code will care 
about.



-- 
Steve


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Paul Moore
On 21 October 2016 at 10:53, Steven D'Aprano  wrote:
> On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
>
>> I should also say, regarding your specific example, I guess it's an
>> open question whether we would want list_iterator.__iterclose__ to
>> actually do anything. It could flip the iterator to a state where it
>> always raises StopIteration,
>
> That seems like the most obvious.

So - does this mean "unless you understand what preserve() does,
you're OK to not use it and your code will continue to work as
before"? If so, then I'd be happy with this.

But I genuinely don't know (without going rummaging through docs) what
that statement means in any practical sense.
Paul


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Steven D'Aprano
You know, I'm actually starting to lean towards this proposal and away 
from my earlier objections...

On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:

> I should also say, regarding your specific example, I guess it's an
> open question whether we would want list_iterator.__iterclose__ to
> actually do anything. It could flip the iterator to a state where it
> always raises StopIteration,

That seems like the most obvious.

[...]
> The __iterclose__ contract is that you're not supposed
> to call __next__ afterwards, so there's no real rule about what
> happens if you do.

If I recall correctly, in your proposal you use language like "behaviour 
is undefined". I don't like that language, because it sounds like 
undefined behaviour in C, which is something to be avoided like the 
plague. I hope I don't need to explain why, but for those who may not 
understand the dangers of "undefined behaviour" as per the C standard, 
you can start here:

https://randomascii.wordpress.com/2014/05/19/undefined-behavior-can-format-your-drive/

So let's make it clear that what we actually mean is not C-ish undefined 
behaviour, where the compiler is free to open a portal to the Dungeon 
Dimensions or use Guido's time machine to erase code that executes 
before the undefined code:

https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=633/

but rather ordinary, standard "implementation-dependent behaviour". If 
you call next() on a closed iterator, you'll get whatever the iterator 
happens to do when it is closed. That will be *recommended* to raise 
whatever error is appropriate to the iterator, but not enforced.

That makes it just like the part of the iterator protocol that says that 
once an iterator raises StopIteration, it should always raise 
StopIteration. Those that don't are officially called "broken", but they 
are allowed and you can write one if you want to.

Shorter version:

- calling next() on a closed iterator is expected to be an error of 
  some sort, often a RuntimeError, but the iterator is free to use a 
  different error if that makes sense (e.g. closed files);

- if your own iterator classes break that convention, they will be 
  called "broken", but nobody will stop you from writing such "broken" 
  iterators.



-- 
Steve


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Nathaniel Smith
On Wed, Oct 19, 2016 at 7:07 PM, Terry Reedy  wrote:
> On 10/19/2016 12:38 AM, Nathaniel Smith wrote:
>
>> I'd like to propose that Python's iterator protocol be enhanced to add
>> a first-class notion of completion / cleanup.
>
>
> With respect to the standard iterator protocol, a very solid -1 from me.
> (I leave commenting specifically on __aiterclose__ to Yury.)
>
> 1. I consider the introduction of iterables and the new iterator protocol in
> 2.2 and their gradual replacement of lists in many situations to be the
> greatest enhancement to Python since 1.3 (my first version).  They are, to
> me, one of Python's greatest features, and the minimal nature of the
> protocol is an essential part of what makes them great.

Minimalism for its own sake isn't really a core Python value, and in
any case the minimalism ship has kinda sailed -- we effectively
already have send/throw/close as optional parts of the protocol
(they're most strongly associated with generators, but you're free to
add them to your own iterators and e.g. yield from will happily work
with that). This proposal is basically "we formalize and start
automatically calling the 'close' methods that are already there".
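
As a small illustration of that point (plain current Python, no proposal
required): a hand-written iterator can grow a close() method today, and
'yield from' will already forward a close() to it:

    class Counting:
        def __init__(self):
            self.i = 0

        def __iter__(self):
            return self

        def __next__(self):
            self.i += 1
            return self.i

        def close(self):
            print("Counting.close() called")

    def delegate():
        yield from Counting()

    g = delegate()
    next(g)     # the delegating generator is now suspended inside the yield from
    g.close()   # PEP 380: this forwards to Counting.close() before finishing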

> 2. I think you greatly underestimate the negative impact, just as we did
> with changing str is bytes to str is unicode.  The change itself, embodied
> in for loops, will break most non-trivial programs.  You yourself note that
> there will have to be pervasive changes in the stdlib just to begin fixing
> the breakage.

The long-ish list of stdlib changes is about enabling the feature
everywhere, not about fixing backwards incompatibilities.

It's an important question though what programs will break and how
badly. To try and get a better handle on it I've been playing a bit
with an instrumented version of CPython that logs whenever the same
iterator is passed to multiple 'for' loops. I'll write up the results
in more detail, but the summary so far is that there seem to be ~8
places in the stdlib that would need preserve() calls added, and ~3 in
django. Maybe 2-3 hours and 1 hour of work respectively to fix?

It's not a perfect measure, and the cost certainly isn't zero, but
it's at a completely different order of magnitude than the str
changes. Among other things, this is a transition that allows for
gradual opt-in via a __future__, and fine-grained warnings pointing
you at what you need to fix, neither of which were possible for
str->unicode.

> 3. Though perhaps common for what you do, the need for the change is
> extremely rare in the overall Python world.  Iterators depending on an
> external resource are rare (< 1%, I would think).  Incomplete iteration is
> also rare (also < 1%, I think).  And resources do not always need to be 
> released immediately. 

This could equally well be an argument that the change is fine -- e.g.
if you're always doing complete iteration, or just iterating over
lists and stuff, then it literally doesn't affect you at all either
way...

> 4. Previous proposals to officially augment the iterator protocol, even with
> optional methods, have been rejected, and I think this one should be too.
>
> a. Add .__len__ as an option.  We added __length_hint__, which an iterator
> may implement, but which is not part of the iterator protocol. It is also
> ignored by bool().
>
> b., c. Add __bool__ and/or peek().  I posted a LookAhead wrapper class that
> implements both for most any iterable.  I suspect that it is rarely used. 
>
>
>>   def read_newline_separated_json(path):
>>       with open(path) as file_handle:  # <-- with block
>>           for line in file_handle:
>>               yield json.loads(line)
>
>
> One problem with passing paths around is that it makes the receiving
> function hard to test.  I think functions should at least optionally take an
> iterable of lines, and make the open part optional.  But then closing should
> also be conditional.

Sure, that's all true, but this is the problem with tiny documentation
examples :-). The point here was to explain the surprising interaction
between generators and with blocks in the simplest way, not to
demonstrate the ideal solution to the problem of reading
newline-separated JSON. Everything you want is still doable in a
post-__iterclose__ world -- in particular, if you do

  for doc in read_newline_separated_json(lines_generator()):
      ...

then both iterators will be closed when the for loop exits. But if you
want to re-use the lines_generator, just write:

  it = lines_generator()
  for doc in read_newline_separated_json(preserve(it)):
      ...
  for more_lines in it:
      ...

> If the combination of 'with', 'for', and 'yield' do not work together, then
> do something else, rather than changing the meaning of 'for'. Moving
> responsibility for closing the file from 'with' to 'for', makes 'with'
> pretty useless, while overloading 'for' with something that is rarely
> needed.  This does not strike me as the right solution to the problem.

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-21 Thread Nathaniel Smith
On Wed, Oct 19, 2016 at 3:07 PM, Paul Moore  wrote:
> On 19 October 2016 at 20:21, Nathaniel Smith  wrote:
>> On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore  wrote:
>>> On 19 October 2016 at 19:13, Chris Angelico  wrote:
 Now it *won't* correctly call the end-of-iteration function, because
 there's no 'for' loop. This is going to either (a) require that EVERY
 consumer of an iterator follow this new protocol, or (b) introduce a
 ton of edge cases.
>>>
>>> Also, unless I'm misunderstanding the proposal, there's a fairly major
>>> compatibility break. At present we have:
>>>
>>> >>> lst = [1,2,3,4]
>>> >>> it = iter(lst)
>>> >>> for i in it:
>>> ...   if i == 2: break
>>>
>>> >>> for i in it:
>>> ...   print(i)
>>> 3
>>> 4
>>>
>>>
>>> With the proposed behaviour, if I understand it, "it" would be closed
>>> after the first loop, so resuming "it" for the second loop wouldn't
>>> work. Am I right in that? I know there's a proposed itertools function
>>> to bring back the old behaviour, but it's still a compatibility break.
>>> And code like this, that partially consumes an iterator, is not
>>> uncommon.
>>
>> Right -- did you reach the "transition plan" section? (I know it's
>> wayyy down there.) The proposal is to hide this behind a __future__ at
>> first + a mechanism during the transition period to catch code that
>> depends on the old behavior and issue deprecation warnings. But it is
>> a compatibility break, yes.
>
> I missed that you propose phasing this in, but it doesn't really alter
> much, I think the current behaviour is valuable and common, and I'm -1
> on breaking it. It's just too much of a fundamental change to how
> loops and iterators interact for me to be comfortable with it -
> particularly as it's only needed for a very specific use case (none of
> my programs ever use async - why should I have to rewrite my loops
> with a clumsy extra call just to cater for a problem that only occurs
> in async code?)
>
> IMO, and I'm sorry if this is controversial, there's a *lot* of new
> language complexity that's been introduced for the async use case, and
> it's only the fact that it can be pretty much ignored by people who
> don't need or use async features that makes it acceptable (the "you
> don't pay for what you don't use" principle). The problem with this
> proposal is that it doesn't conform to that principle - it has a
> direct, negative impact on users who have no interest in async.

Oh, goodness, no -- like Yury said, the use cases here are not
specific to async at all. I mean, none of the examples are async even
:-).

The motivation here is that prompt (non-GC-dependent) cleanup is a
good thing for a variety of reasons: determinism, portability across
Python implementations, proper exception propagation, etc. async does
add yet another entry to this list, but I don't think the basic principle is
controversial. 'with' blocks are a whole chunk of extra syntax that
were added to the language just for this use case. In fact 'with'
blocks weren't even needed for the functionality -- we already had
'try/finally', they just weren't ergonomic enough. This use case is so
important that it's had multiple rounds of syntax directed at it
before async/await was even a glimmer in C#'s eye :-).

BUT, currently, 'with' and 'try/finally' have a gap: if you use them
inside a generator (async or not, doesn't matter), then they often
fail at accomplishing their core purpose. Sure, they'll execute their
cleanup code whenever the generator is cleaned up, but there's no
ergonomic way to clean up the generator. Oops. I mean, you *could*
respond by saying "you should never use 'with' or 'try/finally' inside
a generator" and maybe add that as a rule to your style manual and
linter -- and some people in this thread have suggested more-or-less
that -- but that seems like a step backwards. This proposal instead
tries to solve the problem of making 'with'/'try/finally' work and be
ergonomic in general, and it should be evaluated on that basis, not on
the async/await stuff.

The reason I'm emphasizing async generators is that they affect the
timeline, not the motivation:

- PEP 525 actually does add async-only complexity to the language (the
new GC hooks). It doesn't affect non-async users, but it is still
complexity. And it's possible that if we have iterclose, then we don't
need the new GC hooks (though this is still an open discussion :-)).
If this is true, then now is the time to act, while reverting the GC
hooks change is still a possibility; otherwise, we risk the situation
where we add iterclose later, decide that the GC hooks no longer
provide enough additional value to justify their complexity... but
we're stuck with them anyway.

- For synchronous iteration, the need for a transition period means
that the iterclose proposal will take a few years to provide benefits.
For asynchronous iteration, it could potentially start 

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Terry Reedy

On 10/19/2016 12:38 AM, Nathaniel Smith wrote:


I'd like to propose that Python's iterator protocol be enhanced to add
a first-class notion of completion / cleanup.


With respect to the standard iterator protocol, a very solid -1 from 
me.  (I leave commenting specifically on __aiterclose__ to Yury.)


1. I consider the introduction of iterables and the new iterator 
protocol in 2.2 and their gradual replacement of lists in many 
situations to be the greatest enhancement to Python since 1.3 (my first 
version).  They are, to me, one of Python's greatest features and 
the minimal nature of the protocol an essential part of what makes them 
great.


2. I think you greatly underestimate the negative impact, just as we did 
with changing str is bytes to str is unicode.  The change itself, 
embodied in for loops, will break most non-trivial programs.  You 
yourself note that there will have to be pervasive changes in the stdlib 
just to begin fixing the breakage.


3. Though perhaps common for what you do, the need for the change is 
extremely rare in the overall Python world.  Iterators depending on an 
external resource are rare (< 1%, I would think).  Incomplete iteration 
is also rare (also < 1%, I think).  And resources do not always need to be 
released immediately.


4. Previous proposals to officially augment the iterator protocol, even 
with optional methods, have been rejected, and I think this one should 
be too.


a. Add .__len__ as an option.  We added __length_hint__, which an 
iterator may implement, but which is not part of the iterator protocol. 
It is also ignored by bool().


b., c. Add __bool__ and/or peek().  I posted a LookAhead wrapper class 
that implements both for most any iterable.  I suspect that it is 
rarely used.




  def read_newline_separated_json(path):
      with open(path) as file_handle:  # <-- with block
          for line in file_handle:
              yield json.loads(line)


One problem with passing paths around is that it makes the receiving 
function hard to test.  I think functions should at least optionally 
take an iterable of lines, and make the open part optional.  But then 
closing should also be conditional.


If the combination of 'with', 'for', and 'yield' do not work together, 
then do something else, rather than changing the meaning of 'for'. 
Moving responsibility for closing the file from 'with' to 'for', makes 
'with' pretty useless, while overloading 'for' with something that is 
rarely needed.  This does not strike me as the right solution to the 
problem.



  for document in read_newline_separated_json(path):  # <-- outer for loop
      ...


If the outer loop determines when the file should be closed, then why 
not open it there?  What fails with


try:
    lines = open(path)
    gen = read_newline_separated_json(lines)
    for doc in gen: do_something(doc)
finally:
    lines.close()
    # and/or gen.throw(...) to stop the generator.

--
Terry Jan Reedy



Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Yury Selivanov



On 2016-10-19 6:07 PM, Paul Moore wrote:

I missed that you propose phasing this in, but it doesn't really alter
much, I think the current behaviour is valuable and common, and I'm -1
on breaking it. It's just too much of a fundamental change to how
loops and iterators interact for me to be comfortable with it -
particularly as it's only needed for a very specific use case (none of
my programs ever use async - why should I have to rewrite my loops
with a clumsy extra call just to cater for a problem that only occurs
in async code?)


If I understand Nathaniel's proposal, fixing 'async for' isn't the only 
motivation.  Moreover, async generators aren't that different from sync 
generators in terms of finalization.


Yury


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Robert Collins
Hey Nathaniel - I like the intent here, but I think perhaps it would
be better if the problem is approached differently.

Seems to me that making *generators* have a special 'you are done now'
interface is special casing, which usually makes things harder to
learn and predict; and the net effect is that all loop
constructs will need to learn about that special case, whether looping
over a list, a generator, or whatever.

Generators already have a well defined lifecycle - but as you say it's
not defined consistently across Python VMs. The language has no
guarantees about when finalisation will occur :(. The PEP 525 aclose
is a bit awkward itself in this way - but unlike regular generators it
does have a reason, which is that the language doesn't define an event
loop context as a built in thing - so finalisation can't reliably
summon one up.

So rather than adding a special case to finalise objects used in one
particular iteration - which will play havoc with break statements,
can we instead look at making escape analysis a required part of the
compiler: the borrow checker in rust is getting pretty good at
managing a very similar problem :).

I haven't fleshed out exactly what would be entailed, so consider this
a 'what if' and YMMV :).

-Rob


On 19 October 2016 at 17:38, Nathaniel Smith  wrote:
> Hi all,
>
> I'd like to propose that Python's iterator protocol be enhanced to add
> a first-class notion of completion / cleanup.
>
> This is mostly motivated by thinking about the issues around async
> generators and cleanup. Unfortunately even though PEP 525 was accepted
> I found myself unable to stop pondering this, and the more I've
> pondered the more convinced I've become that the GC hooks added in PEP
> 525 are really not enough, and that we'll regret it if we stick with
> them, or at least with them alone :-/. The strategy here is pretty
> different -- it's an attempt to dig down and make a fundamental
> improvement to the language that fixes a number of long-standing rough
> spots, including async generators.
>
> The basic concept is relatively simple: just adding a '__iterclose__'
> method that 'for' loops call upon completion, even if that's via break
> or exception. But, the overall issue is fairly complicated + iterators
> have a large surface area across the language, so the text below is
> pretty long. Mostly I wrote it all out to convince myself that there
> wasn't some weird showstopper lurking somewhere :-). For a first pass
> discussion, it probably makes sense to mainly focus on whether the
> basic concept makes sense? The main rationale is at the top, but the
> details are there too for those who want them.
>
> Also, for *right* now I'm hoping -- probably unreasonably -- to try to
> get the async iterator parts of the proposal in ASAP, ideally for
> 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal
> like this, which I apologize for -- though async generators are
> provisional in 3.6, so at least in theory changing them is not out of
> the question.) So again, it might make sense to focus especially on
> the async parts, which are a pretty small and self-contained part, and
> treat the rest of the proposal as a longer-term plan provided for
> context. The comparison to PEP 525 GC hooks comes right after the
> initial rationale.
>
> Anyway, I'll be interested to hear what you think!
>
> -n
>
> --
>
> Abstract
> 
>
> We propose to extend the iterator protocol with a new
> ``__(a)iterclose__`` slot, which is called automatically on exit from
> ``(async) for`` loops, regardless of how they exit. This allows for
> convenient, deterministic cleanup of resources held by iterators
> without reliance on the garbage collector. This is especially valuable
> for asynchronous generators.
>
>
> Note on timing
> ==
>
> In practical terms, the proposal here is divided into two separate
> parts: the handling of async iterators, which should ideally be
> implemented ASAP, and the handling of regular iterators, which is a
> larger but more relaxed project that can't start until 3.7 at the
> earliest. But since the changes are closely related, and we probably
> don't want to end up with async iterators and regular iterators
> diverging in the long run, it seems useful to look at them together.
>
>
> Background and motivation
> =
>
> Python iterables often hold resources which require cleanup. For
> example: ``file`` objects need to be closed; the `WSGI spec
> `_ adds a ``close`` method
> on top of the regular iterator protocol and demands that consumers
> call it at the appropriate time (though forgetting to do so is a
> `frequent source of bugs
> `_);
> and PEP 342 (based on PEP 325) extended generator objects to add a
> ``close`` method to allow generators to clean up after themselves.
>
> 

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Paul Moore
On 19 October 2016 at 20:21, Nathaniel Smith  wrote:
> On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore  wrote:
>> On 19 October 2016 at 19:13, Chris Angelico  wrote:
>>> Now it *won't* correctly call the end-of-iteration function, because
>>> there's no 'for' loop. This is going to either (a) require that EVERY
>>> consumer of an iterator follow this new protocol, or (b) introduce a
>>> ton of edge cases.
>>
>> Also, unless I'm misunderstanding the proposal, there's a fairly major
>> compatibility break. At present we have:
>>
>> >>> lst = [1,2,3,4]
>> >>> it = iter(lst)
>> >>> for i in it:
>> ...   if i == 2: break
>>
>> >>> for i in it:
>> ...   print(i)
>> 3
>> 4
>>
>>
>> With the proposed behaviour, if I understand it, "it" would be closed
>> after the first loop, so resuming "it" for the second loop wouldn't
>> work. Am I right in that? I know there's a proposed itertools function
>> to bring back the old behaviour, but it's still a compatibility break.
>> And code like this, that partially consumes an iterator, is not
>> uncommon.
>
> Right -- did you reach the "transition plan" section? (I know it's
> wayyy down there.) The proposal is to hide this behind a __future__ at
> first + a mechanism during the transition period to catch code that
> depends on the old behavior and issue deprecation warnings. But it is
> a compatibility break, yes.

I missed that you propose phasing this in, but it doesn't really alter
much, I think the current behaviour is valuable and common, and I'm -1
on breaking it. It's just too much of a fundamental change to how
loops and iterators interact for me to be comfortable with it -
particularly as it's only needed for a very specific use case (none of
my programs ever use async - why should I have to rewrite my loops
with a clumsy extra call just to cater for a problem that only occurs
in async code?)

IMO, and I'm sorry if this is controversial, there's a *lot* of new
language complexity that's been introduced for the async use case, and
it's only the fact that it can be pretty much ignored by people who
don't need or use async features that makes it acceptable (the "you
don't pay for what you don't use" principle). The problem with this
proposal is that it doesn't conform to that principle - it has a
direct, negative impact on users who have no interest in async.

Paul


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Yury Selivanov

Nathaniel,

On 2016-10-19 5:02 PM, Nathaniel Smith wrote:


Hi Yury,

Thanks for the detailed comments! Replies inline below.


NP!



On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov  wrote:

I'm -1 on the idea.  Here's why:


1. Python is a very dynamic language with GC and that is one of its
fundamental properties.  This proposal might make GC of iterators more
deterministic, but that is only one case.

For instance, in some places in asyncio source code we have statements like
this: "self = None".  Why?  When an exception occurs and we want to save it
(for instance to log it), it holds a reference to the Traceback object.
Which in turn references frame objects.  Which means that a lot of objects
in those frames will be alive while the exception object is alive.  So in
asyncio we go to great lengths to avoid unnecessary runs of GC, but this is
an exception!  Most Python code out there today doesn't do these sorts of
tricks.

And this is just one example of how you can have cycles that require a run
of GC.  It is not possible to have deterministic GC in real life Python
applications.  This proposal addresses only *one* use case, leaving 100s of
others unresolved.

Maybe I'm misunderstanding, but I think those 100s of other cases
where you need deterministic cleanup are why 'with' blocks were
invented, and in my experience they work great for that. Once you get
in the habit, it's very easy and idiomatic to attach a 'with' to each
file handle, socket, etc., at the point where you create it. So from
where I stand, it seems like those 100s of unresolved cases actually
are resolved?


Not all code can be written with 'with' statements, see my example with 
'self = None' in asyncio.  Python code can be quite complex, involving 
classes with __del__ that do some cleanups etc. Fundamentally, you 
cannot make GC of such objects deterministic.


IOW I'm not convinced that if we implement your proposal we'll fix 90% 
(or even 30%) of cases where non-deterministic and postponed cleanup is 
harmful.

The problem is that 'with' blocks are great, and generators are great,
but when you put them together into the same language there's this
weird interaction that emerges, where 'with' blocks inside generators
don't really work for their intended purpose unless you're very
careful and willing to write boilerplate.

Adding deterministic cleanup to generators plugs this gap. Beyond
that, I do think it's a nice bonus that other iterables can take
advantage of the feature, but this isn't just a random "hey let's
smush two constructs together to save a line of code" thing --
iteration is special because it's where generator call stacks and
regular call stacks meet.


Yes, I understand that your proposal really improves some things. OTOH 
it undeniably complicates the iteration protocol and requires a long 
period of deprecations, teaching users and library authors new 
semantics, etc.


We only now begin to see Python 3 gaining traction.  I don't want us to 
harm that by introducing another set of things to Python 3 that are 
significantly different from Python 2. DeprecationWarnings/future 
imports don't excite users either.



IMO, while GC-related issues can be annoying to debug sometimes, it's not
worth it to change the behaviour of iteration in Python only to slightly
improve on this.

2. This proposal will make writing iterators significantly harder. Consider
'itertools.chain'.  We will have to rewrite it to add the proposed
__iterclose__ method.  The Chain iterator object will have to track all of
its iterators, call __iterclose__ on them when it's necessary (there are a
few corner cases).  Given that this object is implemented in C, it's quite a
bit of work.  And we'll have a lot of objects to fix.

When you say "make writing iterators significantly harder", is it fair
to say that you're thinking mostly of what I'm calling "iterator
wrappers"? For most day-to-day iterators, it's pretty trivial to
either add a close method or not; the tricky cases are when you're
trying to manage a collection of sub-iterators.


Yes, mainly iterator wrappers.  You'll also need to educate users 
to refactor (more on that below) their __del__ methods to 
__(a)iterclose__ in 3.6.


itertools.chain is a great challenge / test case here, because I think
it's about as hard as this gets :-). It took me a bit to wrap my head
around, but I think I've got it, and that it's not so bad actually.
Now imagine that being applied throughout the stdlib, plus some of it 
will have to be implemented in C.  I'm not saying it's impossible, I'm 
saying that it will require additional effort for CPython and ecosystem.


[..]



3. This proposal changes the behaviour of 'for' and 'async for' statements
significantly.  To do partial iteration you will have to use a special
builtin function to guard the iterator from being closed.  This is
completely non-obvious to any existing Python user and will be hard to
explain to newcomers.

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Nathaniel Smith
On Wed, Oct 19, 2016 at 1:33 PM, Yury Selivanov  wrote:
> On 2016-10-19 3:33 PM, Nathaniel Smith wrote:
>
>>> >>> lst = [1,2,3,4]
>>> >>> it = iter(lst)
>>> >>> for i in it:
>>> ...   if i == 2: break
>>>
>>> >>> for i in it:
>>> ...   print(i)
>>> 3
>>> 4
>>> >>>
>>>
>>> With the proposed behaviour, if I understand it, "it" would be closed
>>> after the first loop, so resuming "it" for the second loop wouldn't
>>> work. Am I right in that? I know there's a proposed itertools function
>>> to bring back the old behaviour, but it's still a compatibility break.
>>> And code like this, that partially consumes an iterator, is not
>>> uncommon.
>>
>> Right -- did you reach the "transition plan" section? (I know it's
>> wayyy down there.) The proposal is to hide this behind a __future__ at
>> first + a mechanism during the transition period to catch code that
>> depends on the old behavior and issue deprecation warnings. But it is
>> a compatibility break, yes.
>>
>> I should also say, regarding your specific example, I guess it's an
>> open question whether we would want list_iterator.__iterclose__ to
>> actually do anything. It could flip the iterator to a state where it
>> always raises StopIteration, or RuntimeError, or it could just be a
>> no-op that allows iteration to continue normally afterwards.
>
>
> Making the 'for' loop behave differently for built-in containers (i.e. make
> __iterclose__ a no-op for them) will only make this whole thing even more
> confusing.
>
> It has to be consistent: if you partially iterate over *anything* without
> wrapping it with `preserve()`, it should always close the iterator.

You're probably right. My gut is leaning the same way, I'm just
hesitant to commit because I haven't thought about it for long. But I
do stand by the claim that this is probably not *that* important
either way :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Nathaniel Smith
Hi Yury,

Thanks for the detailed comments! Replies inline below.

On Wed, Oct 19, 2016 at 8:51 AM, Yury Selivanov  wrote:
> I'm -1 on the idea.  Here's why:
>
>
> 1. Python is a very dynamic language with GC and that is one of its
> fundamental properties.  This proposal might make GC of iterators more
> deterministic, but that is only one case.
>
> For instance, in some places in asyncio source code we have statements like
> this: "self = None".  Why?  When an exception occurs and we want to save it
> (for instance to log it), it holds a reference to the Traceback object.
> Which in turn references frame objects.  Which means that a lot of objects
> in those frames will be alive while the exception object is alive.  So in
> asyncio we go to great lengths to avoid unnecessary runs of GC, but this is
> an exception!  Most Python code out there today doesn't do these sorts of
> tricks.
>
> And this is just one example of how you can have cycles that require a run
> of GC.  It is not possible to have deterministic GC in real life Python
> applications.  This proposal addresses only *one* use case, leaving 100s of
> others unresolved.

Maybe I'm misunderstanding, but I think those 100s of other cases
where you need deterministic cleanup are why 'with' blocks were
invented, and in my experience they work great for that. Once you get
in the habit, it's very easy and idiomatic to attach a 'with' to each
file handle, socket, etc., at the point where you create it. So from
where I stand, it seems like those 100s of unresolved cases actually
are resolved?

The problem is that 'with' blocks are great, and generators are great,
but when you put them together into the same language there's this
weird interaction that emerges, where 'with' blocks inside generators
don't really work for their intended purpose unless you're very
careful and willing to write boilerplate.
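
Today that boilerplate looks roughly like this (a sketch; the file name 
and loop body are made up), with contextlib.closing used so that the 
*caller* remembers to close the generator:

    from contextlib import closing

    def lines(path):
        with open(path) as f:   # fires only when the generator is closed
            yield from f

    with closing(lines("data.txt")) as it:
        for line in it:
            if line.startswith("#"):
                break           # the with block, not the for loop, closes 'it'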

Adding deterministic cleanup to generators plugs this gap. Beyond
that, I do think it's a nice bonus that other iterables can take
advantage of the feature, but this isn't just a random "hey let's
smush two constructs together to save a line of code" thing --
iteration is special because it's where generator call stacks and
regular call stacks meet.

> IMO, while GC-related issues can be annoying to debug sometimes, it's not
> worth it to change the behaviour of iteration in Python only to slightly
> improve on this.
>
> 2. This proposal will make writing iterators significantly harder. Consider
> 'itertools.chain'.  We will have to rewrite it to add the proposed
> __iterclose__ method.  The Chain iterator object will have to track all of
> its iterators, call __iterclose__ on them when it's necessary (there are a
> few corner cases).  Given that this object is implemented in C, it's quite a
> bit of work.  And we'll have a lot of objects to fix.

When you say "make writing iterators significantly harder", is it fair
to say that you're thinking mostly of what I'm calling "iterator
wrappers"? For most day-to-day iterators, it's pretty trivial to
either add a close method or not; the tricky cases are when you're
trying to manage a collection of sub-iterators.

itertools.chain is a great challenge / test case here, because I think
it's about as hard as this gets :-). It took me a bit to wrap my head
around, but I think I've got it, and that it's not so bad actually.

Right now, chain's semantics are:

# copied directly from the docs
def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element

In a post-__iterclose__ world, the inner for loop there will already
handle closing each iterators as its finished being consumed, and if
the generator is closed early then the inner for loop will also close
the current iterator. What we need to add is that if the generator is
closed early, we should also close all the unprocessed iterators.

The first change is to replace the outer for loop with a while/pop
loop, so that if an exception occurs we'll know which iterables remain
to be processed:

def chain(*iterables):
    try:
        while iterables:
            for element in iterables.pop(0):
                yield element
    ...

Now, what do we do if an exception does occur? We need to call
iterclose on all of the remaining iterables, but the tricky bit is
that this might itself raise new exceptions. If this happens, we don't
want to abort early; instead, we want to continue until we've closed
all the iterables, and then raise a chained exception. Basically what
we want is:

def chain(*iterables):
    try:
        while iterables:
            for element in iterables.pop(0):
                yield element
    finally:
        try:
            operators.iterclose(iter(iterables[0]))
        finally:
            try:
                operators.iterclose(iter(iterables[1]))
            finally:
                try:
                    operators.iterclose(iter(iterables[2]))
                finally:
                    ...

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Yury Selivanov

On 2016-10-19 3:33 PM, Nathaniel Smith wrote:


>> >>> lst = [1,2,3,4]
>> >>> it = iter(lst)
>> >>> for i in it:
>> ...   if i == 2: break
>>
>> >>> for i in it:
>> ...   print(i)
>> 3
>> 4
>> >>>
>>
>> With the proposed behaviour, if I understand it, "it" would be closed
>> after the first loop, so resuming "it" for the second loop wouldn't
>> work. Am I right in that? I know there's a proposed itertools function
>> to bring back the old behaviour, but it's still a compatibility break.
>> And code like this, that partially consumes an iterator, is not
>> uncommon.
>
> Right -- did you reach the "transition plan" section? (I know it's
> wayyy down there.) The proposal is to hide this behind a __future__ at
> first + a mechanism during the transition period to catch code that
> depends on the old behavior and issue deprecation warnings. But it is
> a compatibility break, yes.
>
> I should also say, regarding your specific example, I guess it's an
> open question whether we would want list_iterator.__iterclose__ to
> actually do anything. It could flip the iterator to a state where it
> always raises StopIteration, or RuntimeError, or it could just be a
> no-op that allows iteration to continue normally afterwards.


Making the 'for' loop behave differently for built-in containers (i.e. 
make __iterclose__ a no-op for them) will only make this whole thing 
even more confusing.


It has to be consistent: if you partially iterate over *anything* 
without wrapping it with `preserve()`, it should always close the iterator.


Yury


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Brendan Barnwell

On 2016-10-19 12:21, Nathaniel Smith wrote:

>Also, unless I'm misunderstanding the proposal, there's a fairly major
>compatibility break. At present we have:
>

> >>> lst = [1,2,3,4]
> >>> it = iter(lst)
> >>> for i in it:
> ...   if i == 2: break
>
> >>> for i in it:
> ...   print(i)
> 3
> 4
> >>>


>
>With the proposed behaviour, if I understand it, "it" would be closed
>after the first loop, so resuming "it" for the second loop wouldn't
>work. Am I right in that? I know there's a proposed itertools function
>to bring back the old behaviour, but it's still a compatibility break.
>And code like this, that partially consumes an iterator, is not
>uncommon.

>

> Right -- did you reach the "transition plan" section? (I know it's
> wayyy down there.) The proposal is to hide this behind a __future__ at
> first + a mechanism during the transition period to catch code that
> depends on the old behavior and issue deprecation warnings. But it is
> a compatibility break, yes.


	To me this makes the change too hard to swallow.  Although the issues 
you describe are real, it doesn't seem worth it to me to change the 
entire semantics of for loops just for these cases.  There are lots of 
for loops that are not async and/or do not rely on resource cleanup. 
This will change how all of them work, just to fix something that 
sometimes is a problem for some resource-wrapping iterators.


	Moreover, even when the iterator does wrap a resource, sometimes I want 
to be able to stop and resume iteration.  It's not uncommon, for 
instance, to have code using the csv module that reads some rows, pauses 
to make a decision (e.g., to parse differently depending what header 
columns are present, or skip some number of rows), and then resumes. 
This would increase the burden of updating code to adapt to the new 
breakage (since in this case the programmer would likely have to, or at 
least want to, think about what is going on rather than just blindly 
wrapping everything with protect() ).


--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."

   --author unknown


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Nathaniel Smith
On Wed, Oct 19, 2016 at 12:21 PM, Nathaniel Smith  wrote:
> On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore  wrote:
>> On 19 October 2016 at 19:13, Chris Angelico  wrote:
>>> Now it *won't* correctly call the end-of-iteration function, because
>>> there's no 'for' loop. This is going to either (a) require that EVERY
>>> consumer of an iterator follow this new protocol, or (b) introduce a
>>> ton of edge cases.
>>
>> Also, unless I'm misunderstanding the proposal, there's a fairly major
>> compatibility break. At present we have:
>>
>> >>> lst = [1,2,3,4]
>> >>> it = iter(lst)
>> >>> for i in it:
>> ...   if i == 2: break
>>
>> >>> for i in it:
>> ...   print(i)
>> 3
>> 4
>>
>>
>> With the proposed behaviour, if I understand it, "it" would be closed
>> after the first loop, so resuming "it" for the second loop wouldn't
>> work. Am I right in that? I know there's a proposed itertools function
>> to bring back the old behaviour, but it's still a compatibility break.
>> And code like this, that partially consumes an iterator, is not
>> uncommon.
>
> Right -- did you reach the "transition plan" section? (I know it's
> wayyy down there.) The proposal is to hide this behind a __future__ at
> first + a mechanism during the transition period to catch code that
> depends on the old behavior and issue deprecation warnings. But it is
> a compatibility break, yes.

I should also say, regarding your specific example, I guess it's an
open question whether we would want list_iterator.__iterclose__ to
actually do anything. It could flip the iterator to a state where it
always raises StopIteration, or RuntimeError, or it could just be a
no-op that allows iteration to continue normally afterwards.
list_iterator doesn't have a close method right now, and it certainly
can't "close" the underlying list (whatever that would even mean), so
I don't think there's a strong expectation that it should do anything
in particular. The __iterclose__ contract is that you're not supposed
to call __next__ afterwards, so there's no real rule about what
happens if you do. And there aren't strong conventions right now about
what happens when you try to iterate an explicitly closed iterator --
files raise an error, generators just act like they were exhausted. So
there's a few options that all seem more-or-less reasonable and I
don't know that it's very important which one we pick.
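
Purely for illustration (a hypothetical sketch, not part of the proposal 
text), the two non-trivial options could be modelled as plain wrapper 
classes rather than the real list_iterator:

    class NoOpClose:
        """__iterclose__ does nothing; iteration can resume afterwards."""
        def __init__(self, seq):
            self._it = iter(seq)
        def __iter__(self):
            return self
        def __next__(self):
            return next(self._it)
        def __iterclose__(self):
            pass

    class HardClose:
        """__iterclose__ flips the iterator into a permanently-closed state."""
        def __init__(self, seq):
            self._it = iter(seq)
            self._closed = False
        def __iter__(self):
            return self
        def __next__(self):
            if self._closed:
                raise StopIteration
            return next(self._it)
        def __iterclose__(self):
            self._closed = True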

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Nathaniel Smith
On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore  wrote:
> On 19 October 2016 at 19:13, Chris Angelico  wrote:
>> Now it *won't* correctly call the end-of-iteration function, because
>> there's no 'for' loop. This is going to either (a) require that EVERY
>> consumer of an iterator follow this new protocol, or (b) introduce a
>> ton of edge cases.
>
> Also, unless I'm misunderstanding the proposal, there's a fairly major
> compatibility break. At present we have:
>
> >>> lst = [1,2,3,4]
> >>> it = iter(lst)
> >>> for i in it:
> ...   if i == 2: break
>
> >>> for i in it:
> ...   print(i)
> 3
> 4
>
>
> With the proposed behaviour, if I understand it, "it" would be closed
> after the first loop, so resuming "it" for the second loop wouldn't
> work. Am I right in that? I know there's a proposed itertools function
> to bring back the old behaviour, but it's still a compatibility break.
> And code like this, that partially consumes an iterator, is not
> uncommon.

Right -- did you reach the "transition plan" section? (I know it's
wayyy down there.) The proposal is to hide this behind a __future__ at
first + a mechanism during the transition period to catch code that
depends on the old behavior and issue deprecation warnings. But it is
a compatibility break, yes.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Todd
On Wed, Oct 19, 2016 at 2:38 PM, Paul Moore  wrote:

> On 19 October 2016 at 19:13, Chris Angelico  wrote:
> > Now it *won't* correctly call the end-of-iteration function, because
> > there's no 'for' loop. This is going to either (a) require that EVERY
> > consumer of an iterator follow this new protocol, or (b) introduce a
> > ton of edge cases.
>
> Also, unless I'm misunderstanding the proposal, there's a fairly major
> compatibility break. At present we have:
>
> >>> lst = [1,2,3,4]
> >>> it = iter(lst)
> >>> for i in it:
> ...   if i == 2: break
>
> >>> for i in it:
> ...   print(i)
> 3
> 4
> >>>
>
> With the proposed behaviour, if I understand it, "it" would be closed
> after the first loop, so resuming "it" for the second loop wouldn't
> work. Am I right in that? I know there's a proposed itertools function
> to bring back the old behaviour, but it's still a compatibility break.
> And code like this, that partially consumes an iterator, is not
> uncommon.
>
> Paul
>

I may very well be misunderstanding the purpose of the proposal, but that
is not how I saw it being used.  I thought of it being used to clean up
things that happened in the loop, rather than clean up the iterator
itself.  This would allow the iterator to manage events that occurred in
the body of the loop.  So it would be more like this scenario:

>>> lst = objiterer([obj1, obj2, obj3, obj4])
>>> it = iter(lst)
>>> for i, _ in zip(it, [1, 2]):
...   b = i.some_method()
>>> for i in it:
...   c = i.other_method()
>>>

In this case, objiterer would do some cleanup related to obj1 and obj2 in
the first loop and some cleanup related to obj3 and obj4 in the second
loop.  There would be no backwards-compatibility break, the method would be
purely opt-in and most typical iterators wouldn't need it.

However, in this case perhaps it might be better to have some method that
is called after every loop, no matter how the loop is terminated (break,
continue, return).  This would allow the cleanup to be done every loop
rather than just at the end.

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Paul Moore
On 19 October 2016 at 19:13, Chris Angelico  wrote:
> Now it *won't* correctly call the end-of-iteration function, because
> there's no 'for' loop. This is going to either (a) require that EVERY
> consumer of an iterator follow this new protocol, or (b) introduce a
> ton of edge cases.

Also, unless I'm misunderstanding the proposal, there's a fairly major
compatibility break. At present we have:

>>> lst = [1,2,3,4]
>>> it = iter(lst)
>>> for i in it:
...   if i == 2: break

>>> for i in it:
...   print(i)
3
4
>>>

With the proposed behaviour, if I understand it, "it" would be closed
after the first loop, so resuming "it" for the second loop wouldn't
work. Am I right in that? I know there's a proposed itertools function
to bring back the old behaviour, but it's still a compatibility break.
And code like this, that partially consumes an iterator, is not
uncommon.

Paul


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Chris Angelico
On Thu, Oct 20, 2016 at 3:38 AM, Random832  wrote:
> On Wed, Oct 19, 2016, at 11:51, Yury Selivanov wrote:
>> I'm -1 on the idea.  Here's why:
>>
>>
>> 1. Python is a very dynamic language with GC and that is one of its
>> fundamental properties.  This proposal might make GC of iterators more
>> deterministic, but that is only one case.
>
> There is a huge difference between wanting deterministic GC and wanting
> cleanup code to be called deterministically. We're not talking about
> memory usage here.

Currently, iterators get passed around casually - you can build on
them, derive from them, etc, etc, etc. If you change the 'for' loop to
explicitly close an iterator, will you also change 'yield from'? What
about other forms of iteration? Will the iterator be closed when it
runs out normally?

This proposal is to iterators what 'with' is to open files and other
resources. I can build on top of an open file fairly easily:

@contextlib.contextmanager
def file_with_header(fn):
    with open(fn, "w") as f:
        f.write("Header Row")
        yield f

def main():
    with file_with_header("asdf") as f:
        """do stuff"""

I create a context manager based on another context manager, and I
have a guarantee that the end of the main() 'with' block is going to
properly close the file. Now, what happens if I do something similar
with an iterator?

def every_second(it):
    try:
        next(it)
    except StopIteration:
        return
    for value in it:
        yield value
        try:
            next(it)
        except StopIteration:
            break

This will work, because it's built on a 'for' loop. What if it's built
on a 'while' loop instead?

def every_second_broken(it):
    try:
        while True:
            next(it)
            yield next(it)
    except StopIteration:
        pass

Now it *won't* correctly call the end-of-iteration function, because
there's no 'for' loop. This is going to either (a) require that EVERY
consumer of an iterator follow this new protocol, or (b) introduce a
ton of edge cases.

ChrisA


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Nathaniel Smith
On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar  wrote:
>
>
> On Wed, Oct 19, 2016 at 11:08 AM Todd  wrote:
>>
>> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar 
>> wrote:
>>>
>>> This is a very interesting proposal.  I just wanted to share something I
>>> found in my quick search:
>>>
>>>
>>> http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration
>>>
>>> Could you explain why the accepted answer there doesn't address this
>>> issue?
>>>
>>> class Parse(object):
>>>     """A generator that iterates through a file"""
>>>     def __init__(self, path):
>>>         self.path = path
>>>
>>>     def __iter__(self):
>>>         with open(self.path) as f:
>>>             yield from f

BTW it may make this easier to read if we notice that it's essentially
a verbose way of writing:

def parse(path):
    with open(path) as f:
        yield from f

>>
>> I think the difference is that this new approach guarantees cleanup the
>> exact moment the loop ends, no matter how it ends.
>>
>> If I understand correctly, your approach will do cleanup when the loop
>> ends only if the iterator is exhausted.  But if someone zips it with a
>> shorter iterator, uses itertools.islice or something similar, breaks the
>> loop, returns inside the loop, or in some other way ends the loop before the
>> iterator is exhausted, the cleanup won't happen when the iterator is garbage
>> collected.  And for non-reference-counting python implementations, when this
>> happens is completely unpredictable.
>>
>> --
>
>
> I don't see that.  The "cleanup" will happen when collection is interrupted
> by an exception.  This has nothing to do with garbage collection either
> since the cleanup happens deterministically when the block is ended.  If
> this is the only example, then I would say this behavior is already provided
> and does not need to be added.

I think there might be a misunderstanding here. Consider code like
this, that breaks out from the middle of the for loop:

def use_that_generator():
    for line in parse(...):
        if found_the_line_we_want(line):
            break
    # -- mark --
    do_something_with_that_line(line)

With current Python, what will happen is that when we reach the marked
line, then the for loop has finished and will drop its reference to
the generator object. At this point, the garbage collector comes into
play. On CPython, with its reference counting collector, the garbage
collector will immediately collect the generator object, and then the
generator object's __del__ method will restart 'parse' by having the
last 'yield' raise a GeneratorExit, and *that* exception will trigger
the 'with' block's cleanup. But in order to get there, we're
absolutely depending on the garbage collector to inject that
GeneratorExit. And on an implementation like PyPy that doesn't use
reference counting, the generator object will become collect*ible* at
the marked line, but might not actually be collect*ed* for an
arbitrarily long time afterwards. And until it's collected, the file
will remain open. 'with' blocks guarantee that the resources they hold
will be cleaned up promptly when the enclosing stack frame gets
cleaned up, but for a 'with' block inside a generator then you still
need something to guarantee that the enclosing stack frame gets
cleaned up promptly!

This proposal is about providing that thing -- with __(a)iterclose__,
the end of the for loop immediately closes the generator object, so
the garbage collector doesn't need to get involved.

Essentially the same thing happens if we replace the 'break' with a
'raise'. Though with exceptions, things can actually get even messier,
even on CPython. Here's a similar example except that (a) it exits
early due to an exception (which then gets caught elsewhere), and (b)
the invocation of the generator function ended up being kind of long,
so I split the for loop into two lines with a temporary variable:

def use_that_generator2():
    it = parse("/a/really/really/really/really/really/really/really/long/path")
    for line in it:
        if not valid_format(line):
            raise ValueError()

def catch_the_exception():
    try:
        use_that_generator2()
    except ValueError:
        # -- mark --
        ...

Here the ValueError() is raised from use_that_generator2(), and then
caught in catch_the_exception(). At the marked line,
use_that_generator2's stack frame is still pinned in memory by the
exception's traceback. And that means that all the local variables are
also pinned in memory, including our temporary 'it'. Which means that
parse's stack frame is also pinned in memory, and the file is not
closed.

With the __(a)iterclose__ proposal, when the exception is thrown then
the 'for' loop in use_that_generator2() immediately closes the
generator object, which in turn triggers parse's 'with' block, and
that closes the file handle. And then after the file handle is 

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Neil Girdhar
On Wed, Oct 19, 2016 at 11:08 AM Todd  wrote:

> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar 
> wrote:
>
> This is a very interesting proposal.  I just wanted to share something I
> found in my quick search:
>
>
> http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration
>
> Could you explain why the accepted answer there doesn't address this issue?
>
> class Parse(object):
>     """A generator that iterates through a file"""
>     def __init__(self, path):
>         self.path = path
>
>     def __iter__(self):
>         with open(self.path) as f:
>             yield from f
>
>
> Best,
>
> Neil
>
>
> I think the difference is that this new approach guarantees cleanup the
> exact moment the loop ends, no matter how it ends.
>
> If I understand correctly, your approach will do cleanup when the loop
> ends only if the iterator is exhausted.  But if someone zips it with a
> shorter iterator, uses itertools.islice or something similar, breaks the
> loop, returns inside the loop, or in some other way ends the loop before
> the iterator is exhausted, the cleanup won't happen when the iterator is
> garbage collected.  And for non-reference-counting python implementations,
> when this happens is completely unpredictable.
>
> --
>

I don't see that.  The "cleanup" will happen when collection is interrupted
by an exception.  This has nothing to do with garbage collection either
since the cleanup happens deterministically when the block is ended.  If
this is the only example, then I would say this behavior is already
provided and does not need to be added.



Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Yury Selivanov

On 2016-10-19 12:38 PM, Random832 wrote:

On Wed, Oct 19, 2016, at 11:51, Yury Selivanov wrote:

I'm -1 on the idea.  Here's why:


1. Python is a very dynamic language with GC and that is one of its
fundamental properties.  This proposal might make GC of iterators more
deterministic, but that is only one case.

There is a huge difference between wanting deterministic GC and wanting
cleanup code to be called deterministically. We're not talking about
memory usage here.



I understand, but both topics are closely tied together.  Cleanup code 
can be implemented in some __del__ method of some non-iterator object.  
This proposal doesn't address such cases, it focuses only on iterators.


My point is that it's not worth it to *significantly* change iteration 
(protocols and statements) in Python to only *partially* address the issue.


Yury


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Yury Selivanov

I'm -1 on the idea.  Here's why:


1. Python is a very dynamic language with GC and that is one of its 
fundamental properties.  This proposal might make GC of iterators more 
deterministic, but that is only one case.


For instance, in some places in asyncio source code we have statements 
like this: "self = None".  Why?  When an exception occurs and we want to 
save it (for instance to log it), it holds a reference to the Traceback 
object.  Which in turn references frame objects.  Which means that a lot 
of objects in those frames will be alive while the exception object is 
alive.  So in asyncio we go to great lengths to avoid unnecessary runs 
of GC, but this is an exception!  Most Python code out there today 
doesn't do these sorts of tricks.


And this is just one example of how you can have cycles that require a 
run of GC.  It is not possible to have deterministic GC in real life 
Python applications.  This proposal addresses only *one* use case, 
leaving 100s of others unresolved.


IMO, while GC-related issues can be annoying to debug sometimes, it's 
not worth it to change the behaviour of iteration in Python only to 
slightly improve on this.



2. This proposal will make writing iterators significantly harder. 
Consider 'itertools.chain'.  We will have to rewrite it to add the 
proposed __iterclose__ method.  The Chain iterator object will have to 
track all of its iterators, call __iterclose__ on them when it's 
necessary (there are a few corner cases).  Given that this object is 
implemented in C, it's quite a bit of work.  And we'll have a lot of 
objects to fix.


We can probably update all iterators in standard library (in 3.7), but 
what about third-party code?  It will take many years until you can say 
with certainty that most of Python code supports __iterclose__ / 
__aiterclose__.



3. This proposal changes the behaviour of 'for' and 'async for' 
statements significantly.  To do partial iteration you will have to use 
a special builtin function to guard the iterator from being closed.  
This is completely non-obvious to any existing Python user and will be 
hard to explain to newcomers.



4. This proposal only addresses iteration with 'for' and 'async for' 
statements.  If you iterate using a 'while' loop and 'next()' function, 
this proposal wouldn't help you.  Also see the point #2 about 
third-party code.
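
A sketch of the kind of code point #4 is about ('process' and 'path' are 
stand-ins); none of this goes through a 'for' statement, so nothing would 
ever call __iterclose__:

it = iter(open(path))
while True:
    try:
        line = next(it)
    except StopIteration:
        break
    process(line)
# closing the file here still depends on __del__ / the garbage collector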



5. Asynchronous generators (AGs) introduced by PEP 525 are finalized in a 
very similar fashion to synchronous generators.  There is an API that lets 
Python hand AGs to the event loop for finalization.  asyncio in 3.6 (and 
other event loops in the near future) already uses this API to ensure that 
*all AGs in a long-running program are properly finalized* while it is 
running.


There is an extra loop method (`loop.shutdown_asyncgens`) that should be 
called right before stopping the loop (exiting the program) to make sure 
that all AGs are finalized, but if you forget to call it the world won't 
end.  The process will end and the interpreter will shut down, maybe 
issuing a couple of ResourceWarnings.
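
A sketch of the usage pattern ('main' is a stand-in coroutine):

import asyncio

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    # Finalize any async generators that are still alive, then exit.
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()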


No exception will pass silently in the current PEP 525 implementation.  
And if some AG isn't properly finalized, a warning will be issued.


The current AG finalization mechanism must stay even if this proposal 
gets accepted, as it ensures that even manually iterated AGs are 
properly finalized.



6. If this proposal gets accepted, I think we shouldn't introduce it in 
any form in 3.6.  It's too late to implement it for both sync- and 
async-generators.  Implementing it only for async-generators would just 
add cognitive overhead, and even that limited version would (and should!) 
delay the 3.6 release significantly.



7. To conclude: I'm not convinced that this proposal fully solves the 
issue of non-deterministic GC of iterators.  It cripples iteration 
protocols to partially solve the problem for 'for' and 'async for' 
statements, leaving manual iteration unresolved.  It will make it harder 
to write *correct* (async-) iterators.  It introduces some *implicit* 
context management to 'for' and 'async for' statements -- something that 
IMO should be done by the user with an explicit 'with' or 'async with'.



Yury


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Todd
On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar  wrote:

> This is a very interesting proposal.  I just wanted to share something I
> found in my quick search:
>
> http://stackoverflow.com/questions/14797930/python-
> custom-iterator-close-a-file-on-stopiteration
>
> Could you explain why the accepted answer there doesn't address this issue?
>
> class Parse(object):
>     """A generator that iterates through a file"""
>     def __init__(self, path):
>         self.path = path
>
>     def __iter__(self):
>         with open(self.path) as f:
>             yield from f
>
>
> Best,
>
> Neil
>
>
I think the difference is that this new approach guarantees cleanup the
exact moment the loop ends, no matter how it ends.

If I understand correctly, your approach will do cleanup when the loop ends
only if the iterator is exhausted.  But if someone zips it with a shorter
iterator, uses itertools.islice or something similar, breaks the loop,
returns inside the loop, or in some other way ends the loop before the
iterator is exhausted, the cleanup won't happen until the iterator is
garbage collected.  And for non-reference-counting Python implementations,
when that happens is completely unpredictable.
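
A sketch of that failure mode using Neil's Parse class ('data.txt' is a
stand-in):

for line in Parse('data.txt'):
    if line.startswith('#'):
        break        # the for loop abandons the generator here
# The 'with open(...)' inside __iter__ is still suspended, so the file
# stays open until the abandoned generator happens to be collected --
# promptly on CPython, at some arbitrary later time on PyPy or Jython.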

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Neil Girdhar
This is a very interesting proposal.  I just wanted to share something I 
found in my quick search:

http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration

Could you explain why the accepted answer there doesn't address this issue?

class Parse(object):
    """A generator that iterates through a file"""
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            yield from f


Best,

Neil

On Wednesday, October 19, 2016 at 12:39:34 AM UTC-4, Nathaniel Smith wrote:
>
> Hi all, 
>
> I'd like to propose that Python's iterator protocol be enhanced to add 
> a first-class notion of completion / cleanup. 
>
> This is mostly motivated by thinking about the issues around async 
> generators and cleanup. Unfortunately even though PEP 525 was accepted 
> I found myself unable to stop pondering this, and the more I've 
> pondered the more convinced I've become that the GC hooks added in PEP 
> 525 are really not enough, and that we'll regret it if we stick with 
> them, or at least with them alone :-/. The strategy here is pretty 
> different -- it's an attempt to dig down and make a fundamental 
> improvement to the language that fixes a number of long-standing rough 
> spots, including async generators. 
>
> The basic concept is relatively simple: just adding a '__iterclose__' 
> method that 'for' loops call upon completion, even if that's via break 
> or exception. But, the overall issue is fairly complicated + iterators 
> have a large surface area across the language, so the text below is 
> pretty long. Mostly I wrote it all out to convince myself that there 
> wasn't some weird showstopper lurking somewhere :-). For a first pass 
> discussion, it probably makes sense to mainly focus on whether the 
> basic concept makes sense? The main rationale is at the top, but the 
> details are there too for those who want them. 
>
> Also, for *right* now I'm hoping -- probably unreasonably -- to try to 
> get the async iterator parts of the proposal in ASAP, ideally for 
> 3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal 
> like this, which I apologize for -- though async generators are 
> provisional in 3.6, so at least in theory changing them is not out of 
> the question.) So again, it might make sense to focus especially on 
> the async parts, which are a pretty small and self-contained part, and 
> treat the rest of the proposal as a longer-term plan provided for 
> context. The comparison to PEP 525 GC hooks comes right after the 
> initial rationale. 
>
> Anyway, I'll be interested to hear what you think! 
>
> -n 
>
> -- 
>
> Abstract 
> ========
>
> We propose to extend the iterator protocol with a new 
> ``__(a)iterclose__`` slot, which is called automatically on exit from 
> ``(async) for`` loops, regardless of how they exit. This allows for 
> convenient, deterministic cleanup of resources held by iterators 
> without reliance on the garbage collector. This is especially valuable 
> for asynchronous generators. 
>
>
> Note on timing 
> ==============
>
> In practical terms, the proposal here is divided into two separate 
> parts: the handling of async iterators, which should ideally be 
> implemented ASAP, and the handling of regular iterators, which is a 
> larger but more relaxed project that can't start until 3.7 at the 
> earliest. But since the changes are closely related, and we probably 
> don't want to end up with async iterators and regular iterators 
> diverging in the long run, it seems useful to look at them together. 
>
>
> Background and motivation 
> =========================
>
> Python iterables often hold resources which require cleanup. For 
> example: ``file`` objects need to be closed; the `WSGI spec 
> `_ adds a ``close`` method 
> on top of the regular iterator protocol and demands that consumers 
> call it at the appropriate time (though forgetting to do so is a 
> `frequent source of bugs 
> `_); 
>
> and PEP 342 (based on PEP 325) extended generator objects to add a 
> ``close`` method to allow generators to clean up after themselves. 
>
> Generally, objects that need to clean up after themselves also define 
> a ``__del__`` method to ensure that this cleanup will happen 
> eventually, when the object is garbage collected. However, relying on 
> the garbage collector for cleanup like this causes serious problems in 
> at least two cases: 
>
> - In Python implementations that do not use reference counting (e.g. 
> PyPy, Jython), calls to ``__del__`` may be arbitrarily delayed -- yet 
> many situations require *prompt* cleanup of resources. Delayed cleanup 
> produces problems like crashes due to file descriptor exhaustion, or 
> WSGI timing middleware that collects bogus times. 
>
> - Async generators (PEP 525) can only perform 

Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Oscar Benjamin
On 19 October 2016 at 12:33, Oscar Benjamin  wrote:
>
>> New convenience functions
>> -------------------------
>>
>> The ``itertools`` module gains a new iterator wrapper that can be used
>> to selectively disable the new ``__iterclose__`` behavior::
>>
>>   # XX FIXME: I feel like there might be a better name for this one?
>>   class protect(iterable):
>>   def __init__(self, iterable):
>>   self._it = iter(iterable)
>>
>>   def __iter__(self):
>>   return self
>>
>>   def __next__(self):
>>   return next(self._it)
>>
>>   def __iterclose__(self):
>>   # Swallow __iterclose__ without passing it on
>>   pass
>>
>> Example usage (assuming that file objects implements ``__iterclose__``)::
>>
>>   with open(...) as handle:
>>   # Iterate through the same file twice:
>>   for line in itertools.protect(handle):
>>   ...
>>   handle.seek(0)
>>   for line in itertools.protect(handle):
>>   ...
>
> It would be much simpler to reverse this suggestion and say let's
> introduce a helper that selectively *enables* the new behaviour you're
> proposing i.e.:
>
> for line in itertools.closeafter(open(...)):
> ...
> if not line.startswith('#'):
> break  # <--- file gets closed here

Looking more closely at this I realise that there is no way to
implement closeafter like this without depending on closeafter.__del__
to do the closing. So actually this is not a solution to the problem
at all. Sorry for the noise there!
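
For the record, here is a sketch of why (the wrapper below is illustrative,
not a real itertools API): closeafter can close the underlying iterator once
it is exhausted, but if the loop is abandoned early the only hook left is
__del__ -- exactly the mechanism we were trying to get away from.

class closeafter:
    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._it)
        except StopIteration:
            self._close()
            raise

    def _close(self):
        close = getattr(self._it, 'close', None)
        if close is not None:
            close()

    # If the for loop exits early via break/return/exception, none of
    # the methods above run again, so closing falls back to the GC:
    def __del__(self):
        self._close()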

-- 
Oscar


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Oscar Benjamin
On 17 October 2016 at 09:08, Nathaniel Smith  wrote:
> Hi all,

Hi Nathaniel. I'm just reposting what I wrote on pypy-dev (as
requested) but under the assumption that you didn't substantially
alter your draft - I apologise if some of the quoted text below has
already been edited.

> Always inject resources, and do all cleanup at the top level
> ------------------------------------------------------------
>
> It was suggested on python-dev (XX find link) that a pattern to avoid
> these problems is to always pass resources in from above, e.g.
> ``read_newline_separated_json`` should take a file object rather than
> a path, with cleanup handled at the top level::

I suggested this and I still think that it is the best idea.

>   def read_newline_separated_json(file_handle):
>   for line in file_handle:
>   yield json.loads(line)
>
>   def read_users(file_handle):
>   for document in read_newline_separated_json(file_handle):
>   yield User.from_json(document)
>
>   with open(path) as file_handle:
>   for user in read_users(file_handle):
>   ...
>
> This works well in simple cases; here it lets us avoid the "N+1
> problem". But unfortunately, it breaks down quickly when things get
> more complex. Consider if instead of reading from a file, our
> generator was processing the body returned by an HTTP GET request --
> while handling redirects and authentication via OAUTH. Then we'd
> really want the sockets to be managed down inside our HTTP client
> library, not at the top level. Plus there are other cases where
> ``finally`` blocks embedded inside generators are important in their
> own right: db transaction management, emitting logging information
> during cleanup (one of the major motivating use cases for WSGI
> ``close``), and so forth.

I haven't written the kind of code that you're describing so I can't
say exactly how I would do it. I imagine, though, that helpers could be
used to solve some of the problems that you're referring to.
Here's a case I do know where the above suggestion is awkward:

def concat(filenames):
    for filename in filenames:
        with open(filename) as inputfile:
            yield from inputfile

for line in concat(filenames):
    ...

It's still possible to safely handle this use case by creating a
helper though. fileinput.input almost does what you want:

with fileinput.input(filenames) as lines:
    for line in lines:
        ...

Unfortunately, if filenames is empty this will default to sys.stdin, so
it's not perfect, but really I think introducing useful helpers for
common cases (rather than core language changes) should be considered
the obvious solution here. Generally it would have been better if
the discussion for PEP 525 had focussed more on helping people to
debug/fix dependence on __del__ rather than trying to magically fix
broken code.
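
For the concat case above, one possible shape for such a helper (just a
sketch, not an existing stdlib API -- the name concat_lines is made up) is
a context manager built on contextlib.ExitStack, so that the caller owns
the lifetime of the files and breaking out of the loop early is safe:

from contextlib import ExitStack, contextmanager

@contextmanager
def concat_lines(filenames):
    with ExitStack() as stack:
        def lines():
            for filename in filenames:
                f = stack.enter_context(open(filename))
                yield from f
        yield lines()

with concat_lines(filenames) as lines:
    for line in lines:
        if not line.startswith('#'):
            break    # every file opened so far is closed on exiting 'with'

Unlike fileinput.input this never falls back to sys.stdin, though it keeps
each file open until the 'with' block exits rather than closing them one
by one.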

> New convenience functions
> -------------------------
>
> The ``itertools`` module gains a new iterator wrapper that can be used
> to selectively disable the new ``__iterclose__`` behavior::
>
>   # XX FIXME: I feel like there might be a better name for this one?
>   class protect(iterable):
>   def __init__(self, iterable):
>   self._it = iter(iterable)
>
>   def __iter__(self):
>   return self
>
>   def __next__(self):
>   return next(self._it)
>
>   def __iterclose__(self):
>   # Swallow __iterclose__ without passing it on
>   pass
>
> Example usage (assuming that file objects implements ``__iterclose__``)::
>
>   with open(...) as handle:
>   # Iterate through the same file twice:
>   for line in itertools.protect(handle):
>   ...
>   handle.seek(0)
>   for line in itertools.protect(handle):
>   ...

It would be much simpler to reverse this suggestion and say let's
introduce a helper that selectively *enables* the new behaviour you're
proposing i.e.:

for line in itertools.closeafter(open(...)):
...
if not line.startswith('#'):
break  # <--- file gets closed here

Then we can leave (async) for loops as they are and there are no
backward compatibility problems etc.

-- 
Oscar


Re: [Python-ideas] Deterministic iterator cleanup

2016-10-19 Thread Vincent Michel

Thanks Nathaniel for this great proposal.

As I went through your mail, I realized all the comments I wanted to 
make were already covered in later paragraphs. And I don't think there's 
a single point I disagree with.


I don't have a strong opinion about the synchronous part of the 
proposal. I actually wouldn't mind the disparity between asynchronous 
and synchronous iterators if '__aiterclose__' were to be accepted and 
'__iterclose__' rejected.


However, I would like very much to see the asynchronous part happening 
in python 3.6. I can add another example for the reference: aioreactive 
(a fresh implementation of Rx for asyncio) is planning to handle 
subscriptions to a producer using a context manager:


https://github.com/dbrattli/aioreactive#subscriptions-are-async-iterables

async with listen(xs) as ys:
    async for x in ys:
        do_something(x)

As the proposal points out, this happens in *user* code. With 
'__aiterclose__', the example above could be simplified to:


async for x in listen(xs):
    do_something(x)

Or even better:

async for x in xs:
    do_something(x)


Cheers,
/Vincent


On 10/19/2016 06:38 AM, Nathaniel Smith wrote:

Hi all,

I'd like to propose that Python's iterator protocol be enhanced to add
a first-class notion of completion / cleanup.

This is mostly motivated by thinking about the issues around async
generators and cleanup. Unfortunately even though PEP 525 was accepted
I found myself unable to stop pondering this, and the more I've
pondered the more convinced I've become that the GC hooks added in PEP
525 are really not enough, and that we'll regret it if we stick with
them, or at least with them alone :-/. The strategy here is pretty
different -- it's an attempt to dig down and make a fundamental
improvement to the language that fixes a number of long-standing rough
spots, including async generators.

The basic concept is relatively simple: just adding a '__iterclose__'
method that 'for' loops call upon completion, even if that's via break
or exception. But, the overall issue is fairly complicated + iterators
have a large surface area across the language, so the text below is
pretty long. Mostly I wrote it all out to convince myself that there
wasn't some weird showstopper lurking somewhere :-). For a first pass
discussion, it probably makes sense to mainly focus on whether the
basic concept makes sense? The main rationale is at the top, but the
details are there too for those who want them.

Also, for *right* now I'm hoping -- probably unreasonably -- to try to
get the async iterator parts of the proposal in ASAP, ideally for
3.6.0 or 3.6.1. (I know this is about the worst timing for a proposal
like this, which I apologize for -- though async generators are
provisional in 3.6, so at least in theory changing them is not out of
the question.) So again, it might make sense to focus especially on
the async parts, which are a pretty small and self-contained part, and
treat the rest of the proposal as a longer-term plan provided for
context. The comparison to PEP 525 GC hooks comes right after the
initial rationale.

Anyway, I'll be interested to hear what you think!

-n

--

Abstract
========

We propose to extend the iterator protocol with a new
``__(a)iterclose__`` slot, which is called automatically on exit from
``(async) for`` loops, regardless of how they exit. This allows for
convenient, deterministic cleanup of resources held by iterators
without reliance on the garbage collector. This is especially valuable
for asynchronous generators.


Note on timing
==============

In practical terms, the proposal here is divided into two separate
parts: the handling of async iterators, which should ideally be
implemented ASAP, and the handling of regular iterators, which is a
larger but more relaxed project that can't start until 3.7 at the
earliest. But since the changes are closely related, and we probably
don't want to end up with async iterators and regular iterators
diverging in the long run, it seems useful to look at them together.


Background and motivation
=========================

Python iterables often hold resources which require cleanup. For
example: ``file`` objects need to be closed; the `WSGI spec
`_ adds a ``close`` method
on top of the regular iterator protocol and demands that consumers
call it at the appropriate time (though forgetting to do so is a
`frequent source of bugs
`_);
and PEP 342 (based on PEP 325) extended generator objects to add a
``close`` method to allow generators to clean up after themselves.

Generally, objects that need to clean up after themselves also define
a ``__del__`` method to ensure that this cleanup will happen
eventually, when the object is garbage collected. However, relying on
the garbage collector for cleanup like this causes serious problems in
at least two cases:

- In Python