On Wed, Oct 19, 2016 at 2:11 PM Nathaniel Smith <n...@pobox.com> wrote:
> On Wed, Oct 19, 2016 at 10:08 AM, Neil Girdhar <mistersh...@gmail.com> > wrote: > > > > > > On Wed, Oct 19, 2016 at 11:08 AM Todd <toddr...@gmail.com> wrote: > >> > >> On Wed, Oct 19, 2016 at 3:38 AM, Neil Girdhar <mistersh...@gmail.com> > >> wrote: > >>> > >>> This is a very interesting proposal. I just wanted to share something > I > >>> found in my quick search: > >>> > >>> > >>> > http://stackoverflow.com/questions/14797930/python-custom-iterator-close-a-file-on-stopiteration > >>> > >>> Could you explain why the accepted answer there doesn't address this > >>> issue? > >>> > >>> class Parse(object): > >>> """A generator that iterates through a file""" > >>> def __init__(self, path): > >>> self.path = path > >>> > >>> def __iter__(self): > >>> with open(self.path) as f: > >>> yield from f > > BTW it may make this easier to read if we notice that it's essentially > a verbose way of writing: > > def parse(path): > with open(path) as f: > yield from f > > >> > >> I think the difference is that this new approach guarantees cleanup the > >> exact moment the loop ends, no matter how it ends. > >> > >> If I understand correctly, your approach will do cleanup when the loop > >> ends only if the iterator is exhausted. But if someone zips it with a > >> shorter iterator, uses itertools.islice or something similar, breaks the > >> loop, returns inside the loop, or in some other way ends the loop > before the > >> iterator is exhausted, the cleanup won't happen when the iterator is > garbage > >> collected. And for non-reference-counting python implementations, when > this > >> happens is completely unpredictable. > >> > >> -- > > > > > > I don't see that. The "cleanup" will happen when collection is > interrupted > > by an exception. This has nothing to do with garbage collection either > > since the cleanup happens deterministically when the block is ended. If > > this is the only example, then I would say this behavior is already > provided > > and does not need to be added. > > I think there might be a misunderstanding here. Consider code like > this, that breaks out from the middle of the for loop: > > def use_that_generator(): > for line in parse(...): > if found_the_line_we_want(line): > break > # -- mark -- > do_something_with_that_line(line) > > With current Python, what will happen is that when we reach the marked > line, then the for loop has finished and will drop its reference to > the generator object. At this point, the garbage collector comes into > play. On CPython, with its reference counting collector, the garbage > collector will immediately collect the generator object, and then the > generator object's __del__ method will restart 'parse' by having the > last 'yield' raise a GeneratorExit, and *that* exception will trigger > the 'with' block's cleanup. But in order to get there, we're > absolutely depending on the garbage collector to inject that > GeneratorExit. And on an implementation like PyPy that doesn't use > reference counting, the generator object will become collect*ible* at > the marked line, but might not actually be collect*ed* for an > arbitrarily long time afterwards. And until it's collected, the file > will remain open. 'with' blocks guarantee that the resources they hold > will be cleaned up promptly when the enclosing stack frame gets > cleaned up, but for a 'with' block inside a generator then you still > need something to guarantee that the enclosing stack frame gets > cleaned up promptly! > Yes, I understand that. Maybe this is clearer. This class adds an iterclose to any iterator so that when iteration ends, iterclose is automatically called: def my_iterclose(): print("Closing!") class AddIterclose: def __init__(self, iterable, iterclose): self.iterable = iterable self.iterclose = iterclose def __iter__(self): try: for x in self.iterable: yield x finally: self.iterclose() try: for x in AddIterclose(range(10), my_iterclose): print(x) if x == 5: raise ValueError except: pass > > This proposal is about providing that thing -- with __(a)iterclose__, > the end of the for loop immediately closes the generator object, so > the garbage collector doesn't need to get involved. > > Essentially the same thing happens if we replace the 'break' with a > 'raise'. Though with exceptions, things can actually get even messier, > even on CPython. Here's a similar example except that (a) it exits > early due to an exception (which then gets caught elsewhere), and (b) > the invocation of the generator function ended up being kind of long, > so I split the for loop into two lines with a temporary variable: > > def use_that_generator2(): > it = > parse("/a/really/really/really/really/really/really/really/long/path") > for line in it: > if not valid_format(line): > raise ValueError() > > def catch_the_exception(): > try: > use_that_generator2() > except ValueError: > # -- mark -- > ... > > Here the ValueError() is raised from use_that_generator2(), and then > caught in catch_the_exception(). At the marked line, > use_that_generator2's stack frame is still pinned in memory by the > exception's traceback. And that means that all the local variables are > also pinned in memory, including our temporary 'it'. Which means that > parse's stack frame is also pinned in memory, and the file is not > closed. > > With the __(a)iterclose__ proposal, when the exception is thrown then > the 'for' loop in use_that_generator2() immediately closes the > generator object, which in turn triggers parse's 'with' block, and > that closes the file handle. And then after the file handle is closed, > the exception continues propagating. So at the marked line, it's still > the case that 'it' will be pinned in memory, but now 'it' is a closed > generator object that has already relinquished its resources. > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org >
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/