On Wed, Aug 8, 2018 at 10:32 PM, Oscar Benjamin
<oscar.j.benja...@gmail.com> wrote:
> Without the context manager you could write:
>
> def read_multiple(*filenames):
>     for filename in filenames:
>         f = open(filename)
>         yield f.read()
>         f.close()
>
> Which also only leaks one file descriptor. The point of the with
> statement is that this was considered unsatisfactory but when you
> yield from a with statement the advantage of with is lost.
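The quoted behaviour is easy to observe: while the generator is suspended at a yield, the current file stays open, and abandoning the generator leaves the last file open for good (until garbage collection). A minimal sketch; the temp file and the `handles` list are demo scaffolding, not part of anyone's proposal:

```python
import os
import tempfile

handles = []  # record every file the generator opens, for inspection

def read_multiple(*filenames):
    for filename in filenames:
        f = open(filename)
        handles.append(f)
        yield f.read()
        f.close()  # only reached if the generator is resumed

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("data")

g = read_multiple(path, path)
next(g)
print(handles[0].closed)  # False: still open while suspended at the yield
next(g)
print(handles[0].closed)  # True: closed when the loop resumed
g.close()                 # abandon mid-iteration
print(handles[1].closed)  # False: the second file was never closed
os.unlink(path)
```

On CPython, reference counting would eventually reclaim the abandoned handle once nothing references it; on a non-refcounting implementation it can linger indefinitely, which is the leak under discussion.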
What if you remove yield from the picture?

def read_multiple(callback, *filenames):
    for filename in filenames:
        f = open(filename)
        callback(f.read())
        f.close()

This is a classic approach, as done in many languages that lack
generators. The problem isn't that one file handle is being leaked
until the next iteration; it's that an exception could permanently
leak that file.

Also of note is the "write-then-read" approach:

def writer(fn):
    f = open(fn, "w")
    f.write(...)

def reader(fn):
    f = open(fn)
    return f.read()

writer("foo"); reader("foo")

Without a guarantee that the file is closed *immediately* on losing
its last reference, this is risky, because the write handle might
still be open when the read handle is opened.

The context manager guards against both of these situations, and that
doesn't change when a yield point is inserted, just as it doesn't
change when a callback is inserted. The generator mechanism inverts
the call stack a bit, but it isn't fundamentally different: at some
point in read_multiple, it waits for some other code to do its stuff,
then it continues.

> In fact if you're happy depending on garbage collection for cleanup
> then you can write this more conveniently:
>
> def read_multiple(*filenames):
>     for filename in filenames:
>         yield open(filename).read()

And that's the whole point: you cannot be happy with garbage
collection cleanup, because a non-refcounting Python implementation is
not guaranteed to clean these up promptly.

> In any case saying that you only leak one file descriptor misses the
> point. The with statement can do many different things and its
> purpose is to guarantee *without depending on garbage collection*
> that the __exit__ method will be called. Yielding from the with
> statement circumvents that guarantee.

Technically, it guarantees that __exit__ will be called before the
next line of code after the 'with' block.
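To make the callback version exception-safe, the same with statement applies; a minimal sketch showing that the file is closed even when the callback raises. The temp file and the `boom` callback are invented for the demo:

```python
import os
import tempfile

last_file = None  # demo instrumentation: remember the most recent file

def read_multiple(callback, *filenames):
    global last_file
    for filename in filenames:
        with open(filename) as f:  # closed even if the callback raises
            last_file = f
            callback(f.read())

# Create a throwaway file to read.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("hello")

def boom(text):
    raise ValueError("simulated failure in the callback")

try:
    read_multiple(boom, path)
except ValueError:
    pass  # the exception propagates normally...

print(last_file.closed)  # True: ...but __exit__ already closed the file
os.unlink(path)
```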
It cannot actually guarantee that it will be called:

with open(filename) as f:
    while True:
        pass
    print("Won't happen")
print("Won't happen either")

All that's guaranteed is that the second print call will not happen
*before* the file is closed.

> Actually the general way to solve this problem is to move all context
> managers out of iterators/generators. Instead of a helper generator
> what you really want in this situation is a helper context manager. I
> think the reason this isn't always used is just because it's a bit
> harder to write a context manager than a generator e.g.:
>
> from contextlib import contextmanager
>
> @contextmanager
> def open_cat_split(*filenames):
>     f = None
>     def get_words():
>         nonlocal f
>         for filename in filenames:
>             with open(filename) as f:
>                 for line in f:
>                     yield from line.split()
>         else:
>             f = None
>     try:
>         yield get_words()
>     finally:
>         if f is not None:
>             f.close()
>
> Then you can do:
>
> with open_cat_split('file1.txt', 'file2.txt') as words:
>     for word in words:
>         print(word)

So... you still have a yield (in this case a "yield from") inside your
'with' block, but you've wrapped it up in an extra layer, and done a
manual try/finally. How is this different? Looking only at the lower
half of the function, you have this:

try:
    yield    # <== the function stops here
finally:
    pass     # <== the function cleans up here

This is exactly the same as the context manager situation. The finally
block cannot be executed until the function resumes. What you have
accomplished here is a level of nesting, and you can achieve that far
more simply by just closing the context manager itself.
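Closing the generator is exactly what makes that pending finally run: generator.close() raises GeneratorExit at the suspended yield, so a finally (or with) wrapped around that yield executes immediately. A minimal sketch with an invented `events` list to trace the order:

```python
events = []  # trace of what ran, and when

def gen():
    events.append("enter")
    try:
        yield 1
    finally:
        events.append("cleanup")

g = gen()
next(g)        # run up to the yield; 'cleanup' has NOT run yet
g.close()      # raises GeneratorExit at the yield, running the finally
print(events)  # ['enter', 'cleanup']
```

The same mechanism fires a with statement's __exit__ if the yield sits inside one, which is what the rest of this post relies on.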
Going right back to the original "problem code":

def read_multiple(*filenames):
    for filename in filenames:
        with open(filename) as f:
            yield f.read()

for contents in read_multiple('chunk1', 'chunk2', 'chunk3'):
    if contents == 'hello':
        break

We can keep the generator completely unchanged, and add a context
manager to the loop, since we're not going to pump it completely:

with contextlib.closing(read_multiple('chunk1', 'chunk2', 'chunk3')) as stuff:
    for contents in stuff:
        if contents == 'hello':
            break

Now, when you exit the loop, the generator will be closed, which uses
normal exception handling to clean up everything inside that function.
The files are guaranteed to be closed as part of the closing of the
generator.

> Perhaps actually what is wanted is a more generic contextlib helper. I
> guess ExitStack is intended for this sort of thing but it's a bit
> unwieldy to use:
> https://docs.python.org/3.7/library/contextlib.html#contextlib.ExitStack
>
> We can use ExitStack to make something more convenient though:
>
> from contextlib import ExitStack, contextmanager
>
> @contextmanager
> def map_with(func, iterable):
>     stack = ExitStack()
>     def g():
>         for arg in iterable:
>             stack.close()
>             iterable_cm = func(arg)
>             stack.push(iterable_cm)
>             yield iterable_cm
>     with stack:
>         yield g()
>
> Then you can do something like:
>
> def cat_split(files):
>     for f in files:
>         for line in f:
>             yield from line.split()
>
> with map_with(open, ['file1.txt', 'file2.txt']) as files:
>     for word in cat_split(files):
>         print(word)

Yeah, I think what you have here is close to what you want, but
contextlib.closing() can cover it. The key thing to remember is that
generators can be closed. Use that functionality. It'll save you a lot
of hassle :)

ChrisA
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
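The contextlib.closing() pattern above can be checked end to end; a sketch using throwaway temp files in place of 'chunk1'..'chunk3', with an invented `opened` list as instrumentation:

```python
import contextlib
import os
import tempfile

opened = []  # demo instrumentation: every file the generator opens

def read_multiple(*filenames):
    for filename in filenames:
        with open(filename) as f:
            opened.append(f)
            yield f.read()

# Create three small files standing in for chunk1/chunk2/chunk3.
paths = []
for text in ("hello", "world", "again"):
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write(text)
    paths.append(path)

with contextlib.closing(read_multiple(*paths)) as stuff:
    for contents in stuff:
        if contents == "hello":
            break  # abandon the generator mid-iteration

# closing() called stuff.close() on exit, raising GeneratorExit at the
# suspended yield; the with inside read_multiple closed the open file.
print(all(f.closed for f in opened))  # True

for path in paths:
    os.unlink(path)
```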