[issue29842] Make Executor.map work with infinite/large inputs correctly
Changes by Klamann <sebastian-str...@gmx.net>:

--
nosy: +Klamann

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29842>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30323] concurrent.futures.Executor.map() consumes all memory when big generators are used
Klamann added the comment:

Thanks for pointing this out. *closed*

--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30323>
___
[issue30323] concurrent.futures.Executor.map() consumes all memory when big generators are used
Klamann added the comment:

Yes, I was wrong in my assumption that simply replacing the list comprehension with a generator expression would fix the issue. Nevertheless, there is no need to load the *entire* generator into memory by converting it to a list. All we have to read are the first n elements, where n is the number of workers that are currently available.

I've implemented an alternative solution that works for me, using wait() and notify() from threading.Condition, but I'm not quite sure whether this would be the best solution for everyone. I could post it here, if you're interested.

We should also consider that no longer strictly evaluating every iterable passed to the map() function might break existing code that implicitly relies on the fact that this happens (although this is not a documented feature of the map function and was probably not the intended behaviour in the first place).

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30323>
___
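The wait()/notify() solution itself is not included in the message. As a rough sketch of the same backpressure idea, one can keep only a bounded deque of pending futures and pull one more input from the source only after a result has been consumed. The helper name `lazy_map` and the `prefetch` parameter below are assumptions for illustration, not part of any proposed patch:

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor


def lazy_map(executor, fn, iterable, prefetch):
    """Hypothetical helper: like Executor.map(), but pulls at most
    `prefetch` items from `iterable` ahead of the consumer."""
    it = iter(iterable)
    # Prime the pipeline with the first `prefetch` submissions only.
    futures = collections.deque(
        executor.submit(fn, arg) for arg in itertools.islice(it, prefetch))
    while futures:
        # Wait for the oldest result; only then pull one more input,
        # so the source iterable is never drained far ahead.
        result = futures.popleft().result()
        for arg in itertools.islice(it, 1):
            futures.append(executor.submit(fn, arg))
        yield result


# usage: results come back in input order, but the generator is only
# ever `prefetch` items ahead of the consumer
with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(lazy_map(ex, lambda x: x * x, range(8), prefetch=4))
    # results == [0, 1, 4, 9, 16, 25, 36, 49]
```

Compared with the Condition-based approach mentioned above, this keeps all the coordination in the calling thread, at the cost of blocking on `result()` in submission order.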
[issue30323] concurrent.futures.Executor.map() consumes all memory when big generators are used
New submission from Klamann:

The Executor's map() function accepts a function and an iterable that holds the function arguments for each call to the function that should be made. This iterable could be a generator, and as such it could reference data that won't fit into memory.

The behaviour I would expect is that the Executor requests the next element from the iterable whenever a thread, process or whatever is ready to make the next function call. But what actually happens is that the entire iterable gets converted into a list right after the map function is called, and therefore any underlying generator will load all referenced data into memory.

Here's where the list gets built from the iterable:
https://github.com/python/cpython/blob/3.6/Lib/concurrent/futures/_base.py#L548

The way I see it, there's no reason to convert the iterable to a list in the map function (or any other place in the Executor). Just replacing the list comprehension with a generator expression would probably fix that.

Here's an example that illustrates the issue:

    from concurrent.futures import ThreadPoolExecutor
    import time

    def generate():
        for i in range(10):
            print("generating input", i)
            yield i

    def work(i):
        print("working on input", i)
        time.sleep(1)

    with ThreadPoolExecutor(max_workers=2) as executor:
        generator = generate()
        executor.map(work, generator)

The output is:

    generating input 0
    working on input 0
    generating input 1
    working on input 1
    generating input 2
    generating input 3
    generating input 4
    generating input 5
    generating input 6
    generating input 7
    generating input 8
    generating input 9
    working on input 2
    working on input 3
    working on input 4
    working on input 5
    working on input 6
    working on input 7
    working on input 8
    working on input 9

Ideally, the lines should alternate, but currently all input is generated immediately.
--
messages: 293353
nosy: Klamann
priority: normal
severity: normal
status: open
title: concurrent.futures.Executor.map() consumes all memory when big generators are used
type: resource usage
versions: Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30323>
___
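Until the Executor itself stops materializing the iterable, a caller can work around the issue by handing map() one bounded batch at a time, so the generator is never drained more than one chunk ahead. The helper name `map_in_chunks` below is hypothetical, a sketch of that workaround rather than anything in the stdlib:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def map_in_chunks(executor, fn, iterable, chunk):
    """Hypothetical workaround: call executor.map() on bounded slices
    of `iterable`, so only `chunk` items are held in memory at once."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, chunk))  # pulls at most `chunk` items
        if not batch:
            return
        yield from executor.map(fn, batch)


with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(map_in_chunks(executor, lambda i: i + 1, range(7), chunk=2))
    # results == [1, 2, 3, 4, 5, 6, 7]
```

The trade-off is that workers sit idle at each chunk boundary while the slowest item in the batch finishes, so this is a stopgap, not a substitute for a lazy map() in the Executor.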
[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more
Klamann added the comment:

Thanks Xiang and Martin for solving this, you guys are awesome :)

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27130>
___
[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more
Klamann added the comment:

> You should be able to use a compression (or decompression) object as a
> workaround.

OK, let's see:

    >>> import zlib
    >>> zc = zlib.compressobj()
    >>> c1 = zc.compress(b'a' * 2**31)
    >>> c2 = zc.compress(b'a' * 2**31)
    >>> c3 = zc.flush()
    >>> c = c1 + c2 + c3
    >>> zd = zlib.decompressobj()
    >>> d1 = zd.decompress(c)
    Segmentation fault (core dumped)

Seriously? What is wrong with this library? I've tested this using Python 3.5.0 on Linux and Python 3.5.1 on Windows. At least with Python 2.7.6 it seems to work as expected...

So, splitting the input into chunks of less than 2^32 bytes (less than 2^31 for Python 2.x) seems to work (except for this segfault in Python 3), but it's still annoying that you have to split and concatenate data each time and remember to call flush() or you lose data... imho, it would be best to fix the underlying issue. There is no reason why we should keep the 32-bit limitation.

> Alternatively (or in the mean time), I guess we could document the limitation.

+1

--
Added file: http://bugs.python.org/file43099/_usr_bin_python3.5.1000.crash

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27130>
___
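The chunking workaround described in the message can be sketched as follows. The helper names `compress_large`/`decompress_large` and the chunk size are assumptions for illustration; note that chunking the *decompression* input as well keeps each C-level call small, which is the pattern that avoids both the OverflowError and the large-buffer path. This routes around the 32-bit limitation rather than fixing it:

```python
import zlib

CHUNK = 2 ** 30  # comfortably below both the 2**31 and 2**32 limits


def compress_large(data, level=6):
    # Feed the compressor in slices so no single call to compress()
    # sees more bytes than the 32-bit size fields can represent.
    zc = zlib.compressobj(level)
    parts = [zc.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
    parts.append(zc.flush())  # without flush(), the tail of the stream is lost
    return b"".join(parts)


def decompress_large(blob):
    # Chunk the decompression input as well, for the same reason.
    zd = zlib.decompressobj()
    parts = [zd.decompress(blob[i:i + CHUNK]) for i in range(0, len(blob), CHUNK)]
    parts.append(zd.flush())
    return b"".join(parts)


data = b"spam" * 10_000
assert decompress_large(compress_large(data)) == data
```

The round-trip works with small inputs too, which is what makes the easy-to-forget flush() call so dangerous: omitting it often still "works" for tiny payloads and only corrupts larger streams.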
[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more
Klamann added the comment:

> But you can only get that feature with Python3.5+.

Well, I have Python 3.5.1 installed and the problem still persists. I'm not sure that 25626 is the same problem - in the comments they say this was not an issue in Python 3.4 or 2.x, but that is clearly the case here.

Another thing I've noticed: Contrary to my previous statement, zlib.decompress() doesn't work on archives that are larger than 4 GB (I was misled by the fact that my 1 GB archive contained a 6 GB file). When I use gzip.compress() on more than 2^32 bytes, the same OverflowError occurs as with zlib.compress(). But when I use gzip.decompress(), I can extract archives that are larger than 4 GB.

--
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27130>
___
[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more
New submission from Klamann:

zlib fails to compress files larger than 4 GB due to some 32-bit issues. I've tested this in Python 3.4.3 and 3.5.1:

    > python3 -c "import zlib; zlib.compress(b'a' * (2**32 - 1))"
    > python3 -c "import zlib; zlib.compress(b'a' * (2**32))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    OverflowError: Size does not fit in an unsigned int

For Python 2.7, the issue starts at 2^31 bytes (due to signed 32-bit integers):

    > python2 -c "import zlib; zlib.compress(b'a' * (2**31 - 1))"
    > python2 -c "import zlib; zlib.compress(b'a' * (2**31))"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    OverflowError: size does not fit in an int

Decompressing files larger than 4 GB works just fine.

--
components: Library (Lib)
messages: 266436
nosy: Klamann
priority: normal
severity: normal
status: open
title: zlib: OverflowError while trying to compress 2^32 bytes or more
versions: Python 2.7, Python 3.4, Python 3.5

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27130>
___