[issue29842] Make Executor.map work with infinite/large inputs correctly

2017-05-16 Thread Klamann

Changes by Klamann <sebastian-str...@gmx.net>:


--
nosy: +Klamann




[issue30323] concurrent.futures.Executor.map() consumes all memory when big generators are used

2017-05-16 Thread Klamann

Klamann added the comment:

Thanks for pointing this out.
*closed*

--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed




[issue30323] concurrent.futures.Executor.map() consumes all memory when big generators are used

2017-05-12 Thread Klamann

Klamann added the comment:

Yes, I was wrong in my assumption that simply replacing the list comprehension 
with a generator expression would fix the issue.

Nevertheless, there is no need to load the *entire* generator into memory by 
converting it to a list. We only need to read the first n elements, where n is 
the number of workers that are currently available.

I've implemented an alternative solution that works for me, using wait() and 
notify() from threading.Condition, but I'm not sure it's the best solution for 
everyone. I could post it here if you're interested (a rough sketch of the 
general idea follows below).
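
For illustration, here is a rough standalone sketch of the bounded-prefetch idea. 
This is not the Condition-based code mentioned above; the helper name lazy_map and 
the prefetch parameter are made up for this example.

import itertools
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def lazy_map(executor, fn, iterable, prefetch=2):
    # Keep at most `prefetch` futures in flight, so the input iterable is
    # only advanced as results are consumed, never converted to a list.
    iterator = iter(iterable)
    pending = deque(executor.submit(fn, arg)
                    for arg in itertools.islice(iterator, prefetch))
    while pending:
        oldest = pending.popleft()
        for arg in itertools.islice(iterator, 1):  # refill one slot, if input remains
            pending.append(executor.submit(fn, arg))
        yield oldest.result()

def work(i):
    print("working on input", i)
    time.sleep(1)
    return i

with ThreadPoolExecutor(max_workers=2) as executor:
    for result in lazy_map(executor, work, iter(range(10))):
        print("got result", result)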

We should also consider that no longer strictly evaluating every iterable passed 
to map() might break existing code that implicitly relies on the fact that this 
happens (although this is not a documented feature of the map function and was 
probably not the intended behaviour in the first place).

--




[issue30323] concurrent.futures.Executor.map() consumes all memory when big generators are used

2017-05-09 Thread Klamann

New submission from Klamann:

The Executor's map() function accepts a function and an iterable that holds the 
function arguments for each call to the function that should be made. This 
iterable could be a generator, and as such it could reference data that won't 
fit into memory.

The behaviour I would expect is that the Executor requests the next element 
from the iterable whenever a thread, process or whatever is ready to make the 
next function call.

But what actually happens is that the entire iterable gets converted into a 
list right after the map function is called and therefore any underlying 
generator will load all referenced data into memory. Here's where the list gets 
built from the iterable:
https://github.com/python/cpython/blob/3.6/Lib/concurrent/futures/_base.py#L548

The way I see it, there's no reason to convert the iterable to a list in the 
map function (or any other place in the Executor). Just replacing the list 
comprehension with a generator expression would probably fix that.
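
To make the eager/lazy distinction concrete, here is a tiny standalone 
illustration; the submit() stand-in below is hypothetical and only mirrors the 
shape of the real call.

def noisy():
    for i in range(3):
        print("producing", i)
        yield i

def submit(arg):
    # stand-in for Executor.submit(), for illustration only
    return arg

fs_eager = [submit(x) for x in noisy()]   # list comprehension: drains the generator immediately
fs_lazy = (submit(x) for x in noisy())    # generator expression: consumes nothing until iterated

As the 2017-05-12 follow-up above acknowledges, this one-line change alone turned 
out not to be enough to fix the issue.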


Here's an example that illustrates the issue:

from concurrent.futures import ThreadPoolExecutor
import time

def generate():
    for i in range(10):
        print("generating input", i)
        yield i

def work(i):
    print("working on input", i)
    time.sleep(1)

with ThreadPoolExecutor(max_workers=2) as executor:
    generator = generate()
    executor.map(work, generator)

The output is:

generating input 0
working on input 0
generating input 1
working on input 1
generating input 2
generating input 3
generating input 4
generating input 5
generating input 6
generating input 7
generating input 8
generating input 9
working on input 2
working on input 3
working on input 4
working on input 5
working on input 6
working on input 7
working on input 8
working on input 9

Ideally, the "generating" and "working" lines should alternate, but currently 
all of the input is generated up front.

--
messages: 293353
nosy: Klamann
priority: normal
severity: normal
status: open
title: concurrent.futures.Executor.map() consumes all memory when big 
generators are used
type: resource usage
versions: Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7




[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more

2016-07-23 Thread Klamann

Klamann added the comment:

Thanks Xiang and Martin for solving this, you guys are awesome :)

--




[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more

2016-06-02 Thread Klamann

Klamann added the comment:

> You should be able to use a compression (or decompression) object as a 
> workaround.

OK, let's see

>>> import zlib
>>> zc = zlib.compressobj()
>>> c1 = zc.compress(b'a' * 2**31)
>>> c2 = zc.compress(b'a' * 2**31)
>>> c3 = zc.flush()
>>> c = c1 + c2 + c3
>>> zd = zlib.decompressobj()
>>> d1 = zd.decompress(c)
Segmentation fault (core dumped)

Seriously? What is wrong with this library? I've tested this using Python 3.5.0 
on Linux and Python 3.5.1 on Windows.
At least with Python 2.7.6 it seems to work as expected...

So, splitting the input into chunks of less than 2^32 bytes (less than 2^31 for 
Python 2.x) seems to work (except for this segfault in Python 3), but it's 
still annoying that you have to split and concatenate the data each time and 
remember to call flush() or you lose data; a small helper along these lines is 
sketched below.
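
A minimal sketch of such a chunking helper, assuming the input is available as a 
single bytes object in memory; the name compress_large and the chunk size are 
arbitrary choices for illustration, not part of zlib:

import zlib

def compress_large(data, chunk_size=2**30):
    # Feed the data to a compressobj in chunks small enough to stay under the
    # 32-bit size limit, and call flush() at the end so no data is lost.
    zc = zlib.compressobj()
    parts = [zc.compress(data[i:i + chunk_size])
             for i in range(0, len(data), chunk_size)]
    parts.append(zc.flush())
    return b''.join(parts)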

IMHO, it would be best to fix the underlying issue; there is no reason to keep 
the 32-bit limitation.

> Alternatively (or in the mean time), I guess we could document the limitation.

+1

--
Added file: http://bugs.python.org/file43099/_usr_bin_python3.5.1000.crash




[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more

2016-05-26 Thread Klamann

Klamann added the comment:

> But you can only get that feature with Python3.5+.

Well, I have Python 3.5.1 installed and the problem still persists. I'm not 
sure that issue 25626 is the same problem - in the comments they say this was not 
an issue in Python 3.4 or 2.x, but here it clearly is.

Another thing I've noticed: contrary to my previous statement, 
zlib.decompress() doesn't work on archives that are larger than 4 GB (I was 
misled by the fact that my 1 GB archive contained a 6 GB file).

When I use gzip.compress() on more than 2^32 bytes, the same OverflowError 
occurs as with zlib.compress(). But when I use gzip.decompress(), I can extract 
archives that are larger than 4GB.

--




[issue27130] zlib: OverflowError while trying to compress 2^32 bytes or more

2016-05-26 Thread Klamann

New submission from Klamann:

zlib fails to compress data of 2^32 bytes (4 GB) or more, due to 32-bit size limitations.

I've tested this in Python 3.4.3 and 3.5.1:

> python3 -c "import zlib; zlib.compress(b'a' * (2**32 - 1))"
> python3 -c "import zlib; zlib.compress(b'a' * (2**32))"
Traceback (most recent call last):
  File "", line 1, in 
OverflowError: Size does not fit in an unsigned int

For Python 2.7, the issue already starts at 2^31 bytes (due to signed 32-bit integers):

> python2 -c "import zlib; zlib.compress(b'a' * (2**31 - 1))"
> python2 -c "import zlib; zlib.compress(b'a' * (2**31))"
Traceback (most recent call last):
  File "", line 1, in 
OverflowError: size does not fit in an int

Decompressing files larger than 4GB works just fine.

--
components: Library (Lib)
messages: 266436
nosy: Klamann
priority: normal
severity: normal
status: open
title: zlib: OverflowError while trying to compress 2^32 bytes or more
versions: Python 2.7, Python 3.4, Python 3.5
