New submission from Nadeem Vawda:
When calling zlib.Decompress.decompress() with a max_length argument,
if the input data is not fully consumed, the next_in pointer in the
z_stream struct is left pointing into the data object, but the
decompressor does not hold a reference to this object. This same
pointer is reused (perhaps unintentionally) if flush() is called
without calling decompress() again.
If the data object gets deallocated between the calls to decompress()
and to flush(), zlib will then try to access this deallocated memory,
and most likely return bogus output (or segfault). See the attached
script for a demonstration.
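The call pattern looks roughly like the sketch below. It is a hypothetical re-creation, not the attached zlib_stale_ptr.py. The helper name round_trip_after_dropping_input and the sample data are assumptions made for illustration. decompress() is limited by max_length, so part of the input stays unconsumed. The caller then drops its own reference to the input before calling flush(). On an affected interpreter the flush() step may read freed memory. On one where flush() no longer relies on the stale pointer, the round trip should return the original bytes.

```python
import zlib


def round_trip_after_dropping_input(original: bytes, max_length: int) -> bytes:
    """Hypothetical helper recreating the decompress()/del/flush() sequence."""
    d = zlib.decompressobj()
    data = zlib.compress(original)
    # Output is capped at max_length bytes, so input may be left unconsumed.
    head = d.decompress(data, max_length)
    # Drop the caller's reference; nothing else here keeps `data` alive.
    del data
    # On an interpreter affected by this bug, flush() may read freed memory here.
    return head + d.flush()


sample = bytes(range(256)) * 64  # assumed sample input
print(round_trip_after_dropping_input(sample, 10) == sample)
```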
I see two potential solutions:
1. Set avail_in to zero in flush(), so that it does not try to use
   leftover data (or whatever else now occupies the memory where that
   data used to be).
2. Have decompress() check if there is leftover data, and if so,
   save a reference to the object until a) we consume the rest of
   the data in flush(), or b) discard it in a subsequent call to
   decompress().
Solution 2 would be less disruptive to code that depends on the existing
behavior (in non-pathological cases), but I don't like the maintenance
burden of adding yet another piece of state for the decompressor to keep
track of. The PyZlib_objdecompress function is complex enough as it is,
and we can expect more bugs like this to creep in the more additional
logic we cram into it. So I'm more in favor of solution 1.
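Under solution 1, code that relied on flush() picking up leftover input would need to pass that input back explicitly. The sketch below shows one caller-side pattern that does not depend on the stale pointer, whichever fix is chosen. It uses the documented Decompress.unconsumed_tail attribute. The function name decompress_in_chunks and the sample data are hypothetical.

```python
import zlib


def decompress_in_chunks(compressed: bytes, max_length: int) -> bytes:
    """Hypothetical helper: each decompress() call yields at most max_length bytes.

    Input held back by max_length is fed back explicitly through
    unconsumed_tail, so correctness does not depend on flush() reusing
    input left over from an earlier decompress() call.
    """
    d = zlib.decompressobj()
    chunks = [d.decompress(compressed, max_length)]
    # Re-submit the input that the output limit prevented zlib from consuming.
    while d.unconsumed_tail:
        chunks.append(d.decompress(d.unconsumed_tail, max_length))
    # All input has now been handed to zlib; flush() only drains buffered output.
    chunks.append(d.flush())
    return b"".join(chunks)


sample = bytes(range(256)) * 64  # assumed sample input
print(decompress_in_chunks(zlib.compress(sample), 10) == sample)
```

The loop always makes progress because each call with a positive max_length either consumes input or emits output. Any output zlib still holds after the last input is submitted is returned by flush().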
nosy: nadeem.vawda, serhiy.storchaka
stage: needs patch
title: zlib.Decompress.decompress() retains pointer to input buffer without
acquiring reference to it
versions: Python 2.7, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file27889/zlib_stale_ptr.py
Python tracker <rep...@bugs.python.org>
Python-bugs-list mailing list