New submission from Joshua Chia:
When using bz2.BZ2File to read an input file that is growing slowly, repeated
read() calls eventually catch up to the end of the file and from then on fail
to produce any more data, even though the input file continues growing.
In 2.7, the symptom is that read() keeps returning no data even after the file
grows. In 3.3, the symptom is EOFError: Compressed file ended before the
end-of-stream marker was reached.
The correct behavior is for read() not to consume partial compressed data, so
that a later read() works properly once the input file has grown. The EOFError
should not be raised until close() is called and the file is found not to end
at an end-of-stream marker.
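
For illustration, the desired read-what-is-available behavior can be
approximated today by driving bz2.BZ2Decompressor directly instead of
BZ2File. This is only a sketch, assuming Python 3.3+ (for the eof attribute)
and a single-stream bz2 file; read_available is a hypothetical helper name,
not part of the bz2 module.

```python
import bz2


def read_available(raw, decomp, chunk_size=8192):
    """Decompress whatever bytes are currently available in raw.

    raw is a binary file object opened on the growing .bz2 file, and
    decomp is a bz2.BZ2Decompressor kept alive across calls so that a
    partially received compressed block is simply buffered, not an error.
    """
    pieces = []
    while not decomp.eof:
        chunk = raw.read(chunk_size)
        if not chunk:
            break  # caught up with the writer; call again later
        pieces.append(decomp.decompress(chunk))
    return b"".join(pieces)
```

The caller would keep one decompressor per file, call read_available()
in a loop with a sleep in between, and stop once decomp.eof is true.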
Some existing software may depend on the current behavior, so the new behavior
may break it. However, enabling the new behavior only when the constructor
parameter buffering is non-zero may mitigate compatibility problems, since
using buffering while reading currently doesn't seem to make much sense.
To repro the problem, use the attached slow-copy.py to slowly copy a
large-enough source bz2 file to a destination bz2 file. Then run the following
script on the slowly-growing destination bz2 file:
import bz2
import sys
import time

if len(sys.argv) != 2:
    sys.exit(1)

total = 0
with bz2.BZ2File(sys.argv[1], 'r', buffering=8192) as f:
    while True:
        data = f.read(8192)
        count = len(data)
        total += count
        print('{} {}'.format(total, count))
        # A short read means we caught up with the writer; wait for more data.
        if count < 8192:
            time.sleep(0.5)
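
The attached slow-copy.py is not reproduced in this message. A minimal
stand-in, assuming it simply copies the source to the destination in small
chunks with a pause between writes, might look like the following. The
function name, chunk size, and delay are illustrative assumptions, not taken
from the attachment.

```python
import sys
import time


def slow_copy(src_path, dst_path, chunk_size=1024, delay=0.1):
    """Copy src to dst one chunk at a time, sleeping between writes."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()  # make the new bytes visible to readers right away
            copied += len(chunk)
            time.sleep(delay)
    return copied


if __name__ == "__main__" and len(sys.argv) == 3:
    slow_copy(sys.argv[1], sys.argv[2])
```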
--
components: Library (Lib)
files: slow-copy.py
messages: 207506
nosy: Joshua.Chia
priority: normal
severity: normal
status: open
title: bz2.BZ2File.read() does not treat growing input file properly
type: behavior
versions: Python 2.7, Python 3.3
Added file: http://bugs.python.org/file5/slow-copy.py
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue20156
___