[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-04-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 62723172412c by Antoine Pitrou in branch 'default':
Issue #23529: Limit the size of decompressed data when reading from
https://hg.python.org/cpython/rev/62723172412c

--
nosy: +python-dev

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-04-10 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Thank you very much for being so perseverant! The patch is now pushed into the 
default branch.

--
resolution:  - fixed
stage:  - resolved
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-29 Thread Martin Panter

Martin Panter added the comment:

Patch v9:

* Incorporated _PaddedFile.rewind() into seek() for simplicity
* Reverted to support fast-forwarding in non-seekable streams again
* Factored common checks into _check_can_seek()
* Documented “mtime” attribute and implemented it as a read-only 
decompression-only property

Antoine highlighted the fact that BufferedReader.peek() does not guarantee how 
much data it returns, so I removed the code trying to return at least 100 bytes.

I revisited merging the gzip read() loop with the others, but gave up again. I 
tried adding calls to _start_member() and _end_member(), with more “if” 
statements and flags to account for the different edge cases. But then I 
remembered the “needs_input” flag of LZMA and bzip versus the “unconsumed_tail” 
of zlib. Handling this would probably require another delegation to a new 
_decompress_chunk() method, at which point I think the whole read() method will 
probably just be a mess of hacks trying to marry the two not-quite-compatible 
implementations. So I think it is better to override read() with a 
gzip-specific implementation, unless someone wants to prove me wrong with their 
own patch :)

--
Added file: http://bugs.python.org/file38733/LZMAFile-etc.v9.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-27 Thread Martin Panter

Martin Panter added the comment:

Yes my patch should be ready, unless we want to work on factoring more common 
logic out of the gzip read() method.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-26 Thread Antoine Pitrou

Antoine Pitrou added the comment:

Is it still work-in-progress or are you looking for a review?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-26 Thread Nikolaus Rath

Nikolaus Rath added the comment:

I believe Martin's patch (v8) is ready for a core committer review. At least I 
can't find anything to criticize anymore :-).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-25 Thread Martin Panter

Martin Panter added the comment:

Another behaviour change in v7 is the seekable() return value is now inherited 
from underlying file, instead of always being True. I mentioned on Reitveld why 
this could be a bad idea, but as it is undocumented and untested and will leave 
the new behaviour unless someone complains.

Posting v8, with these changes from v7:

* Reverted _read_exact() and read() loop changes to fix tests
* Restored myfileobj attribute and test that it is closed
* Clean up gzip read() loop so that it always returns data if available, rather 
than raising EOFError
* Document the read(None) is now allowed
* Other minor changes addressing review comments

--
Added file: http://bugs.python.org/file38686/LZMAFile-etc.v8.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-25 Thread Nikolaus Rath

Nikolaus Rath added the comment:

Except for the pointless 'myfileobj' stuff in gzip.py, rev 8 of the patch looks 
good to me.

(Btw, I'm not actually in favor of e.g. the seekable() change. The previous 
patch was intended as a proof-of-concept to see what would be necessary to 
inherit more from DecompressReader and if it's actually worth it. But having 
thought about it for a while more, I don't think there's a significant gain. 
But I'm happy to see that you were able to cherry-pick some useful pieces out 
of it nevertheless).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-24 Thread Nikolaus Rath

Nikolaus Rath added the comment:

As discussed in Rietveld, here's an attempt to reuse more DecompressReader for 
GzipFile. There is still an unexplained test failure (test_read_truncated).

--
Added file: http://bugs.python.org/file38674/LZMAFile-etc.v7.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-24 Thread Martin Panter

Martin Panter added the comment:

Wolfgang: If the patch here is too massive, perhaps you might like to review my 
patch at Issue 23528 first. :) It only fixes the “gzip” module, but is 
definitely simpler, so might have more chance of making it into 3.4 or 3.5 on 
time.

===

Changes introduced by patch v7 if anyone’s interested: 
https://bitbucket.org/vadmium/cpython/commits/b7c43f7. A few problems and 
behaviour changes are introduced:

* GzipFile(filename=...).close() now relies on the garbage collector to close 
the file
* EOFError(Compressed file ended before the end-of-stream marker was reached) 
repaced with “struct.error” exception in some cases, e.g. 
GzipFile(fileobj=BytesIO(gzip_data[:3])).read()
* Premature EOF when stream ends in some cases, e.g. 
GzipFile(fileobj=BytesIO(empty_gzip + gzip_data)).read()

However I think the general idea of inheriting the common reader class and 
overriding as necessary is sound. Just looks like the code got too messed 
around in the process.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-21 Thread Martin Panter

Martin Panter added the comment:

I am posting v6, which basically drops the buffer_size parameters. The change 
is also pushed as a series of revisions to a Bit Bucket repository, see 
https://bitbucket.org/vadmium/cpython/branches/compare/compression%0D@. I 
thought that might help eliminate the giant blocks of changes in gzip.py, but 
it did not.

The previous changes for the buffer_size parameters are in the “buffer_size” 
bookmark in that repository.

--
hgrepos: +301
Added file: http://bugs.python.org/file38619/LZMAFile-etc.v6.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-20 Thread Martin Panter

Martin Panter added the comment:

Wolfgang: Unfortunately, I think that block of changes is largely unavoidable. 
The write() method should not have changed in reality, but all the init and 
read methods around it have been split into two classes, and the diff has 
decided to compare the old GzipFile methods to the new _DecompressReader 
methods. If you like I could try preparing a pseudo diff files that forces it 
to compare old GzipFile to new GzipFile to highlight the changes there.

Nikolaus: I will try to split my buffer_size parameter changes out and post 
them in a separate issue that can be enhanced further.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-20 Thread Wolfgang Maier

Wolfgang Maier added the comment:

The patch has a huge block replacement in the gzip.py module starting at 
GzipFile.write, which makes it large and hard to identify changes. Could that 
be ameliorated or is it too much work right now?

--
nosy: +wolma

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-17 Thread Nikolaus Rath

Nikolaus Rath added the comment:

If you want to add support for buffer_size=0 in a separate patch/issue I think 
that's fine. But in that case I would not add a buffer_size parameter now at 
all. IMO, not having it is better as having it but not supporting zero (even if 
it's documented that's pretty surprising).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-16 Thread Martin Panter

Martin Panter added the comment:

Posting LZMAFile-etc.v5.patch with the following changes:

* Merged with current code
* Changed BZ2File(buffer_size=...) to a keyword-only parameter and restored 
previous unused “buffering” parameter. Also changed the LZMAFile parameter to 
keyword-only for consistency.
* Dropped documenting how buffer_size affects the compressed data chunk size or 
the fast-forward chunk size for seeking
* Dropped backslash from \* in function signatures
* Documented and tested that buffer_size=0 is not valid for the LZMAFile etc 
classes
* Clarified a few code comments

There is still Nikolaus’s concern about setting the buffer size to zero and 
doing short reads, discussed on Rietveld. Apart from that, I think this patch 
addresses the rest of the comments. Let me know if I missed something!

--
Added file: http://bugs.python.org/file38514/LZMAFile-etc.v5.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-14 Thread Martin Panter

Martin Panter added the comment:

Well my last patch just added the _compression.py file without doing anything 
special, and it seems to be installed properly with “make install” so I will 
assume nothing else needs to be done.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-13 Thread Nikolaus Rath

Nikolaus Rath added the comment:

On Mar 06 2015, Martin Panter rep...@bugs.python.org wrote:
 Still to do: Need to find a better home for the _DecompressReader and
 _BaseStream classes. Currently it lives in “lzma”, but apparently it
 is possible for any of the gzip, bz2, lzma modules to not be
 importable, so it would have to live elsewhere.

Yes.

 Possible options are
 the io module, or a brand new internal module
 (e.g. Lib/_compression.py). Thoughts?

I think a new internal module would be the right choice, but I don't
know what needs to be done to properly add it to the build system. So
for now I'd just put it in the io module and wait for a core committer
to complain :-).

Best,
-Nikolaus
-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«

--
title: Limit decompressed data when reading from LZMAFile, BZ2File, GzipFile - 
Limit decompressed data when reading from LZMAFile and BZ2File

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-03-06 Thread Martin Panter

Martin Panter added the comment:

I am posting LZMAFile-etc.v3.patch, where I have implemented a “buffer_size” 
parameter to the buffered LZMAFile etc classes. I have not implemented 
open(buffering=...) for the time being (which should probably delegate to the 
buffer_size parameter or return a raw _DecompressReader object, at least for 
read mode). Other changes:

* Restored the _MODE_WRITE = 3 value
* Explained the _pos and _size attributes in _DecompressReader
* Factored out decomp_factory and args, decomp_error parameters to make 
_DecompressReader generic
* BZ2File modified similarly to LZMAFile
* I removed the deprecated and unused BZ2File(buffering=...) parameter; 
buffer_size takes its place. The old buffering parameter appears to have been 
documented, but never implemented in C Python, so hopefully this is not a big 
deal.

Still to do: Need to find a better home for the _DecompressReader and 
_BaseStream classes. Currently it lives in “lzma”, but apparently it is 
possible for any of the gzip, bz2, lzma modules to not be importable, so it 
would have to live elsewhere. Possible options are the io module, or a brand 
new internal module (e.g. Lib/_compression.py). Thoughts?

Also I am about to see if I can make GzipFile use the _DecompressReader class. 
I will have to add GzipFile(buffer_size=...) as a keyword-only parameter since 
the third parameter position is already taken. There are quite a few quirks 
with gzip and zlib compared to bz2 and lzma, so I will see how I go.

--
Added file: http://bugs.python.org/file38367/LZMAFile-etc.v3.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-02-28 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-02-28 Thread Nikolaus Rath

Nikolaus Rath added the comment:

On Feb 27 2015, Martin Panter rep...@bugs.python.org wrote:
 In the code review, Nikolaus raised the idea of allowing a custom
 “buffer_size” parameter for the BufferedReader. I think this would
 need a bit of consideration about how it should work:

 1. Should it be a direct wrapper around
 BufferedReader(buffer_size=...)?

I'm not 100% sure what you mean by that, but I think the answer is Yes.

 2. Should it also support an unbuffered reader mode like
 open(buffering=0), socket.makefile(buffering=0), and
 subprocess.Popen(bufsize=0)?

Yes, in that case the DecompressReader should simply be used without
encapsulating it in BufferedReader.

 3. Should there be any consideration for buffering in write mode
 (mirroring the equivalent open(), etc parameters)?

I don't think so. There's not much to be gained by buffering data before
passing it to the compressor, so the only point of buffering is to avoid
short writes (or reads in case of decompressing). But the low-level
compressor code already avoids short writes.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-02-27 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 LZMAFile now uses BufferedReader.peek(). The current implementation seems 
 appropriate, but I am not comfortable with the current specification in the 
 documentation, which says it is allowed to not return any useful data.

What do you mean with useful data? peek() should always return at least one 
byte (except on EOF or on non-blocking streams, of course).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-02-27 Thread Martin Panter

Martin Panter added the comment:

## BufferedReader.peek() ##
See https://bugs.python.org/issue5811#msg233750, but basically my concern is 
that the documentation says “the number of bytes returned may be less or more 
than requested”, without any mention of EOF or other conditions.

## Buffer sizing ##
In the code review, Nikolaus raised the idea of allowing a custom “buffer_size” 
parameter for the BufferedReader. I think this would need a bit of 
consideration about how it should work:

1. Should it be a direct wrapper around BufferedReader(buffer_size=...)?
2. Should it also support an unbuffered reader mode like open(buffering=0), 
socket.makefile(buffering=0), and subprocess.Popen(bufsize=0)?
3. Should there be any consideration for buffering in write mode (mirroring the 
equivalent open(), etc parameters)?

## Common raw decompression stream class ##
Having a common base class mapping the generic decompressor object API to the 
RawIOBase API is a good thing. I will try to make one in my next patch 
iteration. In the mean time, here are a couple of issues to consider:

* What module should it reside in? Perhaps “gzip”, because it is the most 
commonly-used of the three decompressors? Perhaps “io”, being the most relevant 
common dependency? Or a brand new internal module?!
* The zlib.decompressobj() API is slightly different, since it has the 
“unconsumed_tail” attribute and flush() methods, and decompress(max_length=0) 
does not mean return zero bytes (Issue 23200).

In Issue 23528, Nikolaus also pointed out that “GzipFile would need to 
additionally overwrite read() and write() in order to handle the CRC and gzip 
header.”

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-02-26 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-02-26 Thread Martin Panter

New submission from Martin Panter:

This is a follow-on from Issue 15955, which has added low-level support for 
limiting the amount of data from the LZMA and bzip decompressors. The 
high-level LZMAFile and BZ2File reader APIs need to make use of the new low 
level max_length parameter.

I am starting off with a patch for LZMAFile, based on a patch I posted to Issue 
15955. I split out a _RawReader class that does the actual decompress() calls, 
and then wrapped that in a BufferedReader. The LZMAFile then just delegates the 
read methods to the BufferedReader. This avoids needing any special code to 
implement buffering, readline(), etc. This involved some changes in the API 
though:

* LZMAFile now uses BufferedReader.peek(). The current implementation seems 
appropriate, but I am not comfortable with the current specification in the 
documentation, which says it is allowed to not return any useful data. See 
Issue 5811.
* read() now accepts size=None, because BufferedReader does. I had to change a 
test case for this.
* BufferedReader.seek() raises a different exception for invalid “whence”

Once the LZMAFile patch sees a bit of review and looks like it might be 
acceptable, I plan to change the BZ2File class in a similar manner.

--
components: Library (Lib)
files: LZMAFile.v2.patch
keywords: patch
messages: 236663
nosy: nikratio, vadmium
priority: normal
severity: normal
status: open
title: Limit decompressed data when reading from LZMAFile and BZ2File
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file38245/LZMAFile.v2.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue23529] Limit decompressed data when reading from LZMAFile and BZ2File

2015-02-26 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
nosy: +serhiy.storchaka

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23529
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com