[issue22163] max_wbits set incorrectly to -zlib.MAX_WBITS in tarfile, shouldn't be negative

2014-08-07 Thread Eduardo Robles Elvira

New submission from Eduardo Robles Elvira:

I think I have found a small typo-bug in tarfile.py, that seems to
be present in cpython upstream, which makes tarfile compression slower.
The issue can be seen here, in line 415 [1] of tarfile.py:

self.cmp = self.zlib.compressobj(9, self.zlib.DEFLATED,
-self.zlib.MAX_WBITS,
self.zlib.DEF_MEM_LEVEL,
0)

The minus sign doesn't make sense to me, because zlib.MAX_WBITS is 15,
and according to the documentation [2] wbits is by default 15 and
This should be an integer from 8 to 15. Higher values give better
compression, but use more memory. -15 is not even in range. So I guess the 
expected value should be the zlib default, 15. Or maybe another value, but it 
should at least be in range. 

I marked it as performance type bug as this only really affects performance.

I might have gotten it all wrong and it's fine the way it is, but I thought 
it'd be best to report it, as it looked fishy to me.

--
[1] http://hg.python.org/cpython/file/94d0e842b9ea/Lib/tarfile.py#l415
[2] https://docs.python.org/3.4/library/zlib.html#zlib.compressobj

--
components: Library (Lib)
files: max_wbits.patch
keywords: patch
messages: 225006
nosy: edulix
priority: normal
severity: normal
status: open
title: max_wbits set incorrectly to -zlib.MAX_WBITS in tarfile, shouldn't be 
negative
type: performance
versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5
Added file: http://bugs.python.org/file36299/max_wbits.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue22163
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21109] tarfile: Traversal attack vulnerability

2014-04-26 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

Do we have any final decision on what's the best approach to solve this? I see 
some possibilities:

a) leave the issue to the library user. I think that's a not good solution 
security-wise as many will be unaware of the problem and this promotes code 
duplication for the fix. On the other hand, this does not change default 
behavior.

b) fix the problem as proposed in the patch sent by Daniel. This makes the 
tarfile secure against this kind of attacks. It does change the behavior and 
doesn't allow to extract in arbitrary paths, though.

c) fix the problem so that by default extracting in arbitrary paths is not 
allowed, but allow somehow to do that optionally. This way we change the 
default behavior but provide an easy fix for those that depend on that 
functionality.

d) do not change the default, but provide a well documented and easy  way to 
activate the safety checks that fix this kind of attacks. The advantage is that 
it doesn't change the default behavior, the disadvantage is that many people 
will have to modify their code to be secure, and that the default is not very 
secure.

For what is worth, I believe either b or c should be chosen to fix this issue.

--
nosy: +edulix

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21109
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue21109] tarfile: Traversal attack vulnerability

2014-04-26 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

Also, I guess this patch solves and is closely related to #1044 which was, at 
the time (2007), considered not a bug.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue21109
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18321] Multivolume support in tarfile module

2014-04-13 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

 The example I gave is based on the idea that there is a TarVolumeSet class in 
 the tarfile module that implements all the required file-object methods (e.g. 
  read(), write(), seek(), etc.) and acts as if the sequence of volumes is 
 actually one big file. It is passed to tarfile.open() as the fileobj 
 argument. This TarVolumeSet class is supposed to be subclassable to let the 
 user implement her/his own mode of operation. This way the open_volume() 
 method can do whatever the user wants it to do. The TarVolumeSet class might 
 as well have a new_volume() method for writing multivol archives, the example 
 only covered the case of reading a multivol archive.

This is not so easy, because for example if you want to move the logic in 
addfile() that manages volumes to the write() function, you have some issues:
 * write() will need to take into account blocks (BLOCKSIZE), just to be able 
to split the volumes correctly. This is something that usually shouldn't belong 
in a write() function as it has to do to tarfile.. and it can be messy that 
both layers deal with it (write() splitting volumes, and tarfile adding NUL at 
the end of a BLOCK..) This can be done I guess, but remember, we split a volume 
only in the middle of a big file, not in any other case (AFAIK). Hopefully you 
don't get huge pax headers or anything strange. Usually all other records are 
either in the begining or just occupy one block, but can we rule this problem 
out for good?

 * multivolume logic in write() needs read/write access to the current tarinfo 
being written (look for tarfile in addfile() funcion in my patch to see why). 
How do you propose this object should be accessed from write()? 

 BTW, my version of GNU tar refuses to create compressed multiple-volume 
 archives which is why I doubt the usefulness of this feature overall.

But it has multivolume support right? Which is what I am proposing here. Also, 
you can gzip (or encrypt or anything) the volumes after creating the volumes..

 [...] because a multivol tarfile is not exactly the same as a normal tarfile 
 chopped up.
 No, I think it is exactly that. The only purpose of the GNUTYPE_MULTIVOL 
 header that is at the start of each subsequent volume is to give GNU tar the 
 ability to detect if it is reading the correct volume. It is not essential 
 and could as well be left out.

I'm not going to discuss this, because I think that we can implement it in the 
way you propose to implement it anyway. But my patch supports it and I think 
it's an *useful* feature, so I want it in :-P

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18321] Multivolume support in tarfile module

2014-04-13 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

 [...] but remember, we split a volume only in the middle of a big file, not 
 in any other case (AFAIK). Hopefully you don't get huge pax headers or 
 anything strange. [...] 
 Hopefully? Sorry, but have you tested this? I did. I let GNU tar create a two 
 volume archive that is split exactly between the two blocks of an XHDTYPE pax 
 header.

 The result is terrifying. At the beginning of the second volume GNU tar 
 creates an XGLTYPE header as the pax replacement for a GNUTYPE_MULTIVOL 
 header, followed by an XHDTYPE header (GNUFileParts) that somehow decorates 
 the following REGTYPE(!) tar header that contains the continuation of the 
 split XHDTYPE header data from the previous volume. After that comes the 
 REGTYPE file that the split XHDTYPE header was actually meant for as 
 decoration.

 I attached the archive to this issue.

 What happens if a GNUTYPE_LONGNAME header is split in two? I don't wanna 
 know...

 write() will need to take into account blocks (BLOCKSIZE), just to be able 
 to split the volumes correctly.

 It is mandatory to do the split on a block boundary (a multiple of 512).
 BTW, my version of GNU tar refuses to create compressed multiple-volume 
 archives which is why I doubt the usefulness of this feature overall.
 But it has multivolume support right? Which is what I am proposing here. 
 Also, you can gzip (or encrypt or anything) the volumes after creating the 
 volumes..

 Yeah, it has multivolume support, but a very limited one that is not only 
 weird but isn't even usable together with compression. And sure, I can 
 compress and encrypt the volumes afterward, but I can also create a 
 compressed archive and pipe it through split(1) to split it into parts. Both 
 ways create tar archives that are not readable by GNU tar because they're 
 non-standard. So what?

 Please tell me, what is your actual personal use-case for this feature?

I'm willing modify the patch to remove the weirdness you refer to. I differ 
on that it's not usable: it might not be useful to you, but it's certainly a 
feature that covers part of the functionality of GNU tar. Actually, some of the 
unit tests are like this: use GNU Tar to compress, then extract with tarfile - 
and viceversa.

Of course you can use split. And I could use Ruby or Perl, but I'm using python 
and tarfile, and this is a GNU tar feature that is just not supported in python 
tarfile upstream, and I'm just trying to contribute this feature, if possible 
:-).

BTW, If I create a multivol tar file and then compress the volumes, that does 
not make it non-standard, in the same way that if I create a PNG file and 
then compress it and then store it in EXTFS, it doesn't make it non-standard. 
I'm just using multiple layers of standards.

I'm a contractor, and I have been asked by a client to develop a python-based 
backup tool. The client is technical and had already an idea of what he wanted 
to do: use python-tarfile and add support to multivolume and some other 
goodies, and the client also wanted to try to push the changes upstream as we 
believe it is the correct thing to do.

BTW, when we designed the backup tool, we ruled out the possibility of using 
split because split wouldn't allow to correctly list all the files in each 
file-slice separately. We wanted to be able to recover all the files of each 
volume so that if we lose other volumes, we can still recover all the data 
from the volumes we have. 

Anyway, if you are the maintainer of tarfile and you think it's not possible to 
push tar-multivolume support upstream in python tarfile for whatever reason, 
please tell me.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18321] Multivolume support in tarfile module

2014-03-11 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

I guess I got it wrong, it's not part of the POSIX standard, just part of the 
GNU tar documentation. About the getmembers and getnames not reflecting the 
entirety of the archive, it's an optimization I needed and I think ccan be 
quite handy. It's also consistent with how the tar command works afaik, just 
listing the contents of the current volume. But it's true that could be done.

Regarding the TarVolumeSet idea, it could work but it's not as easy as that. 
You don't want to directly do a plain open in there, because you want to be 
able to deal with read/write modes, with gzip/bzip/Stream class. You also want 
to give the user maximum flexibility. That's why in my implementation 
new_volume_handler and open_volume functions are separated. Similarly, we could 
do something like this:

class MyTarVolumeSet(tarfile.TarVolumeSet):

def __init__(self, template, max_vol_size):
self.template = template
self.max_vol_size = max_vol_size

def new_volume_handler(self, tarfile, volume_number):
self.open_volume(self.template % volume_number, tarfile)

volumes = MyTarVolumesSet(test.tar.%03d)
with tarfile.open(fileobj=volumes, mode=w:) as tar:
for t in tar:
print(t.name)

Note that the new_volume_handler in this example receives more or less the same 
params as in my patch (not the base name, because it's already stored in the 
template), and that the open_volume really abstracts which way it will be 
opened. It could also have, as in my patch, an optional fileobj parameter, for 
a new indirection/abstraction.

In any case, this kind of abstraction would still really need some kind of 
hooking with tarfile, because a multivol tarfile is not exactly the same as a 
normal tarfile chopped up. This might be doable unilateraly by a smart 
TarVolumeSet getting the information from the tarfile and modifying/patching 
anything needed. This is one of the reasons why one would probably would still 
need access to the tarfile inside the open_volume function.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18321] Multivolume support in tarfile module

2014-01-29 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

 I cannot yet go into the details, because I have not tested the patch.

 The comments, docstrings and quoting are not very consistent with the rest of 
 the module. There are a few spelling mistakes. 

I can try to take care of this, though it'd be best if you can point me out 
concrete examples.

 The open_volume() method is more or less a copy of the open() method which is 
 not optimal.

I know, but trying to do something else might be even worse..

 The patch adds a lot of complexity to the tarfile module for a use case that 
 only a few connoisseurs benefit from. It seems to alter some inherent TarFile 
 mechanics that people might rely on, e.g. the members attribute contains only 
 the members stored in the current volume not the overall entirety of members. 

Well, that doesn't make much sense to me. You say that people rely on something 
like  members attribute contains only the members stored in the current volume 
not the overall entirety of members, but as you know, python tarfile lib 
didn't support volumes before this patch, so I guess people couldn't be relying 
on that anyway.

Also please bear in mind that tarfile volumes support is part of the tar 
standard, and altough not everyone uses this, it has its niche uses (sliced 
backups etc).

 Does this patch reliably allow random-access?

Yes and no. 

No: when you open a volume for reading, the list of members is cleared as you 
pointed so you cannot access to them.

Yes: you can open any volume at the begining of a file, and it read the tarfile 
from there. I do that in my use-case.

 Would it be possible/easier to add the same functionality using a separate 
 class MultiVolumeTarFile instead?

If you find open_volume similar to open() and don't like, I'm quite sure you 
would like all the needed copy-paste having this a separate class would entail. 
Also as I said before, might not make much sense as this is part of the 
standard.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18321] Multivolume support in tarfile module

2014-01-27 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

Do we have any news on this patch?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18321] Multivolume support in tarfile module

2013-10-22 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

could you please check if my contributor form is already processed?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18321] Multivolume support in tarfile module

2013-07-28 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

Sure, I will fill it out. But is it required?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18575] Fixing tarfile._mode when using gzip via :gz

2013-07-28 Thread Eduardo Robles Elvira

New submission from Eduardo Robles Elvira:

A TarFile object constructor accepts a fileobj argument. When this argument is 
set and it has a mode property, tarfile._mode is blindly copied from there. 
Otherwise, mode is set using the mode argument in the constructor.

This usually works, but in the case where fileobj is a gzip.GzipFile, the 
mode property is not a string like rb or wb but an integer, which is not 
expected in TarFile. 

This has not been noticed before probably because the TarFile._mode property is 
usually unused internally, but in my case it's a problem when using it together 
with tarfile multivolume mode provided in issue #18321.

--
components: Library (Lib)
files: gzip.patch
keywords: patch
messages: 193812
nosy: edulix
priority: normal
severity: normal
status: open
title: Fixing tarfile._mode when using gzip via :gz
versions: Python 3.4
Added file: http://bugs.python.org/file31061/gzip.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18575
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18575] Fixing tarfile._mode when using gzip via :gz

2013-07-28 Thread Eduardo Robles Elvira

Changes by Eduardo Robles Elvira edu...@gmail.com:


Removed file: http://bugs.python.org/file31061/gzip.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18575
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue18575] Fixing tarfile._mode when using gzip via :gz

2013-07-28 Thread Eduardo Robles Elvira

Eduardo Robles Elvira added the comment:

Fixing  gzip.patch, it was using basestring (python2) instead of str

--
Added file: http://bugs.python.org/file31062/gzip.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18575
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




[issue18321] Multivolume support in tarfile module

2013-06-28 Thread Eduardo Robles Elvira

New submission from Eduardo Robles Elvira:

The patch attached provides implementation for multivolume support for tarfile 
module. It contains both the changes in the module and a battery of unit tests. 
It contains support for multivolume for both GNU and PAX formats.

The main idea behind this proposal is that for dealing with new volumes, the 
user will provide a callback function. In this callback function the user 
typically calls to TarFile.open_volume(filename) with the appropiate filename. 
This is quite flexible in the sense that the callback function could even ask 
the user for the filename of the next volume (as tar command does), or do 
anything they need before returning or before calling to open_volume. 

Please feel free to comment on how to improve the implementation or the API. 
Documentation for the new feature will be provided at a later stage of the 
review process if the patch gets a good review.

--
components: Library (Lib)
files: cpython-tarfile-multivolume.patch
keywords: patch
messages: 191979
nosy: edulix
priority: normal
severity: normal
status: open
title: Multivolume support in tarfile module
versions: Python 3.4
Added file: http://bugs.python.org/file30720/cpython-tarfile-multivolume.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18321
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com