Re: [Numpy-discussion] numpy.savez does /not/ compress!?
2010/6/8 Hans Meine:
> On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
>> The savez docstring should probably be clarified to provide this
>> information.
>
> I would prefer to actually offer compression to the user.

In the meantime, I've edited the docstring to reflect the current behaviour (http://docs.scipy.org/numpy/docs/numpy.lib.npyio.savez/).

Cheers,
Scott
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy.savez does /not/ compress!?
Hi Anne,

thanks for your input, too.

On Tuesday 08 June 2010 12:53:51 Anne Archibald wrote:
> I'm also a little dubious about making compression the default.
> np.savez provides a feature - storing multiple arrays - that is not
> otherwise available. I suspect many users care more about speed than
> size.

You're probably right. Together with the behaviour change, I'd say it's a bad idea to change the default.

And you're right concerning the API - it would've been better to explicitly pass a dict, but I don't want to introduce a new function for this reason.

Best,
Hans
Re: [Numpy-discussion] numpy.savez does /not/ compress!?
On 8 June 2010 06:11, Pauli Virtanen wrote:
> Tue, 2010-06-08 at 12:03 +0200, Hans Meine wrote:
>> On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
>> > The savez docstring should probably be clarified to provide this
>> > information.
>>
>> I would prefer to actually offer compression to the user. Unfortunately,
>> adding another argument to this function will never be 100% secure, since
>> currently, all kwargs will be saved into the zip, so it could count as
>> behaviour change.
>
> Yep, that's the only question to be resolved. I suppose "compression" is
> not so usual as a variable name, so it probably wouldn't break anyone's
> code.

This sounds like trouble, not just now but for any future additions to the interface. Perhaps it would be better to provide a function with a different, more extensible interface? (For example, one that accepts an explicit dictionary?)

I'm also a little dubious about making compression the default. np.savez provides a feature - storing multiple arrays - that is not otherwise available. I suspect many users care more about speed than size.

Anne

> --
> Pauli Virtanen
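As a rough illustration of the "explicit dictionary" idea raised above, the sketch below keeps the archive members in one dict and the options in ordinary parameters, so a member may be called `compression` without colliding with the option of the same name. The function `save_members` is hypothetical and not part of NumPy; it writes raw bytes with the standard-library `zipfile` module, where real code would write serialized arrays.

```python
import io
import zipfile


def save_members(file, members, compression=zipfile.ZIP_STORED):
    """Write each name -> bytes pair in `members` as one archive entry.

    Hypothetical helper for illustration only. Options such as
    `compression` are regular parameters, so they can never be mistaken
    for a member name.
    """
    with zipfile.ZipFile(file, mode="w", compression=compression) as archive:
        for name, payload in members.items():
            archive.writestr(name + ".npy", payload)


# A member named "compression" is stored like any other member.
buffer = io.BytesIO()
save_members(
    buffer,
    {"compression": b"user data", "x": b"more user data"},
    compression=zipfile.ZIP_DEFLATED,
)

with zipfile.ZipFile(buffer) as archive:
    names = sorted(archive.namelist())

print(names)
```

New options can later be added as further keyword parameters without changing what ends up in the archive.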
Re: [Numpy-discussion] numpy.savez does /not/ compress!?
On Tuesday 08 June 2010 12:11:28 Pauli Virtanen wrote:
> Tue, 2010-06-08 at 12:03 +0200, Hans Meine wrote:
> > I would prefer to actually offer compression to the user. Unfortunately,
> > adding another argument to this function will never be 100% secure, since
> > currently, all kwargs will be saved into the zip, so it could count as
> > behaviour change.
>
> Yep, that's the only question to be resolved.

Actually, one could also briefly think about whether the default should be ZIP_DEFLATED (if available).

Apart from that, what should we(/I) do to get my patch in? (It wouldn't be my first one which goes into the bitrot department..)

Have a nice day,
Hans
Re: [Numpy-discussion] numpy.savez does /not/ compress!?
Tue, 2010-06-08 at 12:03 +0200, Hans Meine wrote:
> On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
> > The savez docstring should probably be clarified to provide this
> > information.
>
> I would prefer to actually offer compression to the user. Unfortunately,
> adding another argument to this function will never be 100% secure, since
> currently, all kwargs will be saved into the zip, so it could count as
> behaviour change.

Yep, that's the only question to be resolved. I suppose "compression" is not so usual as a variable name, so it probably wouldn't break anyone's code.

--
Pauli Virtanen
Re: [Numpy-discussion] numpy.savez does /not/ compress!?
On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
> The savez docstring should probably be clarified to provide this
> information.

I would prefer to actually offer compression to the user. Unfortunately, adding another argument to this function will never be 100% secure, since currently, all kwargs will be saved into the zip, so it could count as behaviour change.

I would propose something like the attached patch. (I only chose ZIP_STORED as default for maximum backward compatibility.)

> I guess that the default (uncompressed Zip) is used because specifying
> the compression as ZIP_DEFLATED requires zlib to be installed on the
> system (see zipfile.ZipFile docstring).

Yes, that was my thought, too. Nevertheless, numpy might make a different decision, no?

Ciao,
Hans

Index: numpy/lib/io.py
===
--- numpy/lib/io.py (revision 7191)
+++ numpy/lib/io.py (working copy)
@@ -269,10 +269,11 @@

 def savez(file, *args, **kwds):
     """
-    Save several arrays into a single, compressed file with extension ".npz"
+    Save several arrays into a single zip file with extension ".npz"

-    If keyword arguments are given, the names for variables assigned to the
-    keywords are the keyword names (not the variable names in the caller).
+    If keyword arguments (other than `compression`) are given, the
+    names for variables assigned to the keywords are the keyword names
+    (not the variable names in the caller).
     If arguments are passed in with no keywords, the corresponding variable
     names are arr_0, arr_1, etc.

@@ -290,6 +291,12 @@
     All keyword=value pairs cause the value to be saved with the name of
     the keyword.

+    The special keyword argument `compression` can be used if the file
+    should be compressed (see `zipfile.ZipFile`), i.e. pass
+    `zipfile.ZIP_DEFLATED` or `zipfile.ZIP_STORED` (default). If
+    compression == True, a suitable default (i.e. ZIP_DEFLATED) is
+    chosen.
+
     See Also
     save : Save a single array to a binary file in NumPy format
@@ -318,6 +325,15 @@
         if not file.endswith('.npz'):
             file = file + '.npz'

+    if 'compression' in kwds:
+        compression = kwds['compression']
+        del kwds['compression']
+
+        if compression == True:
+            compression = zipfile.ZIP_DEFLATED
+    else:
+        compression = zipfile.ZIP_STORED
+
     namedict = kwds
     for i, val in enumerate(args):
         key = 'arr_%d' % i
@@ -325,7 +341,7 @@
             raise ValueError, "Cannot use un-named variables and keyword %s" % key
         namedict[key] = val

-    zip = zipfile.ZipFile(file, mode="w")
+    zip = zipfile.ZipFile(file, mode="w", compression=compression)

     # Stage arrays in a temporary file on disk, before writing to zip.
     fd, tmpfile = tempfile.mkstemp(suffix='-numpy.npy')
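The keyword handling in the patch can be tried in isolation. The following is a minimal sketch, assuming the intent is "remove the reserved `compression` key from the keyword dict and map `True` to deflate"; the helper name `pop_compression` is made up for illustration and is not part of NumPy. It tests with `is True` rather than `== True`, an editorial choice that avoids comparing a user-supplied array against `True`.

```python
import zipfile


def pop_compression(kwds):
    """Remove the reserved 'compression' key from kwds and return a zipfile constant.

    Hypothetical helper for illustration only. A missing key yields
    ZIP_STORED, matching the backward-compatible default chosen in the patch.
    """
    compression = kwds.pop("compression", zipfile.ZIP_STORED)
    if compression is True:
        return zipfile.ZIP_DEFLATED
    if compression is False:
        return zipfile.ZIP_STORED
    # Otherwise assume the caller passed a zipfile constant directly.
    return compression


kwds = {"x": [1, 2, 3], "compression": True}
method = pop_compression(kwds)

# The reserved key is gone, so only real members remain to be saved.
print(method == zipfile.ZIP_DEFLATED, sorted(kwds))
```

This also makes the behaviour change discussed in the thread concrete: any caller who previously saved an array under the name `compression` would now lose that member.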
Re: [Numpy-discussion] numpy.savez does /not/ compress!?
2010/6/8 Hans Meine:
> I just wondered why numpy.load("foo.npz") was so much faster than loading
> (gzip-compressed) hdf5 file contents, and found that numpy.savez did not
> compress my files at all.
>
> But is that intended? The numpy.savez docstring says "Save several arrays
> into a single, *compressed* file in ``.npz`` format." (emphasis mine), so I
> guess this might be a bug, or at least a missing feature. In fact, the
> implementation simply uses the zipfile.ZipFile class, without specifying the
> 'compression' argument to the constructor.
>
> From http://docs.python.org/library/zipfile.html :
>> `compression` is the ZIP compression method to use when writing the archive,
>> and should be ZIP_STORED or ZIP_DEFLATED; unrecognized values will cause
>> RuntimeError to be raised. If ZIP_DEFLATED is specified but the zlib module
>> is not available, RuntimeError is also raised. The default is ZIP_STORED.

The savez docstring should probably be clarified to provide this information.

I guess that the default (uncompressed Zip) is used because specifying the compression as ZIP_DEFLATED requires zlib to be installed on the system (see zipfile.ZipFile docstring).

Cheers,
Scott
[Numpy-discussion] numpy.savez does /not/ compress!?
Hi,

I just wondered why numpy.load("foo.npz") was so much faster than loading (gzip-compressed) hdf5 file contents, and found that numpy.savez did not compress my files at all.

So there is currently no point in using numpy.savez instead of numpy.save when you're not using the multiple-arrays-per-file feature. (To the contrary, it even complicates loading and you need to choose and remember a name for the archive member.)

But is that intended? The numpy.savez docstring says "Save several arrays into a single, *compressed* file in ``.npz`` format." (emphasis mine), so I guess this might be a bug, or at least a missing feature. In fact, the implementation simply uses the zipfile.ZipFile class, without specifying the 'compression' argument to the constructor.

From http://docs.python.org/library/zipfile.html :
> `compression` is the ZIP compression method to use when writing the archive,
> and should be ZIP_STORED or ZIP_DEFLATED; unrecognized values will cause
> RuntimeError to be raised. If ZIP_DEFLATED is specified but the zlib module
> is not available, RuntimeError is also raised. The default is ZIP_STORED.

Greetings,
Hans
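The effect described above can be reproduced with the standard library alone. The sketch below is not NumPy's code; it only shows that a `zipfile.ZipFile` opened without a `compression` argument stores members uncompressed, using a block of zero bytes as a stand-in for array data. The helper name `archive_size` is made up for this example.

```python
import io
import zipfile


def archive_size(payload, compression):
    """Return the size in bytes of an in-memory zip holding one member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression) as archive:
        archive.writestr("arr_0.npy", payload)
    return len(buffer.getvalue())


# Highly repetitive stand-in for array data, so deflate has something to gain.
payload = b"\x00" * 1000000

stored_size = archive_size(payload, zipfile.ZIP_STORED)      # the ZipFile default
deflated_size = archive_size(payload, zipfile.ZIP_DEFLATED)  # needs the zlib module

print(stored_size, deflated_size)
```

With ZIP_STORED the archive is slightly larger than the payload because of the zip headers; with ZIP_DEFLATED it shrinks considerably for data this repetitive. Real array data may compress far less well.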