Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Scott Sinclair
>2010/6/8 Hans Meine:
> On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
>> The savez docstring should probably be clarified to provide this
>> information.
>
> I would prefer to actually offer compression to the user.

In the meantime, I've edited the docstring to reflect the current
behaviour (http://docs.scipy.org/numpy/docs/numpy.lib.npyio.savez/).

Cheers,
Scott
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Hans Meine
Hi Anne,

thanks for your input, too.

On Tuesday 08 June 2010 12:53:51 Anne Archibald wrote:
> I'm also a little dubious about making compression the default.
> np.savez provides a feature - storing multiple arrays - that is not
> otherwise available.  I suspect many users care more about speed than
> size.

You're probably right.  Together with the behaviour change, I'd say it's a bad 
idea to change the default.

And you're right concerning the API - it would've been better to explicitly 
pass a dict, but I don't want to introduce a new function for this reason.

Best,
  Hans
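
The speed-versus-size trade-off discussed above can be measured directly. This is a minimal sketch using only the standard library, not NumPy's own code: the payload is made-up stand-in data, `arr_0.npy` is just an illustrative member name, and `write_archive` is a hypothetical helper. Timings will vary by machine, so only the sizes are meaningful to compare across runs.

```python
import io
import time
import zipfile

# About 1 MiB of repetitive stand-in data (an assumption, not real array bytes).
data = bytes(range(256)) * 4096


def write_archive(compression):
    """Return (seconds, archive size in bytes) for one in-memory write."""
    buf = io.BytesIO()
    start = time.perf_counter()
    with zipfile.ZipFile(buf, mode="w", compression=compression) as zf:
        zf.writestr("arr_0.npy", data)  # hypothetical member name
    return time.perf_counter() - start, len(buf.getvalue())


stored_time, stored_size = write_archive(zipfile.ZIP_STORED)
deflated_time, deflated_size = write_archive(zipfile.ZIP_DEFLATED)

print("stored:   %.4f s, %d bytes" % (stored_time, stored_size))
print("deflated: %.4f s, %d bytes" % (deflated_time, deflated_size))
```

How much is gained depends heavily on the data: repetitive input like this shrinks a lot, while noisy floating-point arrays may barely compress at all.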


Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Anne Archibald
On 8 June 2010 06:11, Pauli Virtanen wrote:
> Tue, 2010-06-08 at 12:03 +0200, Hans Meine wrote:
>> On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
>> > The savez docstring should probably be clarified to provide this
>> > information.
>>
>> I would prefer to actually offer compression to the user.  Unfortunately,
>> adding another argument to this function will never be 100% secure, since
>> currently, all kwargs will be saved into the zip, so it could count as
>> behaviour change.
>
> Yep, that's the only question to be resolved. I suppose "compression" is
> not so usual as a variable name, so it probably wouldn't break anyone's
> code.

This sounds like trouble, not just now but for any future additions to
the interface. Perhaps it would be better to provide a function with a
different, more extensible interface? (For example, one that accepts
an explicit dictionary?)

I'm also a little dubious about making compression the default.
np.savez provides a feature - storing multiple arrays - that is not
otherwise available. I suspect many users care more about speed than
size.

Anne

> --
> Pauli Virtanen
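
The "explicit dictionary" interface suggested above might look roughly like the following. This is only a sketch under stated assumptions: `save_members` and its `compress` flag are hypothetical names, not a NumPy API, and raw bytes stand in for serialized arrays. The point is that options live in the function signature and so cannot collide with member names.

```python
import io
import zipfile


def save_members(file, members, compress=False):
    """Hypothetical helper: write each name -> bytes pair as a zip member.

    Options such as `compress` are ordinary parameters, so they can never
    be confused with a member name in `members`.
    """
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    with zipfile.ZipFile(file, mode="w", compression=method) as zf:
        for name, payload in members.items():
            zf.writestr(name + ".npy", payload)


buf = io.BytesIO()
# A member may be called "compression" without being mistaken for an option.
save_members(buf, {"compression": b"\x01\x02", "x": b"\x03"}, compress=True)

with zipfile.ZipFile(buf) as zf:
    names = sorted(zf.namelist())
print(names)
```

With a `**kwds` interface, by contrast, every new option added later takes one more name away from the set of usable array names.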


Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Hans Meine
On Tuesday 08 June 2010 12:11:28 Pauli Virtanen wrote:
> Tue, 2010-06-08 at 12:03 +0200, Hans Meine wrote:
> > I would prefer to actually offer compression to the user.  Unfortunately,
> > adding another argument to this function will never be 100% secure, since
> > currently, all kwargs will be saved into the zip, so it could count as
> > behaviour change.
> 
> Yep, that's the only question to be resolved.

Actually, one could also briefly consider whether the default should be
ZIP_DEFLATED (if available).

Apart from that, what should we (or I) do to get my patch in?  (It wouldn't be
the first of mine to end up in the bitrot department.)

Have a nice day,
  Hans


Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Pauli Virtanen
Tue, 2010-06-08 at 12:03 +0200, Hans Meine wrote:
> On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
> > The savez docstring should probably be clarified to provide this
> > information.
>
> I would prefer to actually offer compression to the user.  Unfortunately, 
> adding another argument to this function will never be 100% secure, since 
> currently, all kwargs will be saved into the zip, so it could count as 
> behaviour change.

Yep, that's the only question to be resolved. I suppose "compression" is
not so usual as a variable name, so it probably wouldn't break anyone's
code.

-- 
Pauli Virtanen



Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Hans Meine
On Tuesday 08 June 2010 11:40:59 Scott Sinclair wrote:
> The savez docstring should probably be clarified to provide this
> information.

I would prefer to actually offer compression to the user.  Unfortunately, 
adding another argument to this function will never be 100% secure, since 
currently, all kwargs will be saved into the zip, so it could count as 
behaviour change.

I would propose something like the attached patch.  (I chose ZIP_STORED as the
default only for maximum backward compatibility.)

> I guess that the default (uncompressed Zip) is used because specifying
> the compression as ZIP_DEFLATED requires zlib to be installed on the
> system (see zipfile.ZipFile docstring).

Yes, that was my thought, too.  Nevertheless, numpy might make a different 
decision, no?

Ciao,
  Hans
Index: numpy/lib/io.py
===
--- numpy/lib/io.py	(revision 7191)
+++ numpy/lib/io.py	(working copy)
@@ -269,10 +269,11 @@
 
 def savez(file, *args, **kwds):
 """
-Save several arrays into a single, compressed file with extension ".npz"
+Save several arrays into a single zip file with extension ".npz"
 
-If keyword arguments are given, the names for variables assigned to the
-keywords are the keyword names (not the variable names in the caller).
+If keyword arguments (other than `compression`) are given, the
+names for variables assigned to the keywords are the keyword names
+(not the variable names in the caller).
 If arguments are passed in with no keywords, the corresponding variable
 names are arr_0, arr_1, etc.
 
@@ -290,6 +291,12 @@
 All keyword=value pairs cause the value to be saved with the name of
 the keyword.
 
+The special keyword argument `compression` can be used if the file
+should be compressed (see `zipfile.ZipFile`), i.e. pass
+`zipfile.ZIP_DEFLATED` or `zipfile.ZIP_STORED` (default).  If
+compression == True, a suitable default (i.e. ZIP_DEFLATED) is
+chosen.
+
 See Also
 
 save : Save a single array to a binary file in NumPy format
@@ -318,6 +325,15 @@
         if not file.endswith('.npz'):
             file = file + '.npz'
 
+    if 'compression' in kwds:
+        compression = kwds['compression']
+        del kwds['compression']
+
+        if compression == True:
+            compression = zipfile.ZIP_DEFLATED
+    else:
+        compression = zipfile.ZIP_STORED
+
     namedict = kwds
     for i, val in enumerate(args):
         key = 'arr_%d' % i
@@ -325,7 +341,7 @@
             raise ValueError, "Cannot use un-named variables and keyword %s" % key
         namedict[key] = val
 
-    zip = zipfile.ZipFile(file, mode="w")
+    zip = zipfile.ZipFile(file, mode="w", compression=compression)
 
     # Stage arrays in a temporary file on disk, before writing to zip.
     fd, tmpfile = tempfile.mkstemp(suffix='-numpy.npy')
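
The keyword handling proposed in the patch can be sketched in isolation as follows. `pop_compression` is a hypothetical helper name introduced here for illustration; the logic is a paraphrase of the patch in current Python, not the code that was eventually merged. The standard-library constant is spelled `zipfile.ZIP_DEFLATED`.

```python
import zipfile


def pop_compression(kwds):
    """Remove `compression` from kwds and return a zipfile constant.

    True is mapped to ZIP_DEFLATED; a missing key means ZIP_STORED.
    Any other value is passed through unchanged.
    """
    compression = kwds.pop("compression", zipfile.ZIP_STORED)
    if compression is True:
        compression = zipfile.ZIP_DEFLATED
    return compression


kwds = {"a": [1, 2], "compression": True}
method = pop_compression(kwds)

# The option is consumed, so it is no longer saved as an array.
print(method == zipfile.ZIP_DEFLATED, sorted(kwds))
```

This also shows the behaviour change under discussion: a caller who previously stored an array under the name `compression` would now have it silently interpreted as an option.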


Re: [Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Scott Sinclair
>2010/6/8 Hans Meine:
> I just wondered why numpy.load("foo.npz") was so much faster than loading
> (gzip-compressed) hdf5 file contents, and found that numpy.savez did not
> compress my files at all.
>
> But is that intended?  The numpy.savez docstring says "Save several arrays
> into a single, *compressed* file in ``.npz`` format." (emphasis mine), so I
> guess this might be a bug, or at least a missing feature.  In fact, the
> implementation simply uses the zipfile.ZipFile class, without specifying the
> 'compression' argument to the constructor.
>
> From http://docs.python.org/library/zipfile.html :
>> `compression` is the ZIP compression method to use when writing the archive,
>> and should be ZIP_STORED or ZIP_DEFLATED; unrecognized values will cause
>> RuntimeError to be raised. If ZIP_DEFLATED is specified but the zlib module
>> is not available, RuntimeError is also raised. The default is ZIP_STORED.

The savez docstring should probably be clarified to provide this information.

I guess that the default (uncompressed Zip) is used because specifying
the compression as ZIP_DEFLATED requires zlib to be installed on the
system (see zipfile.ZipFile docstring).

Cheers,
Scott
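
One way to cope with the zlib dependency mentioned above is to choose the compression method at run time. This is a minimal sketch, assuming a fallback to ZIP_STORED is acceptable when zlib is missing; `default_compression` is a hypothetical helper, not something NumPy provides.

```python
import zipfile

try:
    import zlib  # optional in some Python builds
except ImportError:
    zlib = None


def default_compression():
    """Return ZIP_DEFLATED when zlib is importable, else ZIP_STORED."""
    return zipfile.ZIP_DEFLATED if zlib is not None else zipfile.ZIP_STORED


print(default_compression())
```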


[Numpy-discussion] numpy.savez does /not/ compress!?

2010-06-08 Thread Hans Meine
Hi,

I just wondered why numpy.load("foo.npz") was so much faster than loading 
(gzip-compressed) hdf5 file contents, and found that numpy.savez did not 
compress my files at all.  So there is currently no point in using numpy.savez 
instead of numpy.save when you're not using the multiple-arrays-per-file 
feature.  (To the contrary, it even complicates loading and you need to choose 
and remember a name for the archive member.)

But is that intended?  The numpy.savez docstring says "Save several arrays 
into a single, *compressed* file in ``.npz`` format." (emphasis mine), so I 
guess this might be a bug, or at least a missing feature.  In fact, the 
implementation simply uses the zipfile.ZipFile class, without specifying the 
'compression' argument to the constructor.  

From http://docs.python.org/library/zipfile.html :
> `compression` is the ZIP compression method to use when writing the archive,
> and should be ZIP_STORED or ZIP_DEFLATED; unrecognized values will cause
> RuntimeError to be raised. If ZIP_DEFLATED is specified but the zlib module
> is not available, RuntimeError is also raised. The default is ZIP_STORED.

Greetings,
  Hans
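
Whether an archive's members are actually compressed can be checked with the standard library alone. The sketch below is not NumPy's implementation; it makes the same kind of `zipfile.ZipFile` call without a `compression` argument and then inspects the result. The member name `arr_0.npy` and the zero-filled payload are illustrative assumptions.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w") as zf:  # no compression argument given
    zf.writestr("arr_0.npy", b"\x00" * 100_000)  # stand-in for array bytes

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("arr_0.npy")

# For a stored (uncompressed) member, both sizes are identical.
print(info.compress_type == zipfile.ZIP_STORED)
print(info.file_size, info.compress_size)
```

The same inspection works on an existing `.npz` file by passing its path to `zipfile.ZipFile` and looping over `infolist()`.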