[Pytables-users] Ubuntu 11.10: blosc is not supported?

2011-12-06 Thread PyTables Org
Forwarding to the list. ~Josh

Begin forwarded message:

 From: pytables-users-boun...@lists.sourceforge.net
 Date: December 6, 2011 2:20:35 PM GMT+01:00
 To: pytables-users-ow...@lists.sourceforge.net
 Subject: Auto-discard notification

 The attached message has been automatically discarded.
 From: Martin Felder martin.fel...@zsw-bw.de
 Date: December 6, 2011 2:05:15 PM GMT+01:00
 To: pytables-users@lists.sourceforge.net
 Subject: Ubuntu 11.10: blosc is not supported?

 Hi,

 I installed pytables via the Ubuntu package manager (currently version
 2.1.2-3.1build1), and we use it a lot for production work. Thanks for this
 great package!

 So far I haven't tried enabling compression, but since it says in the
 documentation BLOSC comes with it, I created a filter with complib="blosc",
 only to get:

 ValueError: compression library ``blosc`` is not supported; it must be one
 of: zlib, lzo, bzip2

 Do I have to compile a newer version from source to enable BLOSC?

 Thanks,
 Martin

--
Cloud Services Checklist: Pricing and Packaging Optimization
This white paper is intended to serve as a reference, checklist and point of 
discussion for anyone considering optimizing the pricing and packaging model 
of a cloud services business. Read Now!
http://www.accelacomm.com/jaw/sfnl/114/51491232/___
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users


Re: [Pytables-users] Giant HDF5/PyTables error message

2011-12-06 Thread Francesc Alted
2011/12/5 Francesc Alted fal...@pytables.org

 Regarding the big error, the HDF5 error stack could be converted into a
 Python error so that it can be caught, if needed.  Hmm, I'll file a ticket
 on this later on.


https://github.com/PyTables/PyTables/issues/120

-- 
Francesc Alted


Re: [Pytables-users] Ubuntu 11.10: blosc is not supported?

2011-12-06 Thread Francesc Alted
2011/12/6 PyTables Org pytab...@googlemail.com

 Forwarding to the list. ~Josh

 Begin forwarded message:

 *From: *pytables-users-boun...@lists.sourceforge.net
 *Date: *December 6, 2011 2:20:35 PM GMT+01:00
 *To: *pytables-users-ow...@lists.sourceforge.net
 *Subject: **Auto-discard notification*

 The attached message has been automatically discarded.
 *From: *Martin Felder martin.fel...@zsw-bw.de
 *Date: *December 6, 2011 2:05:15 PM GMT+01:00
 *To: *pytables-users@lists.sourceforge.net
 *Subject: **Ubuntu 11.10: blosc is not supported?*


 Hi,

 I installed pytables via the Ubuntu package manager (currently version
 2.1.2-3.1build1), and we use it a lot for production work. Thanks for this
 great package!

 So far I haven't tried enabling compression, but since it says in the
 documentation BLOSC comes with it, I created a filter with complib="blosc",
 only to get:

 ValueError: compression library ``blosc`` is not supported; it must be one
 of: zlib, lzo, bzip2

 Do I have to compile a newer version from source to enable BLOSC?


Yes, you need at least PyTables 2.2 for using Blosc.  Antonio has recently
released PyTables binaries for Debian in:

http://sourceforge.net/projects/pytables/files/pytables/2.3.1/

that might be useful for Ubuntu too.
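
For scripts that have to run on both old and new installations, the version requirement above can be checked before choosing a compressor. This is a minimal sketch, not part of PyTables: the helper names `parse_version`, `choose_complib` and `make_filters` are hypothetical, and the 2.2 threshold is simply taken from the reply above. The only PyTables names used are `tables.__version__` and `tables.Filters(complevel=..., complib=...)`.

```python
def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Parsing stops after the first component that is not purely numeric,
    so a distribution suffix such as '-3.1build1' is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break
    return tuple(parts)


def choose_complib(pytables_version, preferred="blosc", fallback="zlib"):
    """Pick a compression library name for the given PyTables version.

    Assumption taken from the thread: Blosc needs PyTables 2.2 or newer.
    """
    if preferred == "blosc" and parse_version(pytables_version) < (2, 2):
        return fallback
    return preferred


def make_filters(complevel=5):
    """Build a Filters object using whichever compressor is usable."""
    import tables  # imported lazily so the helpers above work without PyTables

    complib = choose_complib(tables.__version__)
    return tables.Filters(complevel=complevel, complib=complib)
```

With the packaged 2.1.2 release, `choose_complib` falls back to zlib instead of raising the ValueError quoted above; with 2.2 or later it selects Blosc.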

-- 
Francesc Alted


Re: [Pytables-users] Ubuntu 11.10: blosc is not supported?

2011-12-06 Thread Antonio Valentino
Hi Francesc, hi Martin,

On 06/12/2011 14:55, Francesc Alted wrote:
 2011/12/6 PyTables Org pytab...@googlemail.com
 
 Forwarding to the list. ~Josh

 Begin forwarded message:

 *From: *pytables-users-boun...@lists.sourceforge.net
 *Date: *December 6, 2011 2:20:35 PM GMT+01:00
 *To: *pytables-users-ow...@lists.sourceforge.net
 *Subject: **Auto-discard notification*

 The attached message has been automatically discarded.
 *From: *Martin Felder martin.fel...@zsw-bw.de
 *Date: *December 6, 2011 2:05:15 PM GMT+01:00
 *To: *pytables-users@lists.sourceforge.net
 *Subject: **Ubuntu 11.10: blosc is not supported?*


 Hi,

 I installed pytables via the Ubuntu package manager (currently version
 2.1.2-3.1build1), and we use it a lot for production work. Thanks for this
 great package!

 So far I haven't tried enabling compression, but since it says in the
 documentation BLOSC comes with it, I created a filter with complib="blosc",
 only to get:

 ValueError: compression library ``blosc`` is not supported; it must be one
 of: zlib, lzo, bzip2

 Do I have to compile a newer version from source to enable BLOSC?


 Yes, you need at least PyTables 2.2 for using Blosc.  Antonio has recently
 released PyTables binaries for Debian in:
 
 http://sourceforge.net/projects/pytables/files/pytables/2.3.1/
 
 that might be useful for Ubuntu too.
 

Yes, it should work, but it is only for amd64.

Users of Ubuntu 11.10 can use the following PPA:

https://launchpad.net/~a.valentino/+archive/eotools

I'm trying to push it into the official Debian/Ubuntu archives, but I am
having serious problems contacting the current maintainers.

cheers

-- 
Antonio Valentino



[Pytables-users] Some experiences with PyTables

2011-12-06 Thread PyTables Org
Forwarding to the list. ~Josh.

Begin forwarded message:

 From: pytables-users-boun...@lists.sourceforge.net
 Date: December 6, 2011 9:27:27 PM GMT+01:00
 To: pytables-users-ow...@lists.sourceforge.net
 Subject: Auto-discard notification
 
 The attached message has been automatically discarded.
 From: Edward C. Jones edcjo...@comcast.net
 Date: December 6, 2011 9:25:08 PM GMT+01:00
 To: pytables-users@lists.sourceforge.net
 Subject: Some experiences with PyTables
 
 
 My computer has an up-to-date Debian stable distribution installed.  I have
 the following Debian packages (plus most dev packages):
     python 2.6.6-3+squeeze6
     python-numpy 1:1.4.1-5
     hdf5-tools 1.8.4-patch1-2
     libhdf5-serial-1.8.4 1.8.4-patch1-2
 I have compiled and installed PyTables 2.3.1.
 
 1. There seems to be an unpythonic design choice with the start, stop, step
   convention for PyTables.  Anything that is unnatural to a Python
   programmer should be heavily documented.
 2. There may be a bug in itersorted.
 
 Here is code for (1) and (2):
 
 #! /usr/bin/env python
 
 import random, tables
 
 h5file = tables.openFile('mytable.h5', mode='w')
 
 class silly_class(tables.IsDescription):
     num = tables.Int32Col(pos=0)
 
 mytable = h5file.createTable(h5file.root, 'mytable', silly_class,
                              'a few ints', expectedrows=4)
 
 row = mytable.row
 for i in range(10):
     row['num'] = random.randint(0, 99)
     row.append()
 mytable.flush()
 mytable.cols.num.createCSIndex()
 
 # Python's idiom for start, stop, step:
 print 'Python:', range(9, -1, -1)
 
 output = mytable.readSorted('num', start=0, stop=10, step=-1)
 print 'readSorted:', 0, 10, -1, output
 
 # copy supports a negative step.  It seems that start and stop are applied
 # _after_ the sort is done.  Very unlike Python.  Please document thoroughly.
 print h5file.root.mytable[:]
 h5file.root.mytable.copy(h5file.root, 'mytable2', sortby='num',
 start=0, stop=5, step=-1)
 print h5file.root.mytable2[:]
 
 # The following raises an OverflowError.  The documentation (2.3.1) says
 # negative steps are supported for itersorted.  Documentation error or bug
 # in itersorted?
 output = [x['num'] for x in mytable.itersorted('num',
 start=0, stop=10, step=-1)]
 print 'itersorted:', 0, 10, -1, output
 
 3. Null bytes are stripped from the end of strings when they are stored in a
   table.  Since a Python programmer does not expect this, it needs to be
   explicitly documented in all the relevant places.  Here is some code:
 
 #! /usr/bin/env python
 
 import tables
 
 def hash2hex(stringin):
     out = list()
     for c in stringin:
         s = hex(ord(c))[2:]
         if len(s) == 1:
             s = '0' + s
         out.append(s)
     return ''.join(out)
 
 h5file = tables.openFile('mytable.h5', mode='w')
 
 class silly_class(tables.IsDescription):
     astring = tables.StringCol(16, pos=0)
 
 mytable = h5file.createTable(h5file.root, 'mytable', silly_class,
 'a few strings', expectedrows=4)
 
 # Problem when string ends with null bytes:
 nasty = 'abdcef' + '\x00\x00'
 print repr(nasty)
 print hash2hex(nasty)
 
 row = mytable.row
 row['astring'] = nasty
 row.append()
 mytable.flush()
 print repr(mytable[0][0])
 print hash2hex(mytable[0][0])
 h5file.close()
 
 4. Has the 64K limit for attributes been lifted?
 
 5. The reference manual for numpy contains _many_ small examples.  They
   partially compensate for any lack of precision or excessive precision in
   the documents.  Also many people learn best from examples.
 
 6. Suppose that the records (key, data1) and (key, data2) are two rows in a
   table with (key, data1) being an earlier row than (key, data2).  Both
   records have the same value in the first column.  If a CSIndex is created
   using the first column, will (key, data1) still be before (key, data2) in
   the index?  This property is called stability.  Some sorting algorithms
   guarantee this and others don't.  Are the sorts in PyTables stable?
 
 7. The table.append in PyTables behaves like extend in Python. Why?
 
 8. I get a mysterious PerformanceWarning from the PyTables file table.py,
   line 2742. This message needs to be split into two messages.  In my case,
   after I appended to a table, 'row' in self.__dict__ was True and
   self.row._getUnsavedNrows() was 1.  To resolve the problem, I added a
   line that flushes the table after every append.  Does
   h5file.mytable.flush() do something that h5file.flush() doesn't?  Do I
   need to flush every table after every append or are there only certain
   situations when this is needed?  What does preempted from alive nodes
   mean?
 
 9. Does the following code contain a bug in PyTables?
 
 #! /usr/bin/env python
 
 import sys
 import numpy, tables
 
 # No failure if projections and winsize are small enough.  In the original
 # program, gauss.shape is (2000, 196, 196).
 projections = 105
 winsize = 2500
 h5 = tables.openFile('mess.h5', mode='w')
 shape = (projections, winsize)
 

Re: [Pytables-users] Some experiences with PyTables

2011-12-06 Thread Anthony Scopatz
Hello Edward,

I'd like to respond point by point:

On Tue, Dec 6, 2011 at 2:54 PM, PyTables Org pytab...@googlemail.com wrote:

 1. There seems to be an unpythonic design choice with the start, stop, step
   convention for PyTables.  Anything that is unnatural to a Python
   programmer should be heavily documented.

Agreed in general.  Do you have a specific example we could address?


 2. There may be a bug in itersorted.


Yes, this looks like a bug.  This deserves an issue on github...



 Here is code for (1) and (2):
 
 #! /usr/bin/env python

 import random, tables

 h5file = tables.openFile('mytable.h5', mode='w')

 class silly_class(tables.IsDescription):
     num = tables.Int32Col(pos=0)

 mytable = h5file.createTable(h5file.root, 'mytable', silly_class,
                              'a few ints', expectedrows=4)

 row = mytable.row
 for i in range(10):
     row['num'] = random.randint(0, 99)
     row.append()
 mytable.flush()
 mytable.cols.num.createCSIndex()

 # Python's idiom for start, stop, step:
 print 'Python:', range(9, -1, -1)

 output = mytable.readSorted('num', start=0, stop=10, step=-1)
 print 'readSorted:', 0, 10, -1, output

 # copy supports a negative step.  It seems that start and stop are applied
 # _after_ the sort is done.  Very unlike Python.  Please document thoroughly.

We could certainly add some text to the docstring of Table.copy().  Still,
I guess I am missing how this is 'wrong.'  To the best of my knowledge,
Python itself has no single function which both sorts and slices.  (Please
correct me if I am wrong ~_~.)  When performing both operations, one needs
to be done first.  However, you are correct in that this could be better
documented.
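
The ordering question can be illustrated in plain Python, without PyTables. This is a minimal sketch with made-up data and hypothetical function names. It only shows that "sort, then slice" and "slice, then sort" give different results, and that a plain Python slice with start=0, stop=10, step=-1 is empty. The descending variant is one possible reading of a negative step, labeled here as an assumption and not as confirmed PyTables behaviour.

```python
values = [42, 7, 93, 15, 60, 3, 78, 21, 54, 36]


def sort_then_slice(seq, start, stop):
    """Sort the whole sequence first, then take the range [start, stop)."""
    return sorted(seq)[start:stop]


def slice_then_sort(seq, start, stop):
    """Take the range [start, stop) of the unsorted data, then sort it."""
    return sorted(seq[start:stop])


def sort_descending_then_slice(seq, start, stop):
    """Assumed reading of step=-1: sort in descending order, then slice."""
    return sorted(seq, reverse=True)[start:stop]


print(sort_then_slice(values, 0, 5))             # [3, 7, 15, 21, 36]
print(slice_then_sort(values, 0, 5))             # [7, 15, 42, 60, 93]
print(sort_descending_then_slice(values, 0, 5))  # [93, 78, 60, 54, 42]

# A plain Python slice with these arguments selects nothing, which is why
# the same arguments look surprising to a Python programmer:
print(values[0:10:-1])                           # []
```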


 print h5file.root.mytable[:]
 h5file.root.mytable.copy(h5file.root, 'mytable2', sortby='num',
 start=0, stop=5, step=-1)
 print h5file.root.mytable2[:]

 # The following raises an OverflowError.  The documentation (2.3.1) says
 # negative steps are supported for itersorted.  Documentation error or bug
 # in itersorted?
 output = [x['num'] for x in mytable.itersorted('num',
 start=0, stop=10, step=-1)]
 print 'itersorted:', 0, 10, -1, output
 
 3. Null bytes are stripped from the end of strings when they are stored in a
   table.  Since a Python programmer does not expect this, it needs to be
   explicitly documented in all the relevant places.  Here is some code:

This is a function of the underlying HDF5 storage mechanism and not of
PyTables specifically.  When storing fixed-length strings, the array of
characters the string is converted to *must* be exactly length N.  When
serializing a string of length M, HDF5 does the following:

1. M > N: truncate the string at N bytes (chop off the end).
2. M == N: do nothing.
3. M < N: pad the character array with N - M null characters to achieve
   length N.

Because of this technique, all trailing null characters are dropped when
deserializing.  This supports the much more common use case of storing
shorter strings in a longer buffer but wanting to recover only the shorter
version.

If you want to keep null bytes at the end of the string, you could always
store the Python length (M) in another column.
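
The three cases and the length-column workaround can be emulated in pure Python. This is only a sketch of the behaviour described above, not HDF5 or PyTables code; `store`, `load` and `load_exact` are hypothetical helpers, and the width of 16 matches the StringCol(16) used in the quoted script.

```python
WIDTH = 16  # fixed field width N, as in StringCol(16)


def store(value, width=WIDTH):
    """Emulate writing a byte string into a fixed-width field.

    Longer values are truncated; shorter ones are padded with null bytes.
    """
    return value[:width].ljust(width, b"\x00")


def load(buf):
    """Emulate reading the field back: all trailing nulls are dropped."""
    return buf.rstrip(b"\x00")


def load_exact(buf, length):
    """Recover the original value using a separately stored length."""
    return buf[:length]


nasty = b"abdcef" + b"\x00\x00"
field = store(nasty)

print(len(field))                     # 16
print(load(field))                    # b'abdcef'
print(load_exact(field, len(nasty)))  # b'abdcef\x00\x00'
```

Keeping the length in a second column costs a few bytes per row but makes the round trip exact, as long as the original string was not longer than the field.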

  
 #! /usr/bin/env python

 import tables

 def hash2hex(stringin):
     out = list()
     for c in stringin:
         s = hex(ord(c))[2:]
         if len(s) == 1:
             s = '0' + s
         out.append(s)
     return ''.join(out)

 h5file = tables.openFile('mytable.h5', mode='w')

 class silly_class(tables.IsDescription):
     astring = tables.StringCol(16, pos=0)

 mytable = h5file.createTable(h5file.root, 'mytable', silly_class,
 'a few strings', expectedrows=4)

 # Problem when string ends with null bytes:
 nasty = 'abdcef' + '\x00\x00'
 print repr(nasty)
 print hash2hex(nasty)

 row = mytable.row
 row['astring'] = nasty
 row.append()
 mytable.flush()
 print repr(mytable[0][0])
 print hash2hex(mytable[0][0])
 h5file.close()
 
 4. Has the 64K limit for attributes been lifted?


No, unfortunately.  Once again, this is a compile-time parameter of HDF5.
You could change this value and recompile HDF5, but then any h5 file you
create would not be portable to other versions of HDF5.  Trust me, you are
not the only one who wishes this were a run-time variable.  (Still, there
are good reasons for it being static, i.e. speed and size.)
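
One way to live with the limit is to check the size of a value before storing it as an attribute. This is a rough sketch under stated assumptions: the 64 KiB figure is taken from this thread, the value is assumed to be serialized with pickle, and `fits_in_attribute` is a hypothetical helper, not a PyTables function. Values that do not fit could be stored in a regular array node instead.

```python
import pickle

ATTR_LIMIT = 64 * 1024  # assumed attribute size limit in bytes (from the thread)


def fits_in_attribute(obj, limit=ATTR_LIMIT):
    """Return True if the pickled object is smaller than the assumed limit."""
    return len(pickle.dumps(obj, protocol=2)) < limit


small = {"units": "m/s", "comment": "calibrated"}
large = list(range(100000))

print(fits_in_attribute(small))  # True
print(fits_in_attribute(large))  # False
```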



 5. The reference manual for numpy contains _many_ small examples.  They
   partially compensate for any lack of precision or excessive precision in
   the documents.  Also many people learn best from examples.


If you would like to write up some additional examples or contribute to the
docs in any way, *please* let me know.  We would be ecstatic for your help!



 6. Suppose that the records (key, data1) and (key, data2) are two rows in a
   table with (key, data1) being an earlier row than (key, data2).  Both
   records have the same value in the first column.  If a CSIndex is created
   using the first column, will (key, data1) still be before (key, data2) in