Thanks Francesc, that solved it. Having the on-disk data structures load
compressed in memory can be a deal-breaker when you have 50 GB+ datasets to
process daily!

The carray Google group (I had not noticed it) seems unreachable at the
moment, so I am reporting a problem here for now. With the following code:
ct0 = ca.ctable((h5f.root.c_000[:],), names=('c_000',),
                rootdir=u'/lfpd1/tmp/ctable-1', mode='w',
                cparams=ca.cparams(5), dtype='u2',
                expectedlen=len(h5f.root.c_000))

for k in h5f.root._v_children.keys()[:3]:  # just some of the HDF5 datasets
    try:
        col = getattr(h5f.root, k)
        ct0.addcol(col[:], name=k, expectedlen=len(col), dtype='u2')
    except ValueError:
        pass  # column already exists
ct0.flush()
>>> ct0
ctable((303390000,), [('c_000', '<u2'), ('c_007', '<u2'), ('c_006',
'<u2'), ('c_005', '<u2')])
nbytes: 2.26 GB; cbytes: 1.30 GB; ratio: 1.73
cparams := cparams(clevel=5, shuffle=True)
rootdir := '/lfpd1/tmp/ctable-1'
[(312, 37, 65432, 91) (313, 32, 65439, 65) (320, 24, 65433, 66) ...,
(283, 597, 677, 647) (276, 600, 649, 635) (298, 607, 635, 620)]
The newly-added datasets/columns exist in memory
>>> ct0['c_007']
carray((303390000,), uint16)
nbytes: 578.67 MB; cbytes: 333.50 MB; ratio: 1.74
cparams := cparams(clevel=5, shuffle=True)
[ 37 32 24 ..., 597 600 607]
but they do not appear in the rootdir, not even after .flush()
/lfpd1/tmp/ctable-1]$ ls
__attrs__ c_000 __rootdirs__
and something seems amiss with __rootdirs__:
/lfpd1/tmp/ctable-1]$ cat __rootdirs__
{"dirs": {"c_007": null, "c_006": null, "c_005": null, "c_000":
"/lfpd1/tmp/ctable-1/c_000"}, "names": ["c_000", "c_007", "c_006", "c_005"]}
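
In case it helps anyone check their own tables, the metadata above can be inspected with a few lines of standard-library Python. This is only a sketch: it assumes the layout seen in my `__rootdirs__` file (a "names" list plus a "dirs" mapping from column name to directory or null), and `unpersisted_columns` is a helper name I made up, not part of carray.

```python
import json
import os


def unpersisted_columns(rootdir):
    """Return the names of columns that have no directory on disk.

    Assumes the JSON layout shown above: a "names" list and a "dirs"
    mapping of column name -> directory path (or null).
    """
    with open(os.path.join(rootdir, "__rootdirs__")) as fh:
        meta = json.load(fh)

    missing = []
    for name in meta["names"]:
        dir_ = meta["dirs"].get(name)
        # A null entry, or a path that is not an existing directory,
        # means the column was never written to disk.
        if dir_ is None or not os.path.isdir(dir_):
            missing.append(name)
    return missing
```

Calling `unpersisted_columns('/lfpd1/tmp/ctable-1')` on the table above should list the three columns added with addcol, since their entries are null.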
>>> ct0.cbytes//1024**2
1334
vs
/lfpd1/tmp]$ du -h ctable-1
12K ctable-1/c_000/meta
340M ctable-1/c_000/data
340M ctable-1/c_000
340M ctable-1
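
The same comparison can be made without leaving the interpreter. A minimal sketch using only the standard library; `dir_size_bytes` is a name I chose for illustration. Note that it sums apparent file sizes, whereas `du` counts allocated blocks, so the two can differ slightly.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files below ``path``."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for fname in filenames:
            fpath = os.path.join(dirpath, fname)
            # Skip symlinks so that link targets are not counted.
            if not os.path.islink(fpath):
                total += os.path.getsize(fpath)
    return total
```

For the table above, `dir_size_bytes('/lfpd1/tmp/ctable-1') // 1024**2` can then be set against `ct0.cbytes // 1024**2`.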
and, finally, 'open' fails:
ct0_disk = ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/tejero/Dropbox/O/nb/nonridge/<ipython-input-26-41e1cb01ffe6> in <module>()
----> 1 ct0_disk = ca.open(rootdir='/lfpd1/tmp/ctable-1', mode='r')

/home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/toplevel.pyc in open(rootdir, mode)
    104         # Not a carray. Now with a ctable
    105         try:
--> 106             obj = ca.ctable(rootdir=rootdir, mode=mode)
    107         except IOError:
    108             # Not a ctable

/home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in __init__(self, columns, names, **kwargs)
    193             _new = True
    194         else:
--> 195             self.open_ctable()
    196             _new = False
    197

/home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in open_ctable(self)
    282
    283         # Open the ctable by reading the metadata
--> 284         self.cols.read_meta_and_open()
    285
    286         # Get the length out of the first column

/home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in read_meta_and_open(self)
     40         # Initialize the cols by instatiating the carrays
     41         for name, dir_ in data['dirs'].items():
---> 42             self._cols[str(name)] = ca.carray(rootdir=dir_, mode=self.mode)
     43
     44     def update_meta(self):

/home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so in carray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:8637)()

ValueError: You need at least to pass an array or/and a rootdir
-á.
On 7 December 2012 17:04, Francesc Alted <fal...@gmail.com> wrote:
> Hmm, perhaps cythonizing by hand is your best bet:
>
> $ cython carray/carrayExtension.pyx
>
> If you continue having problems, please write to the carray mailing list.
>
> Francesc
>
> On 12/7/12 5:29 PM, Alvaro Tejero Cantero wrote:
> > I now have similar dependencies to yours, except for NumPy 1.7 beta 2.
> >
> > I wish I could help with the carray flavor.
> >
> > --
> > Running setup.py install for carray
> > * Found Cython 0.17.2 package installed.
> > * Found numpy 1.6.2 package installed.
> > * Found numexpr 2.0.1 package installed.
> > building 'carray.carrayExtension' extension
> > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
> > -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC
> > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC
> > compile options: '-Iblosc
> > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include
> > -I/usr/include/python2.7 -c'
> > extra options: '-msse2'
> > gcc: blosc/blosclz.c
> > gcc: carray/carrayExtension.c
> > gcc: error: carray/carrayExtension.c: No such file or directory
> > gcc: fatal error: no input files
> > compilation terminated.
> > gcc: error: carray/carrayExtension.c: No such file or directory
> > gcc: fatal error: no input files
> > compilation terminated.
> > error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe
> > -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
> > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC
> > -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> > -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> > -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Iblosc
> > -I/home/tejero/Local/Envs/test/lib/python2.7/site-packages/numpy/core/include
> > -I/usr/include/python2.7 -c carray/carrayExtension.c -o
> > build/temp.linux-x86_64-2.7/carray/carrayExtension.o -msse2" failed
> > with exit status 4
> >
> >
> >
> > -á.
> >
> >
> >
> > On 7 December 2012 12:47, Francesc Alted <fal...@gmail.com
> > <mailto:fal...@gmail.com>> wrote:
> >
> > On 12/6/12 1:42 PM, Alvaro Tejero Cantero wrote:
> > > Thank you for the comprehensive round-up. I have some ideas and
> > > reports below.
> > >
> > > What about ctables? The documentation says that it is specifically
> > > column-access optimized, which is what I need in this scenario
> > > (sometimes sequential, sometimes random).
> >
> > Yes, ctables is optimized for column access.
> >
> > >
> > > Unfortunately I could not get the rootdir parameter for ctables
> > > __init__ to work in carray 0.4, and pip-installing 0.5 or 0.5.1 leads
> > > to compilation errors.
> >
> > Yep, persistence for carray/ctables objects was added in 0.5.
> >
> > >
> > > This is the ctables-to-disk error:
> > >
> > > ct2 = ca.ctable((np.arange(30000000),), names=('range2',),
> > > rootdir='/tmp/ctable2.ctable')
> > >
> > >
> > > ---------------------------------------------------------------------------
> > > TypeError                                 Traceback (most recent call last)
> > > /home/tejero/Dropbox/O/nb/nonridge/<ipython-input-29-255842877a0b> in <module>()
> > > ----> 1 ct2 = ca.ctable((np.arange(30000000),), names=('range2',), rootdir='/tmp/ctable2.ctable')
> > >
> > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/ctable.pyc in __init__(self, cols, names, **kwargs)
> > >     158                 if column.dtype == np.void:
> > >     159                     raise ValueError, "`cols` elements cannot be of type void"
> > > --> 160                 column = ca.carray(column, **kwargs)
> > >     161             elif ratype:
> > >     162                 column = ca.carray(cols[name], **kwargs)
> > >
> > > /home/tejero/Local/Envs/test/lib/python2.7/site-packages/carray/carrayExtension.so in carray.carrayExtension.carray.__cinit__ (carray/carrayExtension.c:3917)()
> > >
> > > TypeError: __cinit__() got an unexpected keyword argument 'rootdir'
> > >
> > >
> > > And this is cut from the pip output when trying to upgrade carray.
> > >
> > > gcc: carray/carrayExtension.c
> > >
> > > gcc: error: carray/carrayExtension.c: No such file or directory
> >
> > Hmm, that's strange, because the carrayExtension should have been
> > cythonized automatically. Here is part of my install process
> > with pip:
> >
> > Running setup.py install for carray
> > * Found Cython 0.17.1 package installed.
> > * Found numpy 1.7.0b2 package installed.
> > * Found numexpr 2.0.1 package installed.
> > cythoning carray/carrayExtension.pyx to carray/carrayExtension.c
> > building 'carray.carrayExtension' extension
> > C compiler: gcc -fno-strict-aliasing
> > -I/Users/faltet/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3
> > -Wall -Wstrict-prototypes
> >
> > Hmm, perhaps you need a newer version of Cython?
> >
> > >
> > >
> > > Two more notes:
> > >
> > > * a way was added to check in-disk (compressed) vs in-memory
> > > (uncompressed) node sizes. I was unable to find the way to use it
> > > either from the 2.4.0 release notes or from the git issue
> > > https://github.com/PyTables/PyTables/issues/141#issuecomment-5018763
> >
> > You already found the answer.
> >
> > >
> > > * is/will it be possible to load PyTables carrays as in-memory
> > > carrays without decompression?
> >
> > Actually, that has been my idea from the very beginning. The concept
> > of 'flavor' for the returned objects when reading is already there,
> > so it should be relatively easy to add a new 'carray' flavor. Maybe
> > you can contribute this?
> >
> > --
> > Francesc Alted
> >
> >
> >
> ------------------------------------------------------------------------------
> > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
> > Remotely access PCs and mobile devices and provide instant support
> > Improve your efficiency, and focus on delivering more value-add
> > services
> > Discover what IT Professionals Know. Rescue delivers
> > http://p.sf.net/sfu/logmein_12329d2d
> > _______________________________________________
> > Pytables-users mailing list
> > Pytables-users@lists.sourceforge.net
> > <mailto:Pytables-users@lists.sourceforge.net>
> > https://lists.sourceforge.net/lists/listinfo/pytables-users
> >
> >
> >
> >
> >
>
>
> --
> Francesc Alted
>
>
>
>