Re: [Pytables-users] ANN: PyTables 3.0 final
Many thanks for keeping such a great piece of work up and running. I've just seen some features in the release notes that I'm going to need in the very near future! Great job!

Best regards
Seref Arikan

On Sat, Jun 1, 2013 at 12:33 PM, Antonio Valentino <antonio.valent...@tiscali.it> wrote:

> ===========================
>  Announcing PyTables 3.0.0
> ===========================
>
> We are happy to announce PyTables 3.0.0.
>
> PyTables 3.0.0 comes about 5 years after the last major release (2.0)
> and 7 months after the last stable release (2.4.0).
>
> This is a new major release and an important milestone for the PyTables
> project, since it provides the long-awaited support for Python 3.x, which
> has been around for 4 years.
>
> Almost all of the core numeric/scientific packages for Python already
> support Python 3, so we are very happy that PyTables can now provide
> this important feature as well.
>
>
> What's new
> ==========
>
> A short summary of the main new features:
>
> - As of this release, PyTables provides full support for Python 3.
> - The entire code base is now more compliant with the coding style
>   guidelines described in PEP 8.
> - Basic support for HDF5 drivers. It is now possible to open/create an
>   HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE
>   drivers.
> - Basic support for in-memory image files. An HDF5 file can be set
>   from or copied into a memory buffer.
> - Implemented methods to get/set the user block size in an HDF5 file.
> - All read methods now have an optional *out* argument that allows
>   passing a pre-allocated array in which to store the data.
> - Added support for floating point data types with extended
>   precision (Float96, Float128, Complex192 and Complex256).
> - Consistent ``create_xxx()`` signatures. It is now possible to create
>   all data sets (Array, CArray, EArray, VLArray, and Table) from existing
>   Python objects.
> - Complete rewrite of the `nodes.filenode` module. It is now fully
>   compliant with the interfaces defined in the standard `io` module.
>   Only non-buffered binary I/O is supported currently.
>
> Please refer to the RELEASE_NOTES document for a more detailed list of
> changes in this release.
>
> As always, a large number of bugs have been addressed and squashed as well.
>
> In case you want to know in more detail what has changed in this
> version, please refer to: http://pytables.github.io/release_notes.html
>
> You can download a source package with generated PDF and HTML docs, as
> well as binaries for Windows, from:
> http://sourceforge.net/projects/pytables/files/pytables/3.0.0
>
> For an online version of the manual, visit:
> http://pytables.github.io/usersguide/index.html
>
>
> What is it?
> ===========
>
> PyTables is a library for managing hierarchical datasets, designed to
> efficiently cope with extremely large amounts of data, with support for
> full 64-bit file addressing. PyTables runs on top of the HDF5 library
> and the NumPy package to achieve maximum throughput and convenient use.
> PyTables includes OPSI, a new indexing technology, which allows data
> lookups in tables exceeding 10 gigarows (10**10 rows) to be performed
> in less than a tenth of a second.
>
>
> Resources
> =========
>
> About PyTables: http://www.pytables.org
>
> About the HDF5 library: http://hdfgroup.org/HDF5/
>
> About NumPy: http://numpy.scipy.org/
>
>
> Acknowledgments
> ===============
>
> Thanks to the many users who provided feature improvements, patches, bug
> reports, support and suggestions. See the ``THANKS`` file in the
> distribution package for an (incomplete) list of contributors. Most
> especially, a lot of kudos go to the HDF5 and NumPy makers.
> Without them, PyTables simply would not exist.
>
>
> Share your experience
> =====================
>
> Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
>
>
> **Enjoy data!**
>
> -- The PyTables Developers
>
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
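The *out* argument from the feature list above can be sketched roughly like this (an illustrative snippet, not from the announcement; the file path and array contents are made up for the example):

```python
import os
import tempfile

import numpy as np
import tables

# Create a small file with one array (illustrative data only).
path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with tables.open_file(path, 'w') as h5f:
    h5f.create_array('/', 'a', np.arange(10, dtype=np.int64))

# PyTables 3.0: read() accepts a pre-allocated ``out`` buffer, so the
# same array can be reused across many reads instead of allocating a
# fresh one each time.
buf = np.empty(10, dtype=np.int64)
with tables.open_file(path, 'r') as h5f:
    h5f.root.a.read(out=buf)

print(buf)
```

The buffer's dtype and shape have to match what is being read; reusing one buffer this way mainly pays off in tight loops over many small reads.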
Re: [Pytables-users] Chunk selection for optimized data access
I think I've seen this in the release notes of 3.0. This is actually something that I'm looking into as well, so any experience/feedback about creating files in memory would be much appreciated.

Best regards
Seref

On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll wrote:

> On 04.06.2013 05:35, Tim Burgess wrote:
> > My thoughts are:
> >
> > - try it without any compression. Assuming 32-bit floats, your monthly
> >   5760 x 2880 slice is only about 65 MB. Uncompressed data may perform
> >   well, and at the least it will give you a baseline to work from - and
> >   will help if you are investigating IO tuning.
> >
> > - I have found with CArray that the auto chunksize works fairly well.
> >   Experiment with that chunksize and with some chunksizes that you think
> >   are more appropriate (maybe temporal rather than spatial in your case).
> >
> > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll wrote:
> >
> >> On 03.06.2013 14:43, Andreas Hilboll wrote:
> >> > Hi,
> >> >
> >> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> >> > (the last dimension represents time; once per month there'll be one
> >> > more 5760 x 2880 array to add to the end).
> >> >
> >> > Now, extracting a timeseries at one index location is slow; e.g., for
> >> > four indices, it takes several seconds:
> >> >
> >> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> >> >
> >> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> >> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> >> > Wall time: 7.17 s
> >> >
> >> > I have the feeling that this performance could be improved, but I'm
> >> > not sure how to properly use the `chunkshape` parameter in my case.
> >> >
> >> > Any help is greatly appreciated :)
> >> >
> >> > Cheers, Andreas.
> >>
> >> PS: If I could get significant performance gains by not using an EArray
> >> and therefore re-creating the whole database each month, then this would
> >> also be an option.
> >>
> >> -- Andreas.
>
> Thanks a lot, Anthony and Tim! I was able to bring the readout time down
> considerably using chunkshape=(32, 32, 256) for my 5760 x 2880 x 150 array.
> Now, reading times are about as fast as I expected.
>
> The downside is that building up the database now takes a lot of time,
> because I get the data in chunks of 5760 x 2880 x 1. So I guess that
> writing the data to disk like this causes a load of IO operations ...
>
> My new question: Is there a way to create a file in-memory? If possible,
> I could then build up my database in-memory and then, once it's done,
> just copy the arrays to an on-disk file. Is that possible? If so, how?
>
> Thanks a lot for your help!
>
> -- Andreas.
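A back-of-the-envelope calculation (plain arithmetic, with the shapes from the thread above) shows why that chunkshape makes timeseries reads fast but monthly writes expensive:

```python
import math

shape = (5760, 2880, 150)    # (y, x, time), float32
chunkshape = (32, 32, 256)   # chosen for fast timeseries reads
itemsize = 4                 # bytes per float32

# Size of one chunk: 32 * 32 * 256 * 4 bytes = 1 MiB.
chunk_bytes = chunkshape[0] * chunkshape[1] * chunkshape[2] * itemsize
print(chunk_bytes)           # 1048576

# A full timeseries at one (y, x) point only crosses the chunks stacked
# along the time axis -- here the whole axis fits in a single chunk, so
# each point costs one chunk read/decompression.
chunks_per_read = math.ceil(shape[2] / chunkshape[2])
print(chunks_per_read)       # 1

# A monthly 5760 x 2880 x 1 write, however, touches every chunk in the
# (y, x) plane, so each chunk is rewritten (read-modify-write) monthly.
chunks_per_write = (math.ceil(shape[0] / chunkshape[0])
                    * math.ceil(shape[1] / chunkshape[1]))
print(chunks_per_write)      # 180 * 90 = 16200
```

That 16200-chunk rewrite per monthly slice is consistent with Andreas's observation that building the database became slow once reads were optimized; chunking is always a trade-off between the read and write access patterns.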
Re: [Pytables-users] Chunk selection for optimized data access
You would be surprised to see how convenient HDF5 can be for small-scale data :) There are cases where one may need binary serialization of only a few thousand items, while still needing the metadata, indexing and other nice features provided by HDF5/PyTables.

On Wed, Jun 5, 2013 at 2:29 AM, Tim Burgess wrote:

> I was playing around with in-memory HDF5 prior to the 3.0 release. Here's
> an example based on what I was doing.
> I looked over the docs and it does mention that there is an option to
> throw away the 'file' rather than write it to disk.
> Not sure how to do that, and can't actually think of a use case where I
> would want to :-)
>
> And be wary, it is H5FD_CORE.
>
> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz wrote:
>
> > I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I
> > haven't ever used this personally, but it would be great to have an
> > example script, if someone wants to write one ;)
>
> import numpy as np
> import tables
>
> CHUNKY = 30
> CHUNKX = 8640
>
> if __name__ == '__main__':
>
>     # create dataset and add global attrs
>     file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX)
>
>     with tables.open_file(file_path, 'w',
>                           title='PyTables HDF5 In-memory example',
>                           driver='H5FD_CORE') as h5f:
>
>         # dummy some data
>         lats = np.empty([4320])
>         lons = np.empty([8640])
>
>         # create some simple arrays
>         lat_node = h5f.create_array('/', 'lat', lats, title='latitude')
>         lon_node = h5f.create_array('/', 'lon', lons, title='longitude')
>
>         # create a 365 x 4320 x 8640 CArray of 32bit float
>         shape = (365, 4320, 8640)
>         atom = tables.Float32Atom(dflt=np.nan)
>
>         # chunk into daily slices and then further chunk days
>         sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
>                                      chunkshape=(1, CHUNKY, CHUNKX))
>
>         # dummy up an ndarray
>         sst = np.empty([4320, 8640], dtype=np.float32)
>         sst.fill(30.0)
>
>         # write ndarray to a 2D plane in the HDF5
>         sst_node[0] = sst
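As far as I can tell, the "throw away the 'file'" option Tim mentions corresponds to the CORE driver's backing-store flag. A minimal sketch, assuming the `driver_core_backing_store` keyword from PyTables 3.0's new driver support (the file name and data here are made up):

```python
import os

import numpy as np
import tables

# With the CORE driver and driver_core_backing_store=0, the whole HDF5
# "file" lives in memory and is simply discarded on close -- nothing is
# ever written to disk.
h5f = tables.open_file('scratch.h5', 'w', driver='H5FD_CORE',
                       driver_core_backing_store=0)
h5f.create_array('/', 'x', np.arange(10), title='scratch data')
print(h5f.root.x[:5])
h5f.close()

# No file appears on disk after closing.
print(os.path.exists('scratch.h5'))
```

Leaving the backing store enabled (the default) instead writes the in-memory image out to the named file on close, which matches Andreas's plan of building the database in memory and persisting it once at the end.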