Re: [Pytables-users] ANN: PyTables 3.0 final

2013-06-03 Thread Seref Arikan
Many thanks for keeping such a great piece of work up and running. I've
just seen some features in the release notes that I was going to need in
the very near future!
Great job!

Best regards
Seref Arikan



On Sat, Jun 1, 2013 at 12:33 PM, Antonio Valentino <
antonio.valent...@tiscali.it> wrote:

> ===
>   Announcing PyTables 3.0.0
> ===
>
> We are happy to announce PyTables 3.0.0.
>
> PyTables 3.0.0 comes about 5 years after the last major release
> (2.0) and 7 months after the last stable release (2.4.0).
>
> This is a new major release and an important milestone for the PyTables
> project, since it provides the long-awaited support for Python 3.x, which
> has been around for 4 years.
>
> Almost all of the core numeric/scientific packages for Python already
> support Python 3, so we are very happy that PyTables can now provide
> this important feature as well.
>
>
> What's new
> ==
>
> A short summary of main new features:
>
> - Since this release, PyTables provides full support for Python 3.
> - The entire code base is now more compliant with the coding style
>    guidelines described in PEP 8.
> - Basic support for HDF5 drivers.  It is now possible to open/create an
>    HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE
>    drivers.
> - Basic support for in-memory image files.  An HDF5 file can be set
>    from or copied into a memory buffer.
> - Implemented methods to get/set the user block size in an HDF5 file.
> - All read methods now have an optional *out* argument that allows
>    passing a pre-allocated array in which to store the data.
> - Added support for floating point data types with extended
>    precision (Float96, Float128, Complex192 and Complex256).
> - Consistent ``create_xxx()`` signatures.  It is now possible to create
>    all data sets (Array, CArray, EArray, VLArray, and Table) from
>    existing Python objects.
> - Complete rewrite of the `nodes.filenode` module.  It is now fully
>    compliant with the interfaces defined in the standard `io` module.
>    Currently only unbuffered binary I/O is supported.
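As a minimal sketch of two of the features above (the file name here is arbitrary), the consistent ``create_xxx()`` signatures let you build an Array straight from a NumPy object, and the new *out* argument lets you read back into a pre-allocated buffer:

```python
import numpy as np
import tables

with tables.open_file('demo.h5', 'w') as h5f:
    # create an Array directly from an existing Python/NumPy object
    arr = h5f.create_array('/', 'data', obj=np.arange(10))
    # read back into a pre-allocated array via the *out* argument
    out = np.empty(10, dtype=arr.dtype)
    arr.read(out=out)
```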
>
> Please refer to the RELEASE_NOTES document for a more detailed list of
> changes in this release.
>
> As always, a large number of bugs have been addressed and squashed as well.
>
> In case you want to know more in detail what has changed in this
> version, please refer to: http://pytables.github.io/release_notes.html
>
> You can download a source package with generated PDF and HTML docs, as
> well as binaries for Windows, from:
> http://sourceforge.net/projects/pytables/files/pytables/3.0.0
>
> For an online version of the manual, visit:
> http://pytables.github.io/usersguide/index.html
>
>
> What is it?
> ===
>
> PyTables is a library for managing hierarchical datasets, designed to
> efficiently cope with extremely large amounts of data, with support for
> full 64-bit file addressing.  PyTables runs on top of the HDF5 library
> and the NumPy package to achieve maximum throughput and convenient use.
> PyTables includes OPSI, a new indexing technology that makes it
> possible to perform data lookups in tables exceeding 10 gigarows
> (10**10 rows) in less than a tenth of a second.
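As a small (made-up) illustration of the indexed lookups mentioned above, a table column can be given an OPSI index and then queried with an in-kernel condition; the column names and sizes here are purely illustrative:

```python
import tables

class Reading(tables.IsDescription):
    time = tables.Int64Col()
    value = tables.Float64Col()

with tables.open_file('readings.h5', 'w') as h5f:
    table = h5f.create_table('/', 'readings', Reading)
    row = table.row
    for t in range(1000):
        row['time'] = t
        row['value'] = t * 0.5
        row.append()
    table.flush()
    # create an OPSI index on the column used in queries
    table.cols.time.create_index()
    # in-kernel query; uses the index transparently
    hits = [r['value'] for r in table.where('time == 500')]
```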
>
>
> Resources
> =
>
> About PyTables: http://www.pytables.org
>
> About the HDF5 library: http://hdfgroup.org/HDF5/
>
> About NumPy: http://numpy.scipy.org/
>
>
> Acknowledgments
> ===
>
> Thanks to the many users who provided feature improvements, patches,
> bug reports, support and suggestions.  See the ``THANKS`` file in the
> distribution package for an (incomplete) list of contributors.  Most
> especially, a lot of kudos go to the HDF5 and NumPy makers.
> Without them, PyTables simply would not exist.
>
>
> Share your experience
> =
>
> Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
>
>
> 
>
>**Enjoy data!**
>
>-- The PyTables Developers
>
>
> ___
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>

Re: [Pytables-users] Chunk selection for optimized data access

2013-06-04 Thread Seref Arikan
I think I've seen this in the release notes of 3.0. This is actually
something that I'm looking into as well. So any experience/feedback about
creating files in memory would be much appreciated.

Best regards
Seref



On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll  wrote:

> On 04.06.2013 05:35, Tim Burgess wrote:
> > My thoughts are:
> >
> > - try it without any compression. Assuming 32 bit floats, your monthly
> > 5760 x 2880 is only about 65MB. Uncompressed data may perform well and
> > at the least it will give you a baseline to work from - and will help if
> > you are investigating IO tuning.
> >
> > - I have found with CArray that the auto chunksize works fairly well.
> > Experiment with that chunksize and with some chunksizes that you think
> > are more appropriate (maybe temporal rather than spatial in your case).
> >
> > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll  wrote:
> >
> >> On 03.06.2013 14:43, Andreas Hilboll wrote:
> >> > Hi,
> >> >
> >> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> >> > (the last dimension represents time, and once per month there'll be
> one
> >> > more 5760x2880 array to add to the end).
> >> >
> >> > Now, extracting timeseries at one index location is slow; e.g., for
> four
> >> > indices, it takes several seconds:
> >> >
> >> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> >> >
> >> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> >> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> >> > Wall time: 7.17 s
> >> >
> >> > I have the feeling that this performance could be improved, but I'm
> not
> >> > sure about how to properly use the `chunkshape` parameter in my case.
> >> >
> >> > Any help is greatly appreciated :)
> >> >
> >> > Cheers, Andreas.
> >>
> >> PS: If I could get significant performance gains by not using an EArray
> >> and therefore re-creating the whole database each month, then this would
> >> also be an option.
> >>
> >> -- Andreas.
>
> Thanks a lot, Anthony and Tim! I was able to reduce the readout time
> considerably by using chunkshape=(32, 32, 256) for my 5760x2880x150
> array.  Now, reading times are about as fast as I expected.
>
> The downside is that building up the database now takes a lot of time,
> because I get the data in chunks of 5760x2880x1.  So I guess that
> writing the data to disk like this causes a load of IO operations ...
>
> My new question: Is there a way to create a file in-memory? If possible,
> I could then build up my database in-memory and then, once it's done,
> just copy the arrays to an on-disk file. Is that possible? If so, how?
>
> Thanks a lot for your help!
>
> -- Andreas.
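One possible approach, sketched here under the assumption that the CORE driver introduced in 3.0 behaves as documented (file names and shapes are illustrative): build the file entirely in memory with the backing store disabled, then copy it to disk once it is complete:

```python
import numpy as np
import tables

# build the file entirely in memory; no disk I/O while writing
h5f = tables.open_file('inmem.h5', 'w', driver='H5FD_CORE',
                       driver_core_backing_store=0)
carr = h5f.create_carray(h5f.root, 'data', tables.Float32Atom(),
                         shape=(100, 100), chunkshape=(32, 32))
carr[:] = np.ones((100, 100), dtype=np.float32)

# once the database is built, copy it to an on-disk file
h5f.copy_file('ondisk.h5', overwrite=True)
h5f.close()
```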
>
>
>
>


Re: [Pytables-users] Chunk selection for optimized data access

2013-06-05 Thread Seref Arikan
You would be surprised to see how convenient HDF5 can be for small-scale
data :) There are cases where one may need binary serialization of a few
thousand items while still needing the metadata, indexing and other nice
features provided by HDF5/PyTables.
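For example, even a tiny dataset can carry its own metadata through HDF5 attributes; the attribute names and file name below are made up for illustration:

```python
import numpy as np
import tables

with tables.open_file('small.h5', 'w') as h5f:
    arr = h5f.create_array('/', 'samples', obj=np.arange(5))
    # attach arbitrary metadata as HDF5 attributes
    arr.attrs.units = 'mV'
    arr.attrs.source = 'sensor-01'

# the metadata travels with the data in the same file
with tables.open_file('small.h5') as h5f:
    units = h5f.root.samples.attrs.units
```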




On Wed, Jun 5, 2013 at 2:29 AM, Tim Burgess  wrote:

> I was playing around with in-memory HDF5 prior to the 3.0 release. Here's
> an example based on what I was doing.
> I looked over the docs and they do mention that there is an option to
> throw away the 'file' rather than write it to disk.
> Not sure how to do that, and I can't actually think of a use case where I
> would want to :-)
>
> And be wary, it is H5FD_CORE.
>
>
> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz  wrote:
>
>
> I think that you want to set parameters.DRIVER to H5DF_CORE [1].  I
> haven't ever used this personally, but it would be great to have an example
> script, if someone wants to write one ;)
>
>
>
> import numpy as np
> import tables
>
> CHUNKY = 30
> CHUNKX = 8640
>
> if __name__ == '__main__':
>
>     # create dataset and add global attrs
>     file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX)
>
>     with tables.open_file(file_path, 'w',
>                           title='PyTables HDF5 In-memory example',
>                           driver='H5FD_CORE') as h5f:
>
>         # dummy some data
>         lats = np.empty([4320])
>         lons = np.empty([8640])
>
>         # create some simple arrays
>         lat_node = h5f.create_array('/', 'lat', lats, title='latitude')
>         lon_node = h5f.create_array('/', 'lon', lons, title='longitude')
>
>         # create a 365 x 4320 x 8640 CArray of 32bit float
>         shape = (365, 4320, 8640)
>         atom = tables.Float32Atom(dflt=np.nan)
>
>         # chunk into daily slices and then further chunk days
>         sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
>                                      chunkshape=(1, CHUNKY, CHUNKX))
>
>         # dummy up an ndarray
>         sst = np.empty([4320, 8640], dtype=np.float32)
>         sst.fill(30.0)
>
>         # write ndarray to a 2D plane in the HDF5
>         sst_node[0] = sst
>
>
>
>
>