[Pytables-users] PyTables 2.1b3 ready for beta-testing

Francesc Alted Wed, 08 Oct 2008 06:07:59 -0700

Hi List,

After several months of intense hacking, I'm happy to announce the 
availability of PyTables 2.1 beta3 as well as PyTables Pro 2.1 beta3.  
This will probably be the last beta before announcing a release 
candidate later this month.  My plan is to not further change the API 
until the final release, unless I have a compelling reason to do so.


You can get the software from:

http://www.pytables.org/download/preliminary/tables-2.1b3.tar.gz

For those with a Pro license, I've dropped the:

tables-2.1b3.devpro.tar.gz

tarball in their regular download areas.

Please note that, as this is a beta release, you will have to compile 
the beast yourself.  Also, keep in mind that this is a beta-quality 
release and not fit for production use (I've tested it only on 
Linux 32-bit and 64-bit, not on Windows or Mac OS X).

I'd be very grateful if people could get their hands on this release and 
act as beta-testers.  Finally, I'm taking some holidays for the rest of 
the week, so expect correspondingly slow responses ;-)

And now, what's new in 2.1b3:

=======================================
 Release notes for PyTables 2.1 series
=======================================

:Author: Francesc Alted i Abad
:Contact: [EMAIL PROTECTED]
:Author: Ivan Vilata i Balaguer
:Contact: [EMAIL PROTECTED]


Changes from 2.0.4 to 2.1b3
===========================

Main improvements
-----------------

- A node is now opened directly (i.e. without first populating all its
  parent directories).  So, for opening group and leaf locations that
  are known in advance, the new code is substantially faster (in fact,
  the cost of these operations is O(1) now).

- The `EArray.truncate()` method has been generalized and implemented as
  `Leaf.truncate()`.  Now, it is possible to truncate all *enlargeable*
  datasets (i.e. all except `Array` and `CArray` objects).  Fixes #174.

- Disabling the LRU node cache is now supported by setting
  `NODE_MAX_SLOTS` (in parameters.py) to 0 (this can also be achieved
  through the `nodeCacheSize` parameter of the `openFile()` function).
  Disabling this cache may be useful in situations where you suspect
  that maintaining an LRU node cache is actually reducing performance.
  Besides, this figure can also be negative, meaning that all the
  touched nodes will be kept in an internal dictionary.  See more info
  about these features in the updated "Getting the most from the node LRU
  cache" section of chapter 5 of the User's Guide.
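
The three cache settings described above can be pictured with a small
pure-Python sketch.  This is not PyTables code: the `NodeCache` class, its
`load` callback and the `count_loads()` helper are hypothetical names used
only to illustrate the semantics (0 disables the cache, a positive value
keeps an LRU cache with that many slots, a negative value keeps every
touched node in a plain dictionary).

```python
from collections import OrderedDict


class NodeCache(object):
    """Toy model of the three node-cache modes (hypothetical, not PyTables)."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.nodes = OrderedDict()   # insertion order doubles as LRU order
        self.loads = 0               # how many times a node had to be loaded

    def get(self, path, load):
        if path in self.nodes:
            node = self.nodes.pop(path)   # hit: move to most-recent position
            self.nodes[path] = node
            return node
        node = load(path)                 # miss: load the node again
        self.loads += 1
        if self.max_slots == 0:
            return node                   # cache disabled: keep nothing
        self.nodes[path] = node
        if self.max_slots > 0 and len(self.nodes) > self.max_slots:
            self.nodes.popitem(last=False)  # evict the least recently used
        return node


def count_loads(max_slots, paths):
    """Replay a sequence of node accesses and count the real loads."""
    cache = NodeCache(max_slots)
    for path in paths:
        cache.get(path, load=lambda p: "node at " + p)
    return cache.loads


accesses = ["/a", "/b", "/a", "/c", "/a", "/b"]
loads_disabled = count_loads(0, accesses)    # every access reloads the node
loads_lru = count_loads(2, accesses)         # two slots with LRU eviction
loads_unbounded = count_loads(-1, accesses)  # negative: keep every node
```

Comparing the three counters for an access pattern that resembles your own
workload is one way to judge whether the LRU bookkeeping is worth its cost.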


Main improvements (Pro edition)
-------------------------------

- New light indexes that can take up to 4x less space than 2.0 indexes,
  and more than 15x less space than indexes in traditional databases.
  Four levels of index "lightness", namely ``ultralight``, ``light``,
  ``medium`` and ``full`` (the latter being the kind implemented in
  version 2.0), are available so that the user can choose the most
  appropriate for her needs.

- The index query code has been completely revamped and it is based now
  on the concept of chunkmaps.  This allows for a much more effective
  way to retrieve table data in queries that have low selectivity, while
  retaining good performance for high selectivity ones.

- A new query optimizer is able to use several indexes simultaneously
  in a broad range of complex queries. For example, in the query::

    (((c_int32 == 3) | (c_bool == True)) & (c_int32 == 5)) & (c_extra > 0)

  if ``c_int32`` and ``c_bool`` columns are indexed but ``c_extra`` is
  not, both ``c_int32`` and ``c_bool`` indexes will be used. That will
  greatly enhance the response times of potentially complicated queries.

- An additional optimization in the index creation process makes it
  possible to build completely sorted indexes (CSI), which not only give
  better performance in queries, but also allow creating completely
  sorted tables ordered by a specific field.


API additions from 2.0.4 to 2.1b3
---------------------------------

- The `AttributeSet` class has received the following dictionary-like
  methods: `__getitem__()`, `__setitem__()` and `__delitem__()`, so that
  you can do things like::

    for name in node._v_attrs._f_list():
        print "name: %s, value: %s" % (name, node._v_attrs[name])

- New `File.fileno()` added.  This returns the underlying OS file
  descriptor for the file.  This is meant to allow `File` objects to
  better interact with the `fcntl` module.

- A new `chunkshape` argument has been added to `Leaf.copy()`, which
  allows specifying a chunkshape for the copy.  It can also take the
  special values 'auto' (compute a sensible value) and 'keep' (keep the
  original value, which is the default).

- Added a new '--chunkshape' flag to the `ptrepack` console command that
  corresponds to the new `chunkshape` added to `Leaf.copy()`.

- `File.copyNode()` can now copy complete hierarchies directly from the
  root.  This can be useful when one wants to create a new file by
  merging the contents of others.
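
As an illustration of the `fcntl` use case mentioned for `File.fileno()`,
here is a minimal sketch of advisory locking on a file descriptor.  It is
Unix-only and uses an ordinary temporary file as a stand-in; with PyTables
the descriptor would be the value returned by `File.fileno()`.  Whether
this kind of locking suits your HDF5 setup is an assumption you should
check for yourself.

```python
import fcntl
import tempfile

# Stand-in for an open PyTables file: any object exposing fileno() works.
# With PyTables 2.1 the descriptor would come from File.fileno() instead.
handle = tempfile.NamedTemporaryFile(suffix=".h5")
fd = handle.fileno()

# Take an exclusive advisory lock without blocking; this raises an error
# if another process already holds a conflicting lock.
fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
locked = True

# ... work with the file while holding the lock ...

fcntl.flock(fd, fcntl.LOCK_UN)   # release the lock
locked = False
handle.close()
```

The lock is advisory: it only protects against other processes that also
ask for the lock before touching the file.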


API additions from 2.0.4 to 2.1b3 (Pro edition)
-----------------------------------------------

- A new `Table.itersorted()` iterator allows iterating through a table
  following the order of a certain index.  It supports iteration on
  ranges, including negative steps (i.e. reverse sorted order).

- New `Table.readSorted()` method that can read a table following the
  order of a certain index.  It supports reads on ranges, including
  negative steps (i.e. reverse sorted order).

- New `Table.colindexes` property that returns a dictionary with the
  indexes of the indexed columns in table.

- A new `sortby` argument has been added to `Table.copy()`, allowing a
  table to be sorted during the copy operation.

- Added a new `propindexes` argument in `Table.copy()`.  If true, the
  indexes in the source table are propagated (created) to the new table.
  If false (the default), the indexes are not propagated.

- New public `Index.readSorted()` and `Index.readIndices()` methods that
  allow direct access to the index data.

- Added new '--sortby' (sort a table by a column key), '--forceCSI'
  (force the creation of a CSI index) and '--propindexes' (propagate the
  indexes in original tables) flags to the `ptrepack` utility.
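
The idea behind reading a table in index order, with ranges and negative
steps, can be sketched in plain Python.  This is a conceptual model only,
not the Pro implementation: the `rows` data, the `order` list and the
`iter_sorted()` helper are hypothetical, and the real signatures of
`Table.itersorted()` and `Table.readSorted()` are not reproduced here.

```python
rows = [
    {"name": "c", "pressure": 30},
    {"name": "a", "pressure": 10},
    {"name": "d", "pressure": 40},
    {"name": "b", "pressure": 20},
]

# Toy "index": row numbers ordered by the value of the indexed column.
order = sorted(range(len(rows)), key=lambda i: rows[i]["pressure"])


def iter_sorted(rows, order, start=None, stop=None, step=None):
    """Yield rows following the index order, restricted to a slice of it."""
    for i in order[start:stop:step]:
        yield rows[i]


ascending = [r["name"] for r in iter_sorted(rows, order)]
first_two = [r["name"] for r in iter_sorted(rows, order, 0, 2)]
descending = [r["name"] for r in iter_sorted(rows, order, step=-1)]
```

Note that the table itself is never reordered: only the sequence of row
numbers taken from the index decides the order in which rows are visited.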


Bug fixes
---------

- In order to avoid a long-standing bug, all the possible 64-bit class
  attributes of leaf objects (like `nrows`, `shape` or `nrow`) have been
  converted into a new `SizeType` type (actually an alias for
  `numpy.int64`).  This change should be backward compatible with
  existing programs, so you should not need any action to adapt to this.
  Fixes #118.

- When no range is specified in `ptrepack`, all the elements of leaves
  are now copied.  Before, only the first row was copied, which was
  clearly wrong.

- The `Atom` default value (`Atom.dflt`) is honored now when creating
  `CArrays`.  Fixes #176.


Backward incompatible API changes from 2.0.4 to 2.1b3
=====================================================

- The semantics of `Leaf.copy()` have changed: before, the chunkshape of
  the destination was computed automatically ('auto'), while now the
  default is to keep the chunkshape of the source ('keep').  This
  behaviour is thought to better satisfy the principle of least
  surprise.

- The `trMap` argument has been removed from the `tables.openFile()`
  function.  Also, the `Node._v_hdf5name` attribute has been removed as
  well.  Fixes #117.

- The `sort` parameter of `Table.itersequence()` has been removed, as it
  did not allow sorting sequences larger than memory.  Moreover, it is
  not clear that the sorting operation would be an advantage in every
  situation.


Backward incompatible API changes from 2.0.4 to 2.1b3 (Pro edition)
===================================================================

- The `Column.createIndex()` method has received a new parameter named
  `kind`, which is now the first in the argument list.  This is
  intentional and *incompatible* with the previous argument list, so
  people should update their existing `Column.createIndex()` calls.

- Added a new `Column.createCSIndex()` as a handy way to create a
  completely sorted index (CSI).

- The `Table.indexFilters` property has been removed (after a period of
  ``DeprecationWarnings``).  If you want to change filters in indexes,
  please use the `filters` parameter of the `Column.createIndex()`
  method (and the like).

- `Table.willQueryUseIndexing()` has changed its return value from a
  list to a frozen set of usable indexed columns.

- Now, the 'AUTO_INDEX' system attribute of the `Index` class is copied
  only if the `copyuserattrs` argument of `Table.copy()` is true (the
  default).
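
For code that consumes the result of `Table.willQueryUseIndexing()`, the
move from a list to a frozen set mostly matters where the result was
accessed by position.  The sketch below uses made-up return values
(`old_result`, `new_result`) and a hypothetical `describe()` helper to show
which operations behave the same on both types and which one stops working.

```python
def describe(usable):
    """Summarise usable indexed columns; works for a list or a frozenset."""
    return {
        "uses_index": bool(usable),
        "columns": sorted(usable),
        "has_c_int32": "c_int32" in usable,
    }


old_result = ["c_int32", "c_bool"]             # shape of the 2.0 return value
new_result = frozenset(["c_int32", "c_bool"])  # shape of the 2.1 return value

# Truth testing, iteration and membership behave the same for both types.
same_summary = describe(old_result) == describe(new_result)

# Positional access is what stops working with a frozen set.
try:
    new_result[0]
    indexable = True
except TypeError:
    indexable = False
```

Sticking to membership tests and iteration keeps calling code independent
of the concrete container type.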


----

  **Enjoy data!**

  -- The PyTables Team


-- 
Francesc Alted
Freelance developer
Tel +34-964-282-249
