[Python-announce] ANN: numexpr 2.8.8 released

2023-12-11 Thread Francesc Alted

Announcing NumExpr 2.8.8


Hi everyone,

NumExpr 2.8.8 is a release that deals mainly with issues appearing with
the upcoming `NumPy` 2.0.  Some small fixes (such as support for simple
complex expressions like `ne.evaluate('1.5j')`) and improvements are also
included.
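
As a quick illustration of the fixes below, here is a minimal sketch
(the array and values are only examples)::

    import numpy as np
    import numexpr as ne

    # Simple complex literals are now parsed correctly.
    ne.evaluate('1.5j')                      # -> array(1.5j)

    # re_evaluate() now accepts global_dict, just like evaluate().
    ns = {'a': np.arange(10.0)}
    ne.evaluate('a * 2', local_dict={}, global_dict=ns)
    ns['a'] = ns['a'] + 1
    ne.re_evaluate(local_dict={}, global_dict=ns)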

Project documentation is available at:

http://numexpr.readthedocs.io/

Changes from 2.8.7 to 2.8.8
---------------------------

* Fix re_evaluate not taking global_dict as argument. Thanks to Teng Liu
  (@27rabbitlt).

* Fix parsing of simple complex numbers.  Now, `ne.evaluate('1.5j')` works.
  Thanks to Teng Liu (@27rabbitlt).

* Fixes for upcoming NumPy 2.0:

  * Replace npy_cdouble with C++ complex. Thanks to Teng Liu (@27rabbitlt).
  * Add NE_MAXARGS to track the future NumPy change to NPY_MAXARGS. It is now
    set to 64 to match the NumPy 2.0 value. Thanks to Teng Liu (@27rabbitlt).

What's Numexpr?
---------------

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.
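
In practice, using it is a one-liner; here is a minimal sketch (array
names and sizes are only examples)::

    import numpy as np
    import numexpr as ne

    a = np.random.rand(int(1e6))
    b = np.random.rand(int(1e6))

    # Evaluated in parallel, chunk by chunk, without big temporaries
    c = ne.evaluate('3*a + 4*b')

    ne.set_num_threads(4)   # optionally cap the number of threads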

Where can I find Numexpr?
-------------------------

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Documentation is hosted at:

http://numexpr.readthedocs.io/en/latest/

Share your experience
---------------------

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy data!

-- 
Francesc Alted


[Python-announce] ANN: NumExpr 2.8.7

2023-09-26 Thread Francesc Alted
Hi everyone,

NumExpr 2.8.7 is a release to deal with issues related to downstream
`pandas` and other projects where the sanitization blacklist was
triggering issues in their `evaluate()` calls.  Hopefully, the new
sanitization code is much more robust now.

For those who do not wish to have sanitization enabled by default, it can
be turned off by setting the environment variable `NUMEXPR_SANITIZE=0`.
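
For instance, a sketch of how this could be done from within a program
(setting the variable before numexpr is first imported is the safest
option)::

    import os

    # Opt out of expression sanitization; '1' (the default) keeps it on.
    os.environ['NUMEXPR_SANITIZE'] = '0'

    import numexpr as ne
    ne.evaluate('2 + 3')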

If you use `pandas` in your packages, it is advisable to pin

`numexpr >= 2.8.7`

in your requirements.

Project documentation is available at:

http://numexpr.readthedocs.io/

Changes from 2.8.6 to 2.8.7
---------------------------

* More permissive rules in the sanitizing regular expression: allow access
  to digits after the decimal point in scientific notation.  Thanks to
  Thomas Vincent.

* Don't reject double underscores that are not at the start or end of a
  variable name (pandas uses those), or scientific-notation numbers with
  digits after the decimal point.  Thanks to Rebecca Palmer.

* Do not use `numpy.alltrue` in the test suite, as it has been deprecated
  (replaced by `numpy.all`).  Thanks to Rebecca Chen.

* Wheels for Python 3.12.  Wheels for 3.7 and 3.8 are not generated anymore.

What's Numexpr?
---------------

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

Where can I find Numexpr?
-------------------------

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Documentation is hosted at:

http://numexpr.readthedocs.io/en/latest/

Share your experience
---------------------

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy data!

-- 
Francesc Alted


ANN: python-blosc 1.9.2 released!

2020-09-09 Thread Francesc Alted
=============================
Announcing python-blosc 1.9.2
=============================

What is new?


This is a maintenance release to better support recent versions of Python
(3.8 and 3.9).  Also, due to the evolution of modern CPUs, the default
number of threads has been raised to 8 (from 4).  Finally, zero-copy
decompression is now supported by allowing bytes-like input.  Thanks to
Lehman Garrison.
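
A minimal sketch of the new bits (the array below is only an example)::

    import numpy as np
    import blosc

    blosc.set_nthreads(8)          # 8 is now the default anyway

    data = np.arange(int(1e6), dtype=np.int64)
    packed = blosc.compress(data.tobytes(), typesize=8)

    # decompress() accepts bytes-like objects (e.g. a memoryview), so the
    # compressed buffer does not need to be copied first.
    restored = blosc.decompress(memoryview(packed))
    assert restored == data.tobytes()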

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized
for binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Blosc works well for compressing
numerical arrays that contain data with relatively low entropy, like
sparse data, time series, grids with regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library, with added functions (`compress_ptr()`
and `pack_array()`) for efficiently compressing NumPy arrays, minimizing
the number of memory copies during the process.  python-blosc can be
used to compress in-memory data buffers for transmission to other
machines, persistence or just as a compressed cache.
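
For NumPy arrays this boils down to a couple of calls; a small sketch
(the codec and compression level are arbitrary choices)::

    import numpy as np
    import blosc

    a = np.linspace(0, 1, int(1e6))

    # Serialize and compress the array in one go, then reconstruct it
    packed = blosc.pack_array(a, cname='lz4', clevel=5)
    b = blosc.unpack_array(packed)
    assert np.array_equal(a, b)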

There is also a handy tool built on top of python-blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary data files on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Sources repository
==

The sources and documentation are managed through github services at:

http://github.com/Blosc/python-blosc





  **Enjoy data!**

-- 
The Blosc Development Team


ANN: python-blosc 1.9.0 released

2020-03-29 Thread Francesc Alted
=============================
Announcing python-blosc 1.9.0
=============================

What is new?


In this release we got rid of the support for Python 2.7 and 3.5.
Also, we fixed the copy of the leftovers of a chunk when its size is not a
multiple of the typesize.  Although this is a very unusual situation,
it can certainly happen (e.g.
https://github.com/Blosc/python-blosc/issues/220).
Finally, sources for C-Blosc v1.18.1 have been included.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized
for binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Blosc works well for compressing
numerical arrays that contain data with relatively low entropy, like
sparse data, time series, grids with regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library, with added functions (`compress_ptr()`
and `pack_array()`) for efficiently compressing NumPy arrays, minimizing
the number of memory copies during the process.  python-blosc can be
used to compress in-memory data buffers for transmission to other
machines, persistence or just as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary data files on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Sources repository
==

The sources and documentation are managed through github services at:

http://github.com/Blosc/python-blosc





  **Enjoy data!**


bcolz, a column store for Python, 1.1.2 released

2017-02-10 Thread Francesc Alted
======================
Announcing bcolz 1.1.2
======================

What's new
==

This is a maintenance release that brings quite a lot of improvements.
Here are the highlights:

- Zstd is a supported codec now.  Fixes #331.

- C-Blosc updated to 1.11.2.

- Added a new `defaults_ctx` context so that users can select defaults
  easily without changing global behaviour. For example::

      with bcolz.defaults_ctx(vm="python", cparams=bcolz.cparams(clevel=0)):
          cout = bcolz.eval("(x + 1) < 0")

- Fixed a crash occurring in `ctable.todataframe()` when both `columns`
  and `orient='columns'` were specified.  PR #311.  Thanks to Peter
  Quackenbush.

- Use `pkg_resources.parse_version()` to test for version of packages.
  Fixes #322 (PY27 bcolz with dask unicode error).

- New package recipe for conda-forge.  Now you can install bcolz with:
`conda install -c conda-forge bcolz`.  Thanks to Alistair Miles.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparison between bcolz and other compressed data containers,
see:

https://github.com/FrancescAlted/DataContainersTutorials

especially chapters 3 (in-memory containers) and 4 (on-disk containers).


What it is
==

*bcolz* provides **columnar and compressed** data containers that can
live either on-disk or in-memory.  The compression is carried out
transparently by Blosc, an ultra fast meta-compressor that is optimized
for binary data.  Compression is active by default.

Column storage allows for efficiently querying tables with a large
number of columns.  It also allows for cheap addition and removal of
columns.  Lastly, high-performance iterators (like ``iter()``,
``where()``) for querying the objects are provided.

bcolz can use different backends internally (currently numexpr,
Python/NumPy or dask) so as to accelerate many vector and query
operations (although it can use pure NumPy for doing so too).  Moreover,
since the carray/ctable containers can be disk-based, it is possible to
use them for seamlessly performing out-of-memory computations.
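
A small sketch of the basic workflow (the column names and the on-disk
path are only examples)::

    import numpy as np
    import bcolz

    # Compressed, in-memory carray
    x = bcolz.carray(np.arange(1e7))

    # Columnar, compressed table persisted on disk
    N = int(1e6)
    ct = bcolz.ctable((np.arange(N), np.linspace(0, 1, N)),
                      names=['i', 'f'], rootdir='mydata.bcolz', mode='w')

    # numexpr-powered query via the where() iterator
    total = sum(row.f for row in ct.where('(i > 10) & (f < 0.5)'))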

NumPy is used as the standard way to feed and retrieve data from bcolz
internal containers, but it also comes with support for high-performance
import/export facilities to/from `HDF5/PyTables tables
<http://www.pytables.org>`_ and `pandas dataframes
<http://pandas.pydata.org>`_.

Have a look at how bcolz and the Blosc compressor are making better use
of memory without significant overhead, at least for some real
scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

bcolz has minimal dependencies (NumPy is the only strict requisite),
comes with an exhaustive test suite, and it is meant to be used in
production. Example users of bcolz are Visualfabriq
(http://www.visualfabriq.com/), Quantopian (https://www.quantopian.com/)
and scikit-allel:

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* scikit-allel:

  * Exploratory analysis of large scale genetic variation data.
  * https://github.com/cggh/scikit-allel


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

----

  **Enjoy data!**

-- 
Francesc Alted


ANN: python-blosc 1.5.0 released

2017-02-08 Thread Francesc Alted
=============================
Announcing python-blosc 1.5.0
=============================

What is new?


A new `blosc.set_releasegil()` function that allows releasing/acquiring
the GIL at will.  Thanks to Robert McLeod.
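
A minimal sketch of how this can be combined with a thread pool (the
pool and chunk sizes are only examples)::

    import numpy as np
    import blosc
    from concurrent.futures import ThreadPoolExecutor

    # Release the GIL during (de)compression so other Python threads can
    # run while Blosc's own thread pool does the work.
    blosc.set_releasegil(True)

    chunks = [np.random.rand(100000).tobytes() for _ in range(8)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        packed = list(pool.map(lambda c: blosc.compress(c, typesize=8),
                               chunks))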

Also, C-Blosc has been updated to 1.11.2.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized
for binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Blosc works well for compressing
numerical arrays that contain data with relatively low entropy, like
sparse data, time series, grids with regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library, with added functions (`compress_ptr()`
and `pack_array()`) for efficiently compressing NumPy arrays, minimizing
the number of memory copies during the process.  python-blosc can be
used to compress in-memory data buffers for transmission to other
machines, persistence or just as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary data files on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Sources repository
==

The sources and documentation are managed through github services at:

http://github.com/Blosc/python-blosc





  **Enjoy data!**

-- 
Francesc Alted


ANN: numexpr 2.6.2 released!

2017-01-30 Thread Francesc Alted
=========================
 Announcing Numexpr 2.6.2
=========================

What's new
==========

This is a maintenance release that fixes several issues, with special
emphasis on keeping compatibility with newer NumPy versions.  Also,
initial support for POWER processors is here.  Thanks to Oleksandr
Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and
Antonio Valentino for their nice contributions.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

What's Numexpr
==============

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy data!

-- 
Francesc Alted


ANN: bcolz 1.1.0 released!

2016-06-10 Thread Francesc Alted
======================
Announcing bcolz 1.1.0
======================

What's new
==

This release brings quite a lot of changes.  After format stabilization
in 1.0, the focus is now on fine-tuning many operations (especially
queries in ctables), as well as on widening the available computational
engines.

Highlights:

* Much improved performance of ctable.where() and ctable.whereblocks().
  Now bcolz is getting closer than ever to fundamental memory limits
  during queries (see the updated benchmarks in the data containers
  tutorial below).

* Better support for Dask; i.e. the GIL is released during Blosc operation
  when bcolz is called from a multithreaded app (like Dask).  Also, Dask
  can be used as another virtual machine for evaluating expressions (so
  now it is possible to use it during queries too).

* New ctable.fetchwhere() method for getting the rows fulfilling some
  condition in one go (see the sketch after this list).

* New quantize filter for allowing lossy compression of floating point
  data.

* It is possible to create ctables with more than 255 columns now.
  Thanks to Skipper Seabold.

* The defaults during carray creation are scalars now.  That allows
  creating highly dimensional data containers more efficiently.

* The carray object now implements the __array__() special method.  With
  this, interoperability with numpy arrays is easier and faster.
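
A small sketch of a couple of the new bits (the column names are only
examples, and the quantize filter is assumed to be selected through
`cparams`)::

    import numpy as np
    import bcolz

    N = int(1e6)
    ct = bcolz.ctable((np.arange(N), np.random.rand(N)), names=['i', 'f'])

    # fetchwhere(): materialize all rows matching a condition in one go
    hits = ct.fetchwhere('(f > 0.9) & (i < 1000)')

    # quantize filter: lossy compression of floats, keeping ~3 significant
    # digits
    lossy = bcolz.carray(np.random.rand(N),
                         cparams=bcolz.cparams(clevel=9, quantize=3))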

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparison between bcolz and other compressed data containers,
see:

https://github.com/FrancescAlted/DataContainersTutorials

especially chapters 3 (in-memory containers) and 4 (on-disk containers).


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for seamlessly
performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Example users of bcolz are Visualfabriq (http://www.visualfabriq.com/),
and Quantopian (https://www.quantopian.com/):

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html



Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted


ANN: bcolz 1.0.0 (final) released

2016-04-07 Thread Francesc Alted
============================
Announcing bcolz 1.0.0 final
============================

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
features (just some optimizations and bug fixes), but bcolz is already 6
years old and implements most of the capabilities that it was designed
for, so I decided to release a 1.0.0, meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  The format is fully described at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that the bcolz 1.x series will be based on
the C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  Once C-Blosc
2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x is
expected, taking advantage of the shiny new features of C-Blosc2 (more
compressors, more filters, native variable-length support and the
concept of super-chunks), which should be very beneficial for the next
bcolz generation.

Important: this is a final release with no important known bugs, so it
is recommended for use in production.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparison between bcolz and other compressed data containers,
see:

https://github.com/FrancescAlted/DataContainersTutorials

especially chapters 3 (in-memory containers) and 4 (on-disk containers).

Also, if you happen to be in Madrid this weekend, you can drop by my
tutorial and talk:

http://pydata.org/madrid2016/schedule/

See you!


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for seamlessly
performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/),
Quantopian (https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel), which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted


ANN: python-blosc 1.3.1

2016-04-07 Thread Francesc Alted
=============================
Announcing python-blosc 1.3.1
=============================

What is new?


This is an important release in terms of stability.  Now, the -O1 flag is
used for compiling the included C-Blosc sources on Linux.  This means
slower performance, but it fixes the nasty issue #110.  In case maximum
speed is needed, please `compile python-blosc with an external C-Blosc
library
<https://github.com/Blosc/python-blosc#compiling-with-an-installed-blosc-library-recommended>`_.

Also, symbols like BLOSC_MAX_BUFFERSIZE have been replaced to allow
backward compatibility with the python-blosc 1.2.x series.

For whetting your appetite, look at some benchmarks here:

https://github.com/Blosc/python-blosc#benchmarking

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized
for binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Blosc works well for compressing
numerical arrays that contain data with relatively low entropy, like
sparse data, time series, grids with regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library, with added functions (`compress_ptr()`
and `pack_array()`) for efficiently compressing NumPy arrays, minimizing
the number of memory copies during the process.  python-blosc can be
used to compress in-memory data buffers for transmission to other
machines, persistence or just as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary data files on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Sources repository
==

The sources and documentation are managed through github services at:

http://github.com/Blosc/python-blosc




  **Enjoy data!**

-- 
Francesc Alted


ANN: numexpr 2.5.2 released

2016-04-07 Thread Francesc Alted
==========================
 Announcing Numexpr 2.5.2
==========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a maintenance release shaking out some remaining problems with
VML (it is nice to see how Anaconda's VML support helps raise hidden
issues).  Now conj() and abs() are actually added as VML-powered
functions, preventing the same problems as with log10() before (PR #212);
thanks to Tom Kooij.  Upgrading to this release is highly recommended.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted


ANN: python-blosc 1.3.0 released

2016-03-31 Thread Francesc Alted
=============================
Announcing python-blosc 1.3.0
=============================

What is new?


There is support for the newest C-Blosc; as such, C-Blosc 1.8.0 is being
distributed internally.  This brings support for the new `BITSHUFFLE`
filter, allowing for better compression ratios in many cases, at the
expense of some slowdown.  For details see:

http://python-blosc.blosc.org/tutorial.html#using-different-filters
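
A quick sketch of selecting the new filter (the data and typesize are
only examples)::

    import numpy as np
    import blosc

    data = np.arange(int(1e6), dtype=np.float64).tobytes()

    # Bit-wise shuffling often compresses numerical data better than the
    # default byte-wise SHUFFLE, at the cost of some speed.
    c_bit = blosc.compress(data, typesize=8, shuffle=blosc.BITSHUFFLE)
    c_byte = blosc.compress(data, typesize=8, shuffle=blosc.SHUFFLE)
    print(len(c_bit), len(c_byte))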

You can also run some benchmarks including different codecs and filters:

https://github.com/Blosc/python-blosc/blob/master/bench/compress_ptr.py

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary data files on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you must omit the 'python-' prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is a Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

-- 
Francesc Alted


ANN: numexpr 2.5.1 released

2016-03-31 Thread Francesc Alted
==========================
 Announcing Numexpr 2.5.1
==========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

Fixed a critical bug that caused wrong evaluations of log10() and
conj().  These produced wrong results when numexpr was compiled with
Intel's MKL (which is a popular build since Anaconda ships it by
default) and operated on non-contiguous data.  This is considered a
*critical* bug and upgrading is highly recommended.  Thanks to Arne de
Laat and Tom Kooij for reporting it and providing a unit test.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted


[ANN] bcolz 1.0.0 RC1 released

2016-03-08 Thread Francesc Alted
==========================
Announcing bcolz 1.0.0 RC1
==========================

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
features (just some optimizations and bug fixes), but bcolz is already 6
years old and implements most of the capabilities that it was designed
for, so I decided to release a 1.0.0, meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  The format is fully described at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that the bcolz 1.x series will be based on
the C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  Once C-Blosc
2.x (https://github.com/Blosc/c-blosc2) is out, a new bcolz 2.x is
expected, taking advantage of the shiny new features of C-Blosc2 (more
compressors, more filters, native variable-length support and the
concept of super-chunks), which should be very beneficial for the next
bcolz generation.

Important: this is a Release Candidate, so please test it as much as you
can.  If no issues appear in a week or so, I will proceed to tag and
release 1.0.0 final.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for seamlessly
performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted


ANN: numexpr 2.5

2016-02-06 Thread Francesc Alted
========================
 Announcing Numexpr 2.5
========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

In this version, a lock has been added so that numexpr can be called
from multithreaded apps.  Mind that this does not prevent numexpr from
using multiple cores internally.  Also, new min() and max() functions
have been added.  Thanks to contributors!
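
A small sketch of what this enables (the thread pool and expressions are
only examples)::

    import numpy as np
    import numexpr as ne
    from concurrent.futures import ThreadPoolExecutor

    a = np.random.rand(int(1e6))

    # evaluate() can now be called safely from several Python threads;
    # numexpr still parallelizes each evaluation internally.
    def work(scale):
        return ne.evaluate('sin(a) * scale')

    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(work, [1.0, 2.0, 3.0]))

    # The new min() and max() reductions
    lo, hi = ne.evaluate('min(a)'), ne.evaluate('max(a)')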

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted


ANN: bcolz 0.12.0 released

2015-11-16 Thread Francesc Alted
=======================
Announcing bcolz 0.12.0
=======================

What's new
==

This release copes with some compatibility issues with NumPy 1.10.
Also, several improvements have happened in the installation procedure,
allowing for a smoother process.  Last but not least, the tutorials have
been migrated to the IPython notebook format (a huge thank you to
Francesc Elies for this!).  This will hopefully allow users to better
exercise the different features of bcolz.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for seamlessly
performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted


ANN: numexpr 2.4.6 released

2015-11-02 Thread Francesc Alted
Hi,

This is a quick release fixing some reported problems in the 2.4.5 version
that I announced a few hours ago.  Hope I have fixed the main issues now.
Now, the official announcement:

==========================
 Announcing Numexpr 2.4.6
==========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a quick maintenance version that offers better handling of
MSVC symbols (#168, Francesc Alted), as well as fixing some
UserWarnings in Solaris (#189, Graham Jones).

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted


ANN: numexpr 2.4.5 released

2015-11-02 Thread Francesc Alted
==========================
 Announcing Numexpr 2.4.5
==========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a maintenance release where an important bug in the
multithreading code has been fixed (#185, Benedikt Reinartz, Francesc
Alted).  Also, many harmless warnings (overflow/underflow, divide by
zero and others) in the test suite have been silenced (#183, Francesc
Alted).

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=========================

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted


ANN: bcolz 0.11.3 released!

2015-10-05 Thread Francesc Alted
=======================
Announcing bcolz 0.11.3
=======================

What's new
==

Implemented a new feature (#255): bcolz.zeros() can create new ctables
too, either empty or filled with zeros (#256 @FrancescElies
@FrancescAlted).

Also, in the previous, non-announced versions (0.11.1 and 0.11.2), new
dependencies were added and other fixes were included too.
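
A quick sketch of the new behaviour (the dtype is only an example)::

    import bcolz

    # A structured dtype now yields a ctable instead of a carray
    ct = bcolz.zeros(0, dtype='i4,f8')
    ct.append((1, 2.0))
    print(ct)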

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for seamlessly
performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted


ANN: python-blosc 1.2.8 released

2015-09-16 Thread Francesc Alted
=============================
Announcing python-blosc 1.2.8
=============================

What is new?


This is a maintenance release.  The internal C-Blosc has been upgraded
to 1.7.0 (although the new bitshuffle support has not been made public,
as it does not seem ready for production yet).

Also, there is support for bytes-like objects that support the buffer
interface as input to ``compress`` and ``decompress``. On Python 2.x
this includes unicode, on Python 3.x it doesn't.  Thanks to Valentin
Haenel.

Finally, a memory leak in ``decompress`` has been hunted down and fixed,
and new tests have been added to catch possible leaks in the future.
Thanks to Santi Villalba.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary data files on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you must omit the 'python-' prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is a Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

-- 
Francesc Alted


numexpr 2.4.4 released

2015-09-14 Thread Francesc Alted
==========================
 Announcing Numexpr 2.4.4
==========================

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a maintenance release which contains several bug fixes, like
better testing on Python3 platform and some harmless data race.  Among
the enhancements, AppVeyor support is here and OMP_NUM_THREADS is
honored as a fallback in case NUMEXPR_NUM_THREADS is not set.
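
A small sketch of the fallback behaviour (the variable names are the
real environment variables; everything else is illustrative):

    import os

    # NUMEXPR_NUM_THREADS wins; OMP_NUM_THREADS is only consulted as a
    # fallback.  Both must be set before numexpr is imported.
    os.environ.setdefault("OMP_NUM_THREADS", "4")

    import numexpr as ne
    import numpy as np

    a = np.random.rand(1000 * 1000)
    b = np.random.rand(1000 * 1000)
    print(ne.evaluate("3*a + 4*b")[:5])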

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where I can find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy data!

-- 
Francesc Alted
-- 
https://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: python-blosc 1.2.7 released

2015-05-06 Thread Francesc Alted
=
Announcing python-blosc 1.2.7
=

What is new?


Updated to use c-blosc v1.6.1.  Although this version supports AVX2, it is
not enabled in python-blosc yet, because we still need to devise a way to
detect AVX2 support on the underlying platform.

At any rate, c-blosc 1.6.1 fixed an important bug in the blosclz codec, so
a new release was deemed important.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

-- 
Francesc Alted
-- 
https://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: PyTables 3.2.0 (final) released!

2015-05-06 Thread Francesc Alted
===
 Announcing PyTables 3.2.0
===

We are happy to announce PyTables 3.2.0.

***
IMPORTANT NOTICE:

If you are a user of PyTables, the project needs your help to keep going.
Please read the following thread, as it contains important information about the
future (or the lack of it) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
***


What's new
==

This is a major release of PyTables and it is the result of more than a
year of accumulated patches, but most especially it fixes a couple of
nasty problems with indexed queries not returning the correct results in
some scenarios.  There are many usability and performance improvements
too.

In case you want to know more in detail what has changed in this
version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated
PDF and HTML docs from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0

For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html


What is it?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data, with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and the NumPy package to achieve maximum throughput and convenient use.
PyTables includes OPSI, a new indexing technology that allows performing
data lookups in tables exceeding 10 gigarows
(10**10 rows) in less than a tenth of a second.
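
As a hedged sketch of the kind of indexed query OPSI accelerates (the
file, table and column names are made up for the example):

    import numpy as np
    import tables

    class Particle(tables.IsDescription):
        ident = tables.Int64Col()
        value = tables.Float64Col()

    with tables.open_file("demo.h5", "w") as f:
        t = f.create_table("/", "particles", Particle)
        t.append(list(zip(range(100000), np.random.rand(100000))))
        t.flush()
        t.cols.value.create_index()      # build an OPSI index on 'value'
        hits = [row["ident"] for row in t.where("value > 0.999")]
        print(len(hits), "rows matched")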


Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
-- 
https://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: bcolz 0.7.1 released

2014-07-30 Thread Francesc Alted
Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

**Enjoy Data!**

--
Francesc Alted

--
https://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


ANN: bcolz 0.7.0, columnar, chunked and compressed datasets at your fingertips

2014-07-24 Thread Francesc Alted

==
Announcing bcolz 0.7.0
==

What's new
==

In this release, support for Python 3 has been added, as well as
conversion to/from Pandas and HDF5/PyTables, support for different
compressors via the latest release of Blosc, and a new `iterblocks()`
iterator.

Also, intensive benchmarking has led to an important tuning of the
buffer size parameters, so that compression and evaluation go faster
than ever.  Together, bcolz and the Blosc compressor are finally
fulfilling the promise of accelerating memory I/O, at least for some
real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots 



``bcolz`` is a renaming of the ``carray`` project.  The new goals for
the project are to create simple, yet flexible compressed containers
that can live either on-disk or in-memory, together with some
high-performance iterators (like `iter()`, `where()`) for querying them.

For more detailed info, see the release notes in:
https://github.com/Blosc/bcolz/wiki/Release-Notes


What it is
==

bcolz provides columnar and compressed data containers.  Column storage
allows for efficiently querying tables with a large number of columns.
It also allows for cheap addition and removal of columns.  In addition,
bcolz objects are compressed by default for reducing memory/disk I/O
needs.  The compression process is carried out internally by Blosc, a
high-performance compressor that is optimized for binary data.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, the carray/ctable
containers can be disk-based, and it is possible to use them for
seamlessly performing out-of-memory computations.
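
A minimal sketch of a compressed, columnar query (the column names are
invented; it assumes the `ctable` constructor and the `where()` iterator
mentioned above):

    import numpy as np
    import bcolz

    N = 1000 * 1000
    ct = bcolz.ctable(columns=[np.arange(N), np.random.rand(N)],
                      names=["i", "x"])

    # The query is evaluated with numexpr over the compressed columns
    hits = [row.i for row in ct.where("(x > 0.999) & (i < 500000)")]
    print(len(hits), "matching rows")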

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.


Installing
==

bcolz is in the PyPI repository, so installing it is easy:

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt




  **Enjoy data!**

-- Francesc Alted
--
https://mail.python.org/mailman/listinfo/python-list


ANN: python-blosc 1.2.7 released

2014-07-07 Thread Francesc Alted

=
Announcing python-blosc 1.2.4
=

What is new?


This is a maintenance release, where the included c-blosc sources have
been updated to 1.4.0.  This adds support for non-Intel architectures,
most especially those not supporting unaligned access.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy command line and Python library for Blosc called
Bloscpack (https://github.com/Blosc/bloscpack) that allows you to
compress large binary datafiles on-disk.


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

--
Francesc Alted

--
https://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


[CORRECTION] python-blosc 1.2.4 released (Was: ANN: python-blosc 1.2.7 released)

2014-07-07 Thread Francesc Alted
Indeed, the version just released was 1.2.4, not 1.2.7.  Sorry for
the typo!


Francesc





--
Francesc Alted

--
https://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


ANN: numexpr 2.4 is out

2014-04-19 Thread Francesc Alted


 Announcing Numexpr 2.4


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in
strings.  Only plain strings (bytes) are supported for now (see ticket
#142).  Thanks to Marcin Krol.
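
As a quick, hedged sketch of how `contains()` can be used (the needle is
passed as a variable to keep the expression simple; byte strings only,
as noted above):

    import numpy as np
    import numexpr as ne

    words = np.array([b"numpy", b"numexpr", b"python", b"pandas"])
    needle = b"num"

    # contains(haystack, needle) returns a boolean array
    mask = ne.evaluate("contains(words, needle)")
    print(words[mask])        # expected: [b'numpy' b'numexpr']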

You can have a glimpse of how `contains()` works in this notebook:

http://nbviewer.ipython.org/gist/FrancescAlted/10595974

where it can be seen that this can make substring searches more
than 10x faster than with regular Python.

You can find the source for the notebook here:

https://github.com/FrancescAlted/ngrams

Also, there is a new version of setup.py that allows better management
of the NumPy dependency during pip installs.  Thanks to Aleks Bunin.

Windows related bugs have been addressed and (hopefully) squashed.
Thanks to Christoph Gohlke.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

--
Francesc Alted


--
https://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


ANN: python-blosc 1.2.0 released

2014-01-28 Thread Francesc Alted
/ContinuumIO/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/ContinuumIO/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

--
Francesc Alted
Continuum Analytics, Inc.

--
https://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


ANN: BLZ 0.6.1 has been released

2014-01-28 Thread Francesc Alted

Announcing BLZ 0.6 series
=

What it is
--

BLZ is a chunked container for numerical data.  Chunking allows for
efficient enlarging/shrinking of the data container.  In addition, it can
also be compressed for reducing memory/disk needs.  The compression
process is carried out internally by Blosc, a high-performance
compressor that is optimized for binary data.

The main objects in BLZ are `barray` and `btable`.  `barray` is meant
for storing multidimensional homogeneous datasets efficiently.
`barray` objects provide the foundations for building `btable`
objects, where each column is made of a single `barray`.  Facilities
are provided for iterating, filtering and querying `btables` in an
efficient way.  You can find more info about `barray` and `btable` in
the tutorial:

http://blz.pydata.org/blz-manual/tutorial.html

BLZ can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too)
either from memory or from disk.  In the future, it is planned to use
Numba as the computational kernel and to provide better Blaze
(http://blaze.pydata.org) integration.


What's new
--

BLZ has been branched off from the Blaze project
(http://blaze.pydata.org).  BLZ was meant as a persistent format and
library for I/O in Blaze.  BLZ in Blaze is based on the previous carray
0.5, which is why this new version is labeled 0.6.

BLZ supports completely transparent storage on-disk in addition to
memory.  That means that *everything* that can be done with the
in-memory container can be done using the disk as well.

The advantage of a disk-based container is that the addressable space
is much larger than just your available memory.  Also, as BLZ is based
on a chunked and compressed data layout built on the super-fast Blosc
compression library, the data access speed is very good.

The format chosen for the persistence layer is based on the
'bloscpack' library and described in the Persistent format for BLZ
chapter of the user manual ('docs/source/persistence-format.rst').
More about Bloscpack here: https://github.com/esc/bloscpack

You may want to know more about BLZ in this blog entry:
http://continuum.io/blog/blz-format

In this version, support for Blosc 1.3 has been added, meaning that a
new `cname` parameter has been added to the `bparams` class, so that
you can select your preferred compressor from 'blosclz', 'lz4',
'lz4hc', 'snappy' and 'zlib'.
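
A rough sketch of picking a codec through `bparams` (the `bparams=`
keyword on the `barray` constructor is an assumption, mirroring the old
carray `cparams=` style):

    import numpy as np
    import blz

    # Chunked, compressed container using the LZ4 codec at level 5
    a = blz.barray(np.arange(10 * 1000 * 1000),
                   bparams=blz.bparams(clevel=5, cname="lz4"))

    print(a.nbytes, "bytes of data stored in", a.cbytes, "compressed bytes")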

Also, many bugs have been fixed, providing a much smoother experience.

CAVEAT: The BLZ/bloscpack format is still evolving, so don't rely on
forward compatibility of the format, at least until 1.0, when the
internal format will be declared frozen.


Resources
-

Visit the main BLZ site repository at:
http://github.com/ContinuumIO/blz

Read the online docs at:
http://blz.pydata.org/blz-manual/index.html

Home of Blosc compressor:
http://www.blosc.org

User's mail list:
blaze-...@continuum.io



   Enjoy!

Francesc Alted
Continuum Analytics, Inc.

--
https://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


[ANN] numexpr 2.2 released

2013-09-01 Thread Francesc Alted
==
 Announcing Numexpr 2.2
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
VML library (included in Intel MKL), which allows an extremely fast
evaluation of transcendental functions (sin, cos, tan, exp, log...)
while squeezing the last drop of performance out of your multi-core
processors.

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational kernel for projects that
don't want to adopt other solutions that require more heavy
dependencies.

What's new
==

This release is mainly meant to fix a problem with the license of the
numexpr/win32/pthread.{c,h} files emulating pthreads on Windows.  After
permission from the original authors was granted, these files now adopt
the MIT license and can be redistributed without problems.  See issue
#109 for details
(https://code.google.com/p/numexpr/issues/detail?id=110).

Another important improvement is the algorithm to decide the initial
number of threads to be used.  This was necessary because, by default,
numexpr was using a number of threads equal to the detected number of
cores, and this can be just too much for modern systems where this
number can be quite high (and counterproductive for performance in many
cases).  Now, the 'NUMEXPR_NUM_THREADS' environment variable is
honored, and in case it is not present, a maximum of 8 threads are set
up initially.  The new algorithm is fully described in the Users Guide,
in the note of the 'General routines' section:
https://code.google.com/p/numexpr/wiki/UsersGuide#General_routines.
Closes #110.
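
For reference, the thread count can also be adjusted programmatically
after import; a short sketch using numexpr's public helpers (the
automatic 8-thread cap only applies to the initial setup):

    import numexpr as ne
    import numpy as np

    print("detected cores:", ne.detect_number_of_cores())
    ne.set_num_threads(4)          # override the initial choice explicitly

    a = np.random.rand(1000 * 1000)
    print(ne.evaluate("sin(a) + cos(a)").sum())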

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: python-blosc 1.1 (final) released

2013-05-24 Thread Francesc Alted

===
Announcing python-blosc 1.1
===

What is it?
===

python-blosc (http://blosc.pydata.org/) is a Python wrapper for the
Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for
binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Whether this is achieved or not
depends on the data compressibility, the number of cores in the system,
and other factors.  See a series of benchmarks conducted for many
different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

There is also a handy command line tool for Blosc called Bloscpack
(https://github.com/esc/bloscpack) that allows you to compress large
binary datafiles on-disk.  Although the format for Bloscpack has not
stabilized yet, it allows you to effectively use Blosc from your
favorite shell.


What is new?


- Added new `compress_ptr` and `decompress_ptr` functions that allow
  compressing and decompressing from/to a data pointer, avoiding an
  intermediate copy for maximum speed.  Be careful, as these are low
  level calls, and the user must make sure that the pointer data area
  is safe (see the sketch after this list).

- Since Blosc (the C library) can already be installed as a standalone
  library (via cmake), it is also possible to link python-blosc against
  a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of
  recent Blosc library supporting this at C level).

- Many checks on types and ranges of values have been added.  Most of
  the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available
  now.
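
A cautious sketch of the pointer-based calls (argument order as I
understand the low-level API; the arrays are made up, and no bounds
checking is done for you):

    import numpy as np
    import blosc

    a = np.arange(1000 * 1000, dtype=np.float64)
    b = np.empty_like(a)

    # Compress straight from the source array's data pointer (no copy)
    packed = blosc.compress_ptr(a.__array_interface__["data"][0],
                                a.size, a.dtype.itemsize,
                                clevel=9, shuffle=True)

    # Decompress straight into the destination buffer
    blosc.decompress_ptr(packed, b.__array_interface__["data"][0])
    assert np.array_equal(a, b)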

Many thanks to Valentin Hänel for his impressive work for this release.

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://blosc.pydata.org


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/FrancescAlted/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES

for more details.

Enjoy!

--
Francesc Alted
--
http://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


ANN: python-blosc 1.1 RC1, a wrapper for the compression library, is available

2013-05-19 Thread Francesc Alted


Announcing python-blosc 1.1 RC1


What is it?
===

python-blosc (http://blosc.pydata.org/) is a Python wrapper for the
Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for
binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Whether this is achieved or not
depends on the data compressibility, the number of cores in the system,
and other factors.  See a series of benchmarks conducted for many
different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

There is also a handy command line tool for Blosc called Bloscpack
(https://github.com/esc/bloscpack) that allows you to compress large
binary datafiles on-disk.  Although the format for Bloscpack has not
stabilized yet, it allows you to effectively use Blosc from your
favorite shell.


What is new?


- Added new `compress_ptr` and `decompress_ptr` functions that allow
  compressing and decompressing from/to a data pointer.  These are low
  level calls and the user must make sure that the pointer data area is
  safe.

- Since Blosc (the C library) can already be installed as a standalone
  library (via cmake), it is also possible to link python-blosc against
  a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of
  recent Blosc library supporting this at C level).

- Many checks on types and ranges of values have been added.  Most of
  the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available
  now.

Many thanks to Valentin Hänel for his impressive work for this release.

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://blosc.pydata.org


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/FrancescAlted/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES

for more details.

--
Francesc Alted
--
http://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


[ANN] python-blosc 1.0.5 released

2012-09-16 Thread Francesc Alted

=
Announcing python-blosc 1.0.5
=

What is it?
===

A Python wrapper for the Blosc compression library.

Blosc (http://blosc.pytables.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

What is new?


- Upgraded to latest Blosc 1.1.4.

- Better handling of error conditions, and improved memory releasing in
  case of errors (thanks to Valentin Haenel and Han Genuit).

- Better handling of types (should compile without warnings now, at least
  with GCC).

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the Quick User's Guide wiki page:

https://github.com/FrancescAlted/python-blosc/wiki/Quick-User's-Guide

Download sources


Go to:

http://github.com/FrancescAlted/python-blosc

and download the most recent release from there.

Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for
details.

Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc

--
Francesc Alted

--
http://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


[ANN] carray 0.5 released

2012-08-21 Thread Francesc Alted

Announcing carray 0.5
=

What's new
--

carray 0.5 supports completely transparent storage on-disk in addition
to memory.  That means that everything that can be done with an
in-memory container can be done using the disk instead.
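
A hedged sketch of the on-disk mode (the `rootdir=` keyword is how I
understand the new persistence layer to be exposed; the directory name
is made up):

    import numpy as np
    import carray as ca

    # The data is chunked, compressed and stored under 'mydata.carray'
    # instead of being kept in memory.
    b = ca.carray(np.arange(10 * 1000 * 1000), rootdir="mydata.carray")
    b.append(np.arange(10))      # works just like the in-memory container
    b.flush()
    print(b.nbytes, "->", b.cbytes, "bytes on disk")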

The advantage of a disk-based container is that your addressable space
is much larger than just your available memory.  Also, as carray is
based on a chunked and compressed data layout built on the super-fast
Blosc compression library, and on the different cache levels existing in
both modern operating systems and the internal carray machinery, the
data access speed is very good.

The format chosen for the persistence layer is based on the 'bloscpack'
library (thanks to Valentin Haenel for his inspiration) and described in
'persistence.rst', although not everything has been implemented yet.
You may want to contribute by proposing enhancements to it.  See:
https://github.com/FrancescAlted/carray/wiki/PersistenceProposal

CAVEAT: The bloscpack format is still evolving, so don't rely on
forward compatibility of the format, at least until 1.0, when the
internal format will be declared frozen.

For more detailed info, see the release notes in:
https://github.com/FrancescAlted/carray/wiki/Release-0.5


What it is
--

carray is a chunked container for numerical data.  Chunking allows for
efficient enlarging/shrinking of data container.  In addition, it can
also be compressed for reducing memory/disk needs.  The compression
process is carried out internally by Blosc, a high-performance
compressor that is optimized for binary data.

carray can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, with the
introduction of a carray/ctable disk-based container (in version 0.5),
it can be used for seamlessly performing out-of-core computations.

carray comes with an exhaustive test suite and fully supports both
32-bit and 64-bit platforms.  Also, it is typically tested on both UNIX
and Windows operating systems.

Resources
-

Visit the main carray site repository at:
http://github.com/FrancescAlted/carray

You can download a source package from:
http://carray.pytables.org/download

Manual:
http://carray.pytables.org/docs/manual

Home of Blosc compressor:
http://blosc.pytables.org

User's mail list:
car...@googlegroups.com
http://groups.google.com/group/carray



   Enjoy!

--
Francesc Alted

--
http://mail.python.org/mailman/listinfo/python-announce-list

   Support the Python Software Foundation:
   http://www.python.org/psf/donations/


ANN: Numexpr 2.0 released

2011-11-27 Thread Francesc Alted

 Announcing Numexpr 2.0


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
VML library, which allows for squeezing the last drop of performance
out of your multi-core processors.

What's new
==

This version comes with support for the new iterator in NumPy
(introduced in NumPy 1.6), allowing for improved performance in
practically all scenarios (the exception being very small arrays),
and most especially for operations involving broadcasting,
Fortran-ordered arrays or non-native byte orderings.

The carefully crafted mix of the new NumPy iterator and direct access
to data buffers turned out to be so powerful and flexible, that the
internal virtual machine has been completely revamped around this
combination.

The drawback is that you will need NumPy >= 1.6 to run numexpr 2.0.
However, NumPy 1.6 was released more than 6 months ago now, so we
think this is a good time to take advantage of it.  Many thanks to
Mark Wiebe for such an important contribution!

For some benchmarks on the new virtual machine, see:

http://code.google.com/p/numexpr/wiki/NewVM

Also, Gaëtan de Menten contributed important bug fixes, code cleanup
as well as speed enhancements.  Francesc Alted contributed some fixes,
and added compatibility code with existing applications (PyTables)
too.

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: numexpr 1.4.2 released

2011-01-26 Thread Francesc Alted
==
 Announcing Numexpr 1.4.2
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

What's new
==

This is a maintenance release.  The most annoying issues have been
fixed (including the reduction malfunction introduced in the 1.4 series).
Also, several performance enhancements (especially for VML and small
array operations) are included too.

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!


-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: carray released

2010-12-23 Thread Francesc Alted
=
Announcing carray 0.3
=

What's new
==

A lot of stuff.  The most outstanding feature in this version is the
introduction of a `ctable` object.  A `ctable` is similar to a
structured array in NumPy, but instead of storing the data row-wise, it
uses a column-wise arrangement.  This allows for much better performance
for very wide tables, which is one of the scenarios where a `ctable`
makes more sense.  Of course, as `ctable` is based on `carray` objects,
it inherits all their niceties (like on-the-fly compression and fast
iterators).

Also, the `carray` object itself has received many improvements, like
new constructors (arange(), fromiter(), zeros(), ones(), fill()),
iterators (where(), wheretrue()) and resize methods (resize(), trim()).
Most of these also work with the new `ctable`.
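
A small sketch using two of the new pieces, `fromiter()` and
`wheretrue()` (the `fromiter()` signature is assumed to mirror NumPy's,
with an explicit `count`):

    from itertools import islice
    import carray as ca

    # Build a boolean carray and iterate over the positions that are true
    flags = ca.fromiter((i % 7 == 0 for i in range(1000 * 1000)),
                        dtype=bool, count=1000 * 1000)
    print(list(islice(flags.wheretrue(), 5)))   # expected: [0, 7, 14, 21, 28]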

Besides, Numexpr is supported now (but it is optional) in order to carry
out stunningly fast queries on `ctable` objects.  For example, doing a
query on a table with one million rows and one thousand columns can be
up to 2x faster than using a plain structured array, and up to 20x
faster than using SQLite (using the :memory: backend and indexing).
See 'bench/ctable-query.py' for details.

Finally, binaries for Windows (both 32-bit and 64-bit) are provided.

For more detailed info, see the release notes in:
https://github.com/FrancescAlted/carray/wiki/Release-0.3

What it is
==

carray is a container for numerical data that can be compressed
in-memory.  The compression process is carried out internally by Blosc,
a high-performance compressor that is optimized for binary data.

Having data compressed in-memory can reduce the stress of the memory
subsystem.  The net result is that carray operations may be faster than
using a traditional ndarray object from NumPy.

carray also supports fully 64-bit addressing (both in UNIX and Windows).
Below, a carray with 1 trillion rows (7.3 TB total) is created and
filled with zeros, some positions are modified, and finally, it is
summed up::

  >>> %time b = ca.zeros(1e12)
  CPU times: user 54.76 s, sys: 0.03 s, total: 54.79 s
  Wall time: 55.23 s
  >>> %time b[[1, 1e9, 1e10, 1e11, 1e12-1]] = (1,2,3,4,5)
  CPU times: user 2.08 s, sys: 0.00 s, total: 2.08 s
  Wall time: 2.09 s
  >>> b
  carray((1,), float64)
    nbytes: 7450.58 GB; cbytes: 2.27 GB; ratio: 3275.35
    cparams := cparams(clevel=5, shuffle=True)
  [0.0, 1.0, 0.0, ..., 0.0, 0.0, 5.0]
  >>> %time b.sum()
  CPU times: user 10.08 s, sys: 0.00 s, total: 10.08 s
  Wall time: 10.15 s
  15.0

['%time' is a magic function provided by the IPython shell]

Please note that the example above is provided for demonstration
purposes only.  Do not try to run this at home unless you have more than
3 GB of RAM available, or you will get into trouble.

Resources
=

Visit the main carray site repository at:
http://github.com/FrancescAlted/carray

You can download a source package from:
http://carray.pytables.org/download

Manual:
http://carray.pytables.org/manual

Home of Blosc compressor:
http://blosc.pytables.org

User's mail list:
car...@googlegroups.com
http://groups.google.com/group/carray

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.



   Enjoy!

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


python-blosc 1.0.3 released

2010-11-18 Thread Francesc Alted

 Announcing python-blosc 1.0.3
 A Python wrapper for the Blosc compression library


What is it?
===

Blosc (http://blosc.pytables.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

python-blosc is a Python package that wraps it.

What is new?


Blosc has been updated to 1.1.3, allowing much improved compression
ratios under some circumstances.  Also, the number of cores on the
Windows platform is now detected correctly (thanks to Han Genuit).

Last, but not least, Windows binaries for Python 2.6 and 2.7 are
provided (both in 32-bit and 64-bit flavors).

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

Basic Usage
===

# Create a binary string made of int (32-bit) elements
>>> import array
>>> a = array.array('i', range(10*1000*1000))
>>> bytes_array = a.tostring()

# Compress it
>>> import blosc
>>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize)
>>> len(bytes_array) / len(bpacked)
110  # 110x compression ratio.  Not bad!

# Compression speed?
>>> from timeit import timeit
>>> timeit("blosc.compress(bytes_array, a.itemsize)",
...        "import blosc, array; "
...        "a = array.array('i', range(10*1000*1000)); "
...        "bytes_array = a.tostring()",
...        number=10)
0.040534019470214844
>>> len(bytes_array)*10 / 0.0405 / (1024*1024*1024)
9.1982476505232444  # wow, compressing at ~ 9 GB/s.  That's fast!
# This is actually much faster than a `memcpy` system call
>>> timeit("ctypes.memmove(b.buffer_info()[0], a.buffer_info()[0], "
...        "len(a)*a.itemsize)",
...        "import array, ctypes; "
...        "a = array.array('i', range(10*1000*1000)); "
...        "b = a[::-1]",
...        number=10)
0.10316681861877441
>>> len(bytes_array)*10 / 0.1031 / (1024*1024*1024)
3.6132786600018565  # ~ 3.6 GB/s is memcpy speed

# Decompress it
>>> bytes_array2 = blosc.decompress(bpacked)
# Check whether our data have had a good trip
>>> bytes_array == bytes_array2
True  # yup, it seems so

# Decompression speed?
>>> timeit("s2 = blosc.decompress(bpacked)",
...        "import blosc, array; "
...        "a = array.array('i', range(10*1000*1000)); "
...        "bytes_array = a.tostring(); "
...        "bpacked = blosc.compress(bytes_array, a.itemsize)",
...        number=10)
0.083872079849243164
>>> len(bytes_array)*10 / 0.0838 / (1024*1024*1024)
4.4454538167803275  # decompressing at ~ 4.4 GB/s is pretty good too!

[Using a machine with 8 physical cores with hyper-threading]

The above examples use maximum compression level 9 (default), and
although lower compression levels produce smaller compression ratios,
they are also faster (reaching speeds exceeding 11 GB/s).

More examples showing other features (and using NumPy arrays) are
available on the python-blosc wiki page:

http://github.com/FrancescAlted/python-blosc/wiki

Documentation
=

Please refer to the docstrings.  Start with the main package:

>>> import blosc
>>> help(blosc)

and ask for more docstrings in the referenced functions.

Download sources


Go to:

http://github.com/FrancescAlted/python-blosc

and download the most recent release from there.

Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for
details.

Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc




  **Enjoy data!**

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: PyTables 2.2.1 released

2010-11-06 Thread Francesc Alted
===
 Announcing PyTables 2.2.1
===

This is a maintenance release.  The upgrade is recommended for all who
are running PyTables in production environments.


What's new
==

Many fixes have been included, as well as a fair bunch of performance
improvements.  Also, the Blosc compression library has been updated to
1.1.2, in order to prevent locks in some scenarios.  Finally, the new
evaluation version of PyTables Pro is based on the previous Pro 2.2.

In case you want to know more in detail what has changed in this
version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2.1

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:
http://www.pytables.org/download/stable

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2.1


What is it?
===

PyTables is a library for managing hierarchical datasets,
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and NumPy package for achieving maximum throughput and convenient use.


Resources
=

About PyTables:

http://www.pytables.org

About the HDF5 library:

http://hdfgroup.org/HDF5/

About NumPy:

http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy (and numarray!)
makers.  Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.




  **Enjoy data!**

  -- The PyTables Team

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: python-blosc 1.0.2

2010-11-04 Thread Francesc Alted

 Announcing python-blosc 1.0.2
 A Python wrapper for the Blosc compression library


What is it?
===

Blosc (http://blosc.pytables.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

python-blosc is a Python package that wraps it.

What is new?


Updated to Blosc 1.1.2.  Fixes some bugs when dealing with very small
buffers (typically smaller than specified typesizes).  Closes #1.

Basic Usage
===

[Using IPython shell and a 2-core machine below]

# Create a binary string made of int (32-bit) elements
>>> import array
>>> a = array.array('i', range(10*1000*1000))
>>> bytes_array = a.tostring()

# Compress it
>>> import blosc
>>> bpacked = blosc.compress(bytes_array, typesize=a.itemsize)
>>> len(bytes_array) / len(bpacked)
110  # 110x compression ratio.  Not bad!
# Compression speed?
>>> timeit blosc.compress(bytes_array, typesize=a.itemsize)
100 loops, best of 3: 12.8 ms per loop
>>> len(bytes_array) / 0.0128 / (1024*1024*1024)
2.9103830456733704  # wow, compressing at ~ 3 GB/s, that's fast!

# Decompress it
>>> bytes_array2 = blosc.decompress(bpacked)
# Check whether our data have had a good trip
>>> bytes_array == bytes_array2
True  # yup, it seems so
# Decompression speed?
>>> timeit blosc.decompress(bpacked)
10 loops, best of 3: 21.3 ms per loop
>>> len(bytes_array) / 0.0213 / (1024*1024*1024)
1.7489625814375185  # decompressing at ~ 1.7 GB/s is pretty good too!

More examples showing other features (and using NumPy arrays) are
available on the python-blosc wiki page:

http://github.com/FrancescAlted/python-blosc/wiki

Documentation
=

Please refer to the docstrings.  Start with the main package:

>>> import blosc
>>> help(blosc)

and ask for more docstrings in the referenced functions.

Download sources


Go to:

http://github.com/FrancescAlted/python-blosc

and download the most recent release from there.

Blosc is distributed using the MIT license, see LICENSES/BLOSC.txt for
details.

Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc




  **Enjoy data!**

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: Numexpr 1.4.1 released

2010-10-21 Thread Francesc Alted
==
 Announcing Numexpr 1.4.1
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

What's new
==

This is a maintenance release.  In it, several improvements have been
made in order to prevent deadlocks in the new threaded code (fixes #33).
Also, the GIL is now released during computations, which should be
interesting for embedding numexpr in threaded Python apps.

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

And you can get the packages from PyPI as well:

http://pypi.python.org/pypi

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: Numexpr 1.4 released

2010-08-01 Thread Francesc Alted

 Announcing Numexpr 1.4


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

What's new
==

The main improvement in this version is the support for
multi-threading in pure C.  Threading in C provides the best
performance on today's multi-core machines.  In addition, this avoids
the GIL that hampers performance in many Python apps.

Just to whet your appetite, look at this page where the
implementation is briefly described and where some benchmarks are
shown:

http://code.google.com/p/numexpr/wiki/MultiThreadVM

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where I can find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

And you can get the packages from PyPI as well:

http://pypi.python.org/pypi

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


PyTables 2.2 released: entering the multi-core age

2010-07-01 Thread Francesc Alted
=
 Announcing PyTables 2.2 (final)
=

I'm happy to announce PyTables 2.2 (final).  After 18 months of
continuous development and testing, this is, by far, the most powerful
and well-tested release ever.  I hope you like it too.

What's new
==

The main new features in 2.2 series are:

  * A new compressor called Blosc, designed to read/write data to/from
memory at speeds that can be faster than a system `memcpy()` call.
With it, many internal PyTables operations that are currently
bound by CPU or I/O bandwidth are sped up (see the sketch after this
list).  Some benchmarks:
http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

And a demonstration on how Blosc can improve PyTables performance:
http://www.pytables.org/docs/manual/ch05.html#chunksizeFineTune

  * Support for HDF5 hard links, soft links and external links (kind of
mounting external filesystems).  A new tutorial about its usage has
been added to the 'Tutorials' chapter of User's Manual.  See:
http://www.pytables.org/docs/manual/ch03.html#LinksTutorial

  * A new `tables.Expr` module (based on Numexpr) that allows performing
persistent, on-disk computations for many algebraic operations.
For a brief look at its performance, see:
http://pytables.org/moin/ComputingKernel

  * Support for 'fancy' indexing (i.e., à la NumPy) in all the data
containers in PyTables.  Backported from the implementation in the
h5py project.  Thanks to Andrew Collette for his fine work on this!

  * Binaries for both Windows 32-bit and 64-bit are provided now.
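
As a hedged sketch of how the new compressor is selected (using the
2.2-era camelCase API; the file and array names are made up):

    import numpy as np
    import tables

    f = tables.openFile("blosc-demo.h5", "w")
    filters = tables.Filters(complevel=5, complib="blosc")
    ca = f.createCArray(f.root, "data", tables.Float64Atom(),
                        shape=(1000, 1000), filters=filters)
    ca[:] = np.random.rand(1000, 1000)      # compressed transparently
    f.close()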

As always, a large amount of bugs have been addressed and squashed too.

In case you want to know more in detail what has changed in this
version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:
http://www.pytables.org/download/stable

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2

What is it?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and NumPy package for achieving maximum throughput and convenient use.

Resources
=

About PyTables:

http://www.pytables.org

About the HDF5 library:

http://hdfgroup.org/HDF5/

About NumPy:

http://numpy.scipy.org/

Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy (and numarray!)
makers.  Without them, PyTables simply would not exist.

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.



  **Enjoy data!**

  -- The PyTables Team


-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


PyTables 2.2rc2 ready to test

2010-06-17 Thread Francesc Alted
===
 Announcing PyTables 2.2rc2
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and NumPy package for achieving maximum throughput and convenient use.

This is the second (and probably last) release candidate for PyTables
2.2, so please test it as much as you can before I declare the beast
stable.  The main new features in 2.2 series are:

  * A new compressor called Blosc, designed to read/write data to/from
memory at speeds that can be faster than a system `memcpy()` call.
With it, many internal PyTables operations that are currently
bound by CPU or I/O bandwidth are sped up.  Some benchmarks:
http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

  * A new `tables.Expr` module (based on Numexpr) that allows performing
persistent, on-disk computations for many algebraic operations.
For a brief look at its performance, see:
http://pytables.org/moin/ComputingKernel

  * Support for HDF5 hard links, soft links and automatic external links
(kind of mounting external filesystems).  A new tutorial about its
usage has been added to the 'Tutorials' chapter of User's Manual.

  * Support for 'fancy' indexing (i.e., à la NumPy) in all the data
containers in PyTables.  Backported from the implementation in the
h5py project.  Thanks to Andrew Collette for his fine work on this!

As always, a large amount of bugs have been addressed and squashed too.

In case you want to know more in detail what has changed in this
version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.2rc2

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:
http://www.pytables.org/download/preliminary

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.2rc2


Resources
=

About PyTables:

http://www.pytables.org

About the HDF5 library:

http://hdfgroup.org/HDF5/

About NumPy:

http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy (and numarray!)
makers.  Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.




  **Enjoy data!**

  -- The PyTables Team

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


Python-es mailing list changes home

2010-01-20 Thread Francesc Alted
=== Python-es mailing list changes home ===

Due to technical problems with the site that usually ran the Python-es
mailing list (Python list for the Spanish speaking community), we are
setting up a new one under the python.org umbrella.  Hence, the new
list will become python...@python.org (the old one was
python...@aditel.org).

Please feel free to subscribe to the new list in:

http://mail.python.org/mailman/listinfo/python-es

Thanks!

=== The Python-es mailing list is moving ===

Due to technical problems with the site that normally hosted the
Python-es list (the Python list for the Spanish-speaking community),
we are setting up a new one under the python.org umbrella.  The new
list will therefore be python...@python.org (replacing the old
python...@aditel.org).

If you wish, please subscribe to the new list at:

http://mail.python.org/mailman/listinfo/python-es

Thanks!

Chema Cortes, Oswaldo Hernández y Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: PyTables 2.2b2 released

2009-12-22 Thread Francesc Alted
===
 Announcing PyTables 2.2b2
===

PyTables is a library for managing hierarchical datasets and designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and NumPy package for achieving maximum throughput and convenient use.

This is the second beta version of the 2.2 release.  The main addition is
support for links.  All HDF5 kinds of links are supported: hard, soft
and external.  Hard and soft links are similar to hard and symbolic
links in regular UNIX filesystems, while external links are more like
mounting external filesystems (in this case, HDF5 files) on top of
existing ones.  This allows for a considerable degree of flexibility
when defining your object tree.  See the new tutorial at:

http://www.pytables.org/docs/manual-2.2b2/ch03.html#LinksTutorial
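
As a quick, non-authoritative sketch (using the 2.2-era camel-case
methods such as `createSoftLink` and `createExternalLink`; file and
node names are made up), creating links looks roughly like this:

    import tables

    f = tables.openFile('links-demo.h5', mode='w')
    g = f.createGroup('/', 'data')
    a = f.createArray(g, 'a', [1, 2, 3])

    # Hard and soft links, similar to UNIX hard/symbolic links.
    f.createHardLink('/', 'a_hard', '/data/a')
    f.createSoftLink('/', 'a_soft', '/data/a')

    # External link, pointing into another HDF5 file (like a mount).
    f.createExternalLink('/', 'a_ext', 'other.h5:/some/array')

    # Calling a soft link dereferences it, returning the target node.
    target = f.root.a_soft()

    f.close()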

Also, this release includes some other new features (like complete
control of HDF5 chunk cache parameters and native compound types in
attributes), bug fixes and a couple of (small) API changes.

In case you want to know in more detail what has changed in this
version, have a look at:

http://www.pytables.org/moin/ReleaseNotes/Release_2.2b2

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:

http://www.pytables.org/download/preliminary

For an on-line version of the manual, visit:

http://www.pytables.org/docs/manual-2.2b2


Resources
=

About PyTables:

http://www.pytables.org

About the HDF5 library:

http://hdfgroup.org/HDF5/

About NumPy:

http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
especially, a lot of kudos go to the HDF5 and NumPy (and numarray!)
makers.  Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.




  **Enjoy data!**

  -- The PyTables Team

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: Numexpr 1.3.1 released

2009-06-23 Thread Francesc Alted
==
 Announcing Numexpr 1.3.1
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

This is a maintenance release.  In it, support for the `uint32` type
has been added (it is internally upcast to `int64`), as well as a
new `abs()` function (thanks to Pauli Virtanen for the patch).
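
A tiny example of both additions (array names are arbitrary):

    import numpy as np
    import numexpr as ne

    a = np.arange(10, dtype=np.uint32)  # uint32 operands are upcast to int64
    b = np.arange(10, dtype=np.int64)

    result = ne.evaluate('abs(a - 2 * b)')  # uses the new abs() function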

Also, a little tweaking in the treatment of unaligned arrays on Intel
architectures allowed for up to 2x speedups in computations involving
unaligned arrays.  For example, for multiplying 2 arrays (see the
included ``unaligned-simple.py`` benchmark), figures before the
tweaking were:

NumPy aligned:      0.63 s
NumPy unaligned:    1.66 s
Numexpr aligned:    0.65 s
Numexpr unaligned:  1.09 s

while now they are:

NumPy aligned:      0.63 s
NumPy unaligned:    1.65 s
Numexpr aligned:    0.65 s
Numexpr unaligned:  0.57 s   -- almost 2x faster than above

You can also see how the unaligned case can be even faster than the
aligned one.  The explanation is that the 'aligned' array was actually
a strided one (in fact, a column of a structured array), and the
total working data size was a bit larger in that case.
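
In case you want to reproduce the benchmarked situation yourself, a
sketch along these lines should do (the structured dtype is just one
way to obtain an unaligned column):

    import numpy as np
    import numexpr as ne

    # A float64 field right after an int8 field is typically unaligned.
    rec = np.empty(1000000, dtype=[('x', np.int8), ('y', np.float64)])
    y = rec['y']                   # unaligned, strided float64 view
    y[:] = np.random.rand(len(y))

    result = ne.evaluate('y * y')  # the code path sped up by this tweak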

In case you want to know in more detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.


Where can I find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

And you can get the packages from PyPI as well:

http://pypi.python.org/pypi


How does it work?
=

See:

http://code.google.com/p/numexpr/wiki/Overview

for a detailed description by the original author (David M. Cooke).


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations/


ANN: Numexpr 1.3 released

2009-06-03 Thread Francesc Alted

 Announcing Numexpr 1.3


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

In this release, and due to popular demand, support for single
precision floating point types has been added.  This allows for both
improved performance and optimal usage of memory for single
precision computations.  Of course, support for single precision in
combination with Intel's VML is there too :)

However, caveat emptor: the casting rules for floating point types
differ slightly from those of NumPy.  See the ``Casting rules``
section at:

http://code.google.com/p/numexpr/wiki/Overview

or the README.txt file for more info on this issue.
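
For instance (a minimal sketch; array names are arbitrary):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(1000000).astype(np.float32)
    b = np.random.rand(1000000).astype(np.float32)

    # Single-precision operands are now supported, saving memory and time;
    # just keep the casting rules mentioned above in mind.
    result = ne.evaluate('3*a + 4*b')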

In case you want to know in more detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.


Where can I find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

And you can get the packages from PyPI as well:

http://pypi.python.org/pypi


How does it work?
=

See:

http://code.google.com/p/numexpr/wiki/Overview

for a detailed description by the original author (David M. Cooke).


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

-- 
Francesc Alted
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html


[ANN] PyTables 2.1.1 released

2009-03-13 Thread Francesc Alted
===
 Announcing PyTables 2.1.1
===

PyTables is a library for managing hierarchical datasets and designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and NumPy package for achieving maximum throughput and convenient use.

This is a maintenance release, so you should not expect API changes.
Instead, a handful of bugs, like `File` not being subclassable,
incorrectly retrieved default values for data types, a memory leak,
and more, have been fixed.  Besides, some enhancements have been
implemented, like improved Unicode support for filenames, better
handling of Unicode attributes, and the possibility of creating very
large data types exceeding 64 KB in size (with some limitations).
Last but not least, this is the first PyTables version fully tested
against Python 2.6.  It is worth noting that the binaries for Windows
and Python 2.6 now ship with the newest HDF5 1.8.2 libraries (instead
of the traditional HDF5 1.6.x).
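
For instance (a hedged sketch of the 2.1-era camel-case API; the file
name and attribute are made up):

    # -*- coding: utf-8 -*-
    import tables

    # Unicode filenames and Unicode attribute values are handled better now.
    f = tables.openFile(u'datos-año.h5', mode='w')
    f.root._v_attrs.descripcion = u'an attribute with accents: café'
    f.close()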

In case you want to know in more detail what has changed in this
version, have a look at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.1.1

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:
http://www.pytables.org/download/stable

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.1.1

You may want to fetch an evaluation version for PyTables Pro from:
http://www.pytables.org/download/evaluation


Resources
=

About PyTables:

http://www.pytables.org

About the HDF5 library:

http://www.hdfgroup.org/HDF5/

About NumPy:

http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
especially, a lot of kudos go to the HDF5 and NumPy (and numarray!)
makers.  Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.




  **Enjoy data!**

  -- The PyTables Team

-- 
Francesc Alted
--
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html


ANN: PyTables 2.1rc1 ready for testing

2008-10-31 Thread Francesc Alted

 Announcing PyTables 2.1rc1


PyTables is a library for managing hierarchical datasets and designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and NumPy package for achieving maximum throughput and convenient use.

In PyTables 2.1rc1 many new features have been added and a handful of
bugs have been addressed.  This is a release candidate, so, in addition
to the tarball, binaries for Windows are provided too.  Also, the API
has been frozen and you should only expect bug fixes and documentation
improvements for 2.1 final (due for release in a couple of weeks).

This version introduces important improvements, like much faster node
opening, creation or navigation, a file-based way to fine-tune the
different PyTables parameters (fully documented now in a new appendix of
the UG) and support for multidimensional atoms in EArray/CArray objects.
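
As a rough sketch of the multidimensional-atom feature (assuming the
2.1-era camel-case API; names are placeholders):

    import numpy as np
    import tables

    f = tables.openFile('mdatom-demo.h5', mode='w')

    # Each row of this EArray is itself a 2x3 float64 block.
    atom = tables.Float64Atom(shape=(2, 3))
    ea = f.createEArray(f.root, 'frames', atom, shape=(0,))
    ea.append(np.zeros((5, 2, 3)))  # append five 2x3 blocks at once

    f.close()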

Regarding the Pro edition, three different kinds of indexes have been
added so that the user can choose the best for her needs.  Also, due to
the introduction of the concept of chunkmaps in OPSI, the responsiveness
of complex queries with low selectivity has improved quite a lot.  And
last but not least, it is now possible to completely sort tables by a
specific field, with no practical limit on size (up to 2**48 rows, that
is, around 281 trillion rows).  More info at:
http://www.pytables.org/moin/PyTablesPro#WhatisnewinforthcomingPyTablesPro2.1

In case you want to know in more detail what has changed in this
version, have a look at ``RELEASE_NOTES.txt`` in the tarball.  Find the
HTML version for this document at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.1rc1

You can download a source package of the version 2.1rc1 with
generated PDF and HTML docs and binaries for Windows from
http://www.pytables.org/download/preliminary

Finally, and for the first time, an evaluation version for PyTables Pro
has been made available in:
http://www.pytables.org/download/evaluation
Please read the evaluation license for terms of use of this version:
http://www.pytables.org/moin/PyTablesProEvaluationLicense

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.1rc1


Resources
=

Go to the PyTables web site for more details:

http://www.pytables.org

About the HDF5 library:

http://hdfgroup.org/HDF5/

About NumPy:

http://numpy.scipy.org/

Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Many
thanks also to SourceForge, which has helped to make and distribute this
package!  And last, but not least, thanks a lot to the HDF5 and NumPy
(and numarray!) makers.  Without them, PyTables simply would not exist.

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.



  **Enjoy data!**

  -- The PyTables Team
-- 
Francesc Alted
--
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html


ANN: PyTables 2.0.4 available

2008-07-07 Thread Francesc Alted
===
 Announcing PyTables 2.0.4
===

PyTables is a library for managing hierarchical datasets and designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and NumPy package for achieving maximum throughput and convenient use.

After some months without new versions (I have been busy for a while
doing things not related to PyTables, unfortunately), I'm happy to
announce the availability of PyTables 2.0.4.  It fixes some important
issues, and now it is possible to use table selections in threaded
environments.  Also, ``EArray.truncate(0)`` can be used so that you can
completely empty existing EArrays (only enabled if you have a recent
version, i.e. >= 1.8.0, of the HDF5 library installed).  Besides, the
compatibility with native HDF5 files has been improved too.  Finally,
the usage of recent versions of NumPy (1.1) and HDF5 (1.8.1) has been
tested and, fortunately, they work just fine.
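
As a small sketch of the new truncation capability (file and node names
are placeholders; HDF5 >= 1.8.0 is required, as noted above):

    import tables

    f = tables.openFile('earray-demo.h5', mode='w')
    ea = f.createEArray(f.root, 'ea', tables.Int32Atom(), shape=(0,))
    ea.append(range(10))

    ea.truncate(0)           # completely empties the existing EArray
    assert ea.nrows == 0

    f.close()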

In case you want to know in more detail what has changed in this
version, have a look at ``RELEASE_NOTES.txt``.  Find the HTML version
for this document at:
http://www.pytables.org/moin/ReleaseNotes/Release_2.0.4

You can download a source package of the version 2.0.4 with
generated PDF and HTML docs and binaries for Windows from
http://www.pytables.org/download/stable/

For an on-line version of the manual, visit:
http://www.pytables.org/docs/manual-2.0.4

*Important note for PyTables Pro users*: due to lack of resources, I'll 
not be delivering a MacOSX binary version of Pro for the time being 
(this is pretty easy to compile, though).  However, I'll continue 
offering the all-in-one binary for Windows (32-bit).

Migration Notes for PyTables 1.x users
==

If you are a user of PyTables 1.x, it is probably worth looking at the
``MIGRATING_TO_2.x.txt`` file, where you will find directions on how
to migrate your existing PyTables 1.x apps to the 2.x versions.  You can
find an HTML version of this document at
http://www.pytables.org/moin/ReleaseNotes/Migrating_To_2.x

Resources
=

Go to the PyTables web site for more details:

http://www.pytables.org

About the HDF5 library:

http://hdfgroup.org/HDF5/

About NumPy:

http://numpy.scipy.org/

Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Many
thanks also to SourceForge, which has helped to make and distribute this
package!  And last, but not least, thanks a lot to the HDF5 and NumPy
(and numarray!) makers.  Without them, PyTables simply would not exist.

Share your experience
=

Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.



  **Enjoy your data!**

-- 
Francesc Alted
--
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html