Re: [Numpy-discussion] PyData Barcelona this May

2017-03-17 Thread Francesc Alted
2017-03-17 12:37 GMT+01:00 Jaime Fernández del Río <jaime.f...@gmail.com>:

> Last night I gave a short talk to the PyData Zürich meetup on Julian's
> temporary elision PR, and Pauli's overlapping memory one. My learnings from
> that experiment are:
>
>- there is no way to talk about both things in a 30 minute talk: I
>barely scraped the surface and ended up needing 25 minutes.
>- many people that use numpy in their daily work don't know what
>strides are, this was a BIG surprise for me.
>
> Based on that experience, I was thinking that maybe a good topic for a
> workshop would be NumPy's memory model: views, reshaping, strides, some
> hints of buffering in the iterator...
>

Yeah, I think that workshop would be very valuable for many people using
NumPy.
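
For what it's worth, a minimal sketch (assuming nothing beyond stock NumPy) of
the kind of material such a workshop could start from:

import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.strides)      # (32, 8): 32 bytes to the next row, 8 bytes to the next column

b = a[:, ::2]         # a view: no data is copied, only shape/strides change
print(b.strides)      # (32, 16): skipping every other column doubles the column stride
print(b.base is a)    # True: b shares a's memory

a[0, 0] = 100
print(b[0, 0])        # 100: changes through the base are visible in the view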


>
> And Julian's temporary work lends itself to a very nice talk, more on
> Python internals than on NumPy, but it's a very cool subject nonetheless.
>
> So my thinking is that I am going to propose those two, as a workshop and
> a talk. Thoughts?
>

+1



>
> Jaime
>
> On Thu, Mar 9, 2017 at 8:29 PM, Sebastian Berg <sebast...@sipsolutions.net
> > wrote:
>
>> On Thu, 2017-03-09 at 15:45 +0100, Jaime Fernández del Río wrote:
>> > There will be a PyData conference in Barcelona this May:
>> >
>> > http://pydata.org/barcelona2017/
>> >
>> > I am planning on attending, and was thinking of maybe proposing to
>> > organize a numpy-themed workshop or tutorial.
>> >
>> > My personal inclination would be to look at some advanced topic that
>> > I know well, like writing gufuncs in Cython, but wouldn't mind doing
>> > a more run of the mill thing. Anyone has any thoughts or experiences
>> > on what has worked well in similar situations? Any specific topic you
>> > always wanted to attend a workshop on, but were afraid to ask?
>> >
>> > Alternatively, or on top of the workshop, I could propose to do a
>> > talk: talking last year at PyData Madrid about the new indexing was a
>> > lot of fun! Thing is, I have been quite disconnected from the project
>> > this past year, and can't really think of any worthwhile topic. Is
>> > there any message that we as a project would like to get out to the
>> > larger community?
>> >
>>
>> Francesc already pointed out the temporary optimization. From what I
>> remember, my personal highlight would probably be Pauli's work on the
>> memory overlap detection. Though both are rather passive improvements I
>> guess (you don't really have to learn them to use them), it's very cool!
>> And if it's about highlighting new stuff, these can probably easily fill
>> a talk.
>>
>> > And if you are planning on attending, please give me a shout.
>> >
>>
>> Barcelona :). Maybe I should think about it, but probably not.
>>
>>
>> > Thanks,
>> >
>> > Jaime
>> >
>> > --
>> > (\__/)
>> > ( O.o)
>> > ( > <) This is Bunny. Copy Bunny into your signature and help him with
>> > his plans for world domination.
>>
>>
>
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him with his
> plans for world domination.
>
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] caching large allocations on gnu/linux

2017-03-13 Thread Francesc Alted
2017-03-13 18:11 GMT+01:00 Julian Taylor :

> On 13.03.2017 16:21, Anne Archibald wrote:
> >
> >
> > On Mon, Mar 13, 2017 at 12:21 PM Julian Taylor wrote:
> >
> > Should it be agreed that caching is worthwhile I would propose a very
> > simple implementation. We only really need to cache a small handful of
> > array data pointers for the fast allocate/deallocate cycles that appear
> > in common numpy usage.
> > For example a small list of maybe 4 pointers storing the 4 largest
> > recent deallocations. New allocations just pick the first memory block
> > of sufficient size.
> > The cache would only be active on systems that support MADV_FREE (which
> > is Linux 4.5 and probably BSD too).
> >
> > So what do you think of this idea?
> >
> >
> > This is an interesting thought, and potentially a nontrivial speedup
> > with zero user effort. But coming up with an appropriate caching policy
> > is going to be tricky. The thing is, for each array, numpy grabs a block
> > "the right size", and that size can easily vary by orders of magnitude,
> > even within the temporaries of a single expression as a result of
> > broadcasting. So simply giving each new array the smallest cached block
> > that will fit could easily result in small arrays in giant allocated
> > blocks, wasting non-reclaimable memory.  So really you want to recycle
> > blocks of the same size, or nearly, which argues for a fairly large
> > cache, with smart indexing of some kind.
> >
>
> The nice thing about MADV_FREE is that we don't need any clever cache.
> The same process that marked the pages free can reclaim them in another
> allocation, at least that is what my testing indicates it allows.
> So a small allocation getting a huge memory block does not waste memory
> as the top unused part will get reclaimed when needed, either by numpy
> itself doing another allocation or a different program on the system.
>

Well, what you say makes a lot of sense to me, so if you have tested that,
then I'd say this is worth a PR, and we can see how it works on different
workloads.
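
For reference, the proposed policy is tiny.  A toy Python transcription (this
is not NumPy's allocator, just an illustration of "keep the few largest recent
deallocations, reuse the first block that is big enough"; the real thing would
work on raw pointers and MADV_FREE'd pages):

CACHE_SIZE = 4
_cache = []  # (size, block) pairs for the largest recently "freed" blocks

def cached_alloc(nbytes):
    # Reuse the first cached block of sufficient size, if any.
    for i, (size, block) in enumerate(_cache):
        if size >= nbytes:
            del _cache[i]
            return block
    return bytearray(nbytes)  # cache miss: fresh allocation

def cached_free(block):
    # Keep only the few largest recent deallocations.
    _cache.append((len(block), block))
    _cache.sort(key=lambda item: item[0], reverse=True)
    del _cache[CACHE_SIZE:]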


>
> An issue that does arise though is that this memory is not available for
> the page cache used for caching on disk data. A too large cache might
> then be detrimental for IO heavy workloads that rely on the page cache.
>

Yeah.  Also, memory-mapped arrays use the page cache intensively, so we
should test this use case and see how the caching affects memory-map
performance.
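
Something as simple as this rough sketch (file name and size are made up)
could serve as a first test for the memory-mapped case:

import time
import numpy as np

n = int(1e8)  # ~800 MB of float64; adjust to taste
a = np.memmap("big.dat", dtype=np.float64, mode="w+", shape=(n,))
a[:] = 1.0
a.flush()
del a

b = np.memmap("big.dat", dtype=np.float64, mode="r", shape=(n,))
t0 = time.time()
print(b.sum(), "computed in %.2f s" % (time.time() - t0))  # exercises the page cache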


> So we might want to cap it to some max size, provide an explicit on/off
> switch and/or have numpy IO functions clear the cache.
>

Definitely, dynamically allowing this feature to be disabled would be
desirable.  That would provide an easy path for testing how it affects
performance.  Would that be feasible?


Francesc
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PyData Barcelona this May

2017-03-09 Thread Francesc Alted
Hola Jaime!

2017-03-09 15:45 GMT+01:00 Jaime Fernández del Río <jaime.f...@gmail.com>:

> There will be a PyData conference in Barcelona this May:
>
> http://pydata.org/barcelona2017/
>
> I am planning on attending, and was thinking of maybe proposing to
> organize a numpy-themed workshop or tutorial.
>
> My personal inclination would be to look at some advanced topic that I
> know well, like writing gufuncs in Cython, but wouldn't mind doing a more
> run of the mill thing. Anyone has any thoughts or experiences on what has
> worked well in similar situations? Any specific topic you always wanted to
> attend a workshop on, but were afraid to ask?
>

Writing gufuncs in Cython seems quite an advanced topic for a workshop,
but an interesting one indeed.  Numba also supports creating gufuncs (
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html), so
perhaps this could work as a first approach before going deeper into Cython.
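
For instance, something along these lines (a sketch patterned after the numba
docs; the exact signature strings may need adjusting for your numba version):

import numpy as np
from numba import guvectorize

# A gufunc with core signature (n),()->(n): add a scalar to every element of a row.
@guvectorize(["void(float64[:], float64, float64[:])"], "(n),()->(n)")
def add_scalar(row, scalar, out):
    for i in range(row.shape[0]):
        out[i] = row[i] + scalar

a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(add_scalar(a, 10.0))   # the gufunc broadcasts over the leading dimension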


>
> Alternatively, or on top of the workshop, I could propose to do a talk:
> talking last year at PyData Madrid about the new indexing was a lot of fun!
> Thing is, I have been quite disconnected from the project this past year,
> and can't really think of any worthwhile topic. Is there any message that
> we as a project would like to get out to the larger community?
>

Not a message in particular, but perhaps it would be nice to talk about
the temporaries removal in expressions that Julian implemented recently (
https://github.com/numpy/numpy/pull/7997) and that is to be released in
1.13.  It is a really cool (and somewhat scary) patch ;)
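
The kind of expression that benefits is plain element-wise arithmetic over
large operands, where each intermediate normally gets its own full-size
temporary.  A quick way to play with it (just a sketch):

import numpy as np
from timeit import timeit

n = int(1e7)
a, b, c, d = (np.random.rand(n) for _ in range(4))

# Each "+" normally allocates a fresh temporary array; the elision patch
# reuses them in place when it can prove the intermediate is a temporary.
print(timeit("a + b + c + d", globals=globals(), number=20))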


>
> And if you are planning on attending, please give me a shout.
>

It would be nice to attend and see you again, but unfortunately I am quite
swamped.  We'll see.

Have fun in Barcelona!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread Francesc Alted
2017-02-22 16:30 GMT+01:00 Kiko <kikocorre...@gmail.com>:

>
>
> 2017-02-22 16:23 GMT+01:00 Alex Rogozhnikov <alex.rogozhni...@yandex.ru>:
>
>> Hi Francesc,
>> thanks a lot for you reply and for your impressive job on bcolz!
>>
>> Bcolz seems to put the stress on compression, which is not of much interest
>> to me, but the *ctable* and chunked operations look very appropriate
>> to me now. (Of course, I'll need to test it a lot more before I can say this
>> for sure; that's my current impression).
>>
>
You can disable compression in bcolz by default, too:

http://bcolz.blosc.org/en/latest/defaults.html#list-of-default-values​
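
A sketch of both ways of doing it (the exact default key name is from memory,
so double-check against the page above):

import numpy as np
import bcolz

# Per-container: pass compression params with clevel=0.
a = bcolz.carray(np.arange(int(1e7)), cparams=bcolz.cparams(clevel=0))

# Globally, via the defaults documented in the link above (assumed key name).
bcolz.defaults.cparams['clevel'] = 0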



>
>> The strongest concern with bcolz so far is that it seems to be completely
>> non-trivial to install on windows systems, while pip provides binaries for
>> most (or all?) OS for numpy.
>> I didn't build pip binary wheels myself, but is it hard / impossible to
>> cook pip-installable binaries?
>>
>
> http://www.lfd.uci.edu/~gohlke/pythonlibs/#bcolz
> Check if the link solves the issue with installing.
>

Yeah.  Also, there are binaries for conda:

http://bcolz.blosc.org/en/latest/install.html#installing-from-conda-forge​



>
>> ​You can change shapes of numpy arrays, but that usually involves copies
>> of the whole container.
>>
>> sure, but this is ok for me, as I plan to organize column editing in
>> 'batches', so this should require seldom copying.
>> It would be nice to see an example to understand how deep I need to go
>> inside numpy.
>>
>
Well, if copying is not a problem for you, then you can just create a new
numpy container and do the copy by yourself.
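
A minimal sketch of what "do the copy by yourself" can look like for a
structured array that gets both a new shape and a new dtype:

import numpy as np

a = np.zeros(10, dtype=[("x", np.int32), ("y", np.float64)])

# Build a fresh container with the new shape/dtype and copy field by field.
new_dtype = [("x", np.int64), ("y", np.float64), ("z", np.float32)]
b = np.zeros(20, dtype=new_dtype)
for name in a.dtype.names:
    b[name][:len(a)] = a[name]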

Francesc


>
>> Cheers,
>> Alex.
>>
>>
>>
>>
>> On 22 Feb 2017, at 17:03, Francesc Alted <fal...@gmail.com> wrote:
>>
>> Hi Alex,
>>
>> 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov <alex.rogozhni...@yandex.ru>:
>>
>>> Hi Nathaniel,
>>>
>>>
>>> pandas
>>>
>>>
>>> yup, the idea was to have minimal pandas.DataFrame-like storage (which I
>>> was using for a long time),
>>> but without irritating problems with its row indexing and some other
>>> problems like interaction with matplotlib.
>>>
>>> A dict of arrays?
>>>
>>>
>>> that's what I've started from and implemented, but at some point I
>>> decided that I'm reinventing the wheel and numpy has something already. In
>>> principle, I can ignore this 'column-oriented' storage requirement, but
>>> potentially it may turn out to be quite slow-ish if dtype's size is large.
>>>
>>> Suggestions are welcome.
>>>
>>
>> ​You may want to try bcolz:
>>
>> https://github.com/Blosc/bcolz
>>
>> bcolz is a columnar storage, basically as you require, but data is
>> compressed by default even when stored in-memory (although you can disable
>> compression if you want to).​
>>
>>
>>
>>>
>>> Another strange question:
>>> in general, it is considered that once numpy.array is created, it's
>>> shape not changed.
>>> But if i want to keep the same recarray and change it's dtype and/or
>>> shape, is there a way to do this?
>>>
>>
>> ​You can change shapes of numpy arrays, but that usually involves copies
>> of the whole container.  With bcolz you can change length and add/del
>> columns without copies.​  If your containers are large, it is better to
>> inform bcolz on its final estimated size.  See:
>>
>> http://bcolz.blosc.org/en/latest/opt-tips.html
>>
>> ​Francesc​
>>
>>
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>>
>>> On 22 Feb 2017, at 3:53, Nathaniel Smith <n...@pobox.com> wrote:
>>>
>>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" <alex.rogozhni...@yandex.ru>
>>> wrote:
>>>
>>> Ah, got it. Thanks, Chris!
>>> I thought recarray can be only one-dimensional (like tables with named
>>> columns).
>>>
>>> Maybe it's better to ask directly what I was looking for:
>>> something that works like a table with named columns (but no labelling
>>> for rows), and keeps data (of different dtypes) in a column-by-column way
>>> (and this is numpy, not pandas).
>>>
>>> Is there such a magic thing?
>>>
>>>
>>> Well, that's what pandas is for...
>>>
>>> A dict of arrays?
>>>
>>> -n

Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread Francesc Alted
Hi Alex,

2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov <alex.rogozhni...@yandex.ru>:

> Hi Nathaniel,
>
>
> pandas
>
>
> yup, the idea was to have minimal pandas.DataFrame-like storage (which I
> was using for a long time),
> but without irritating problems with its row indexing and some other
> problems like interaction with matplotlib.
>
> A dict of arrays?
>
>
> that's what I've started from and implemented, but at some point I decided
> that I'm reinventing the wheel and numpy has something already. In
> principle, I can ignore this 'column-oriented' storage requirement, but
> potentially it may turn out to be quite slow-ish if dtype's size is large.
>
> Suggestions are welcome.
>

You may want to try bcolz:

https://github.com/Blosc/bcolz

bcolz is a columnar store, basically as you require, but data is
compressed by default even when stored in-memory (although you can disable
compression if you want to).
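
For the "table with named columns" use case specifically, the ctable container
is the relevant piece.  A quick sketch (keyword names as I remember them; see
the bcolz docs for the exact API):

import numpy as np
import bcolz

x = np.arange(10, dtype=np.int64)
y = np.linspace(0.0, 1.0, 10)

t = bcolz.ctable(columns=[x, y], names=["x", "y"])   # column-by-column storage
t.addcol(x * 2.0, name="z")                          # add a column cheaply
t.delcol("y")                                        # remove one, no full copy
print(t[:3])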



>
> Another strange question:
> in general, it is considered that once a numpy.array is created, its shape
> does not change.
> But if I want to keep the same recarray and change its dtype and/or
> shape, is there a way to do this?
>

You can change the shapes of numpy arrays, but that usually involves copies of
the whole container.  With bcolz you can change the length and add/del columns
without copies.  If your containers are large, it is better to inform
bcolz of its final estimated size.  See:

http://bcolz.blosc.org/en/latest/opt-tips.html

Francesc


>
> Thanks,
> Alex.
>
>
>
> On 22 Feb 2017, at 3:53, Nathaniel Smith <n...@pobox.com> wrote:
>
> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" <alex.rogozhni...@yandex.ru>
> wrote:
>
> Ah, got it. Thanks, Chris!
> I thought recarray can be only one-dimensional (like tables with named
> columns).
>
> Maybe it's better to ask directly what I was looking for:
> something that works like a table with named columns (but no labelling for
> rows), and keeps data (of different dtypes) in a column-by-column way (and
> this is numpy, not pandas).
>
> Is there such a magic thing?
>
>
> Well, that's what pandas is for...
>
> A dict of arrays?
>
> -n
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-21 Thread Francesc Alted
Yes, Julian is doing amazing work on getting rid of temporaries inside
NumPy.  However, NumExpr still has the advantage of using multi-threading
right out of the box, as well as integration with Intel VML.  Hopefully
these features will eventually arrive in NumPy, but meanwhile there is
still value in pushing NumExpr.
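
For the record, the out-of-the-box multi-threading amounts to this (a minimal
example):

import numpy as np
import numexpr as ne

n = int(1e7)
a = np.random.rand(n)
b = np.random.rand(n)

ne.set_num_threads(4)                      # multi-threaded evaluation, no extra work
c = ne.evaluate("3*a + 4*b")               # also avoids full-size temporaries
np.testing.assert_allclose(c, 3*a + 4*b)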

Francesc

2017-02-19 18:21 GMT+01:00 Marten van Kerkwijk <m.h.vankerkw...@gmail.com>:

> Hi All,
>
> Just a side note that at a smaller scale some of the benefits of
> numexpr are coming to numpy: Julian Taylor has been working on
> identifying temporary arrays in
> https://github.com/numpy/numpy/pull/7997. Julian also commented
> (https://github.com/numpy/numpy/pull/7997#issuecomment-246118772) that
> with PEP 523 in python 3.6, this should indeed become a lot easier.
>
> All the best,
>
> Marten
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-17 Thread Francesc Alted
>
> * strided complex functions
> * Intel VML support (less necessary now with gcc auto-vectorization)
> * bytes and unicode support
> * reductions (mean, sum, prod, std)
>
>
> What I'm looking for feedback on
> 
>
> * String arrays: How do you use them?  How would unicode differ from bytes
> strings?
> * Interface: We now have a more object-oriented interface underneath the
> familiar
>   evaluate() interface. How would you like to use this interface?
> Francesc suggested
>   generator support, as currently it's more difficult to use NumExpr
> within a loop than
>   it should be.
>
>
> Ideas for the future
> -
>
> * vectorize real functions (such as exp, sqrt, log) similar to the
> complex_functions.hpp vectorization.
> * Add a keyword (likely 'yield') to indicate that a token is intended to
> be changed by a generator inside a loop with each call to NumExpr.run()
>
> If you have any thoughts or find any issues please don't hesitate to open
> an issue at the Github repo. Although unit tests have been run over the
> operation space there are undoubtedly a number of bugs to squash.
>
> Sincerely,
>
> Robert
>
> --
> Robert McLeod, Ph.D.
> Center for Cellular Imaging and Nano Analytics (C-CINA)
> Biozentrum der Universität Basel
> Mattenstrasse 26, 4058 Basel
> Work: +41.061.387.3225
> robert.mcl...@unibas.ch
> robert.mcl...@bsse.ethz.ch <robert.mcl...@ethz.ch>
> robbmcl...@gmail.com
>
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.6.2 released!

2017-01-29 Thread Francesc Alted
=========================
 Announcing Numexpr 2.6.2
=========================

What's new
==========

This is a maintenance release that fixes several issues, with special
emphasis on keeping compatibility with newer NumPy versions.  Also,
initial support for POWER processors is here.  Thanks to Oleksandr
Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and
Antonio Valentino for their nice contributions.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst


What's Numexpr
==============

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.


Where can I find Numexpr?
=========================

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr


Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Francesc Alted
2016-11-04 14:36 GMT+01:00 Neal Becker <ndbeck...@gmail.com>:

> Francesc Alted wrote:
>
> > 2016-11-04 13:06 GMT+01:00 Neal Becker <ndbeck...@gmail.com>:
> >
> >> I find I often write:
> >> np.array ([some list comprehension])
> >>
> >> mainly because list comprehensions are just so sweet.
> >>
> >> But I imagine this isn't particularly efficient.
> >>
> >
> > Right.  Using a generator and np.fromiter() will avoid the creation of the
> > intermediate list.  Something like:
> >
> > np.fromiter((i for i in range(x)), dtype=int)  # dtype is required; use xrange for Python 2
> >
> >
> Does this generalize to >1 dimensions?
>

A reshape() is not enough?  What do you want to do exactly?
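
In other words, something like this sketch for an (n, m) result:

import numpy as np

# np.fromiter() only builds 1-D arrays; for more dimensions, flatten the
# generator and reshape afterwards (the reshape is a view, not a copy).
n, m = 3, 4
gen = (i * j for i in range(n) for j in range(m))
a = np.fromiter(gen, dtype=np.int64, count=n * m).reshape(n, m)
print(a)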


>
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Francesc Alted
2016-11-04 13:06 GMT+01:00 Neal Becker <ndbeck...@gmail.com>:

> I find I often write:
> np.array ([some list comprehension])
>
> mainly because list comprehensions are just so sweet.
>
> But I imagine this isn't particularly efficient.
>

Right.  Using a generator and np.fromiter() will avoid the creation of the
intermediate list.  Something like:

np.fromiter((i for i in range(x)), dtype=int)  # dtype is required; use xrange for Python 2


>
> I wonder if numpy has a "better" way, and if not, maybe it would be a nice
> addition?
>
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.6.1 released

2016-07-17 Thread Francesc Alted
=
 Announcing Numexpr 2.6.1
=

What's new
==

This is a maintenance release that fixes a performance regression in
some situations. More specifically, the BLOCK_SIZE1 constant has been
set to 1024 (down from 8192). This allows for better cache utilization
when there are many operands and with VML.  Fixes #221.

Also, support for NetBSD has been added.  Thanks to Thomas Klausner.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst


What's Numexpr
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.6.0 released

2016-06-01 Thread Francesc Alted
=
 Announcing Numexpr 2.6.0
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a minor version bump because it introduces a new function.
Also some minor fine tuning for recent CPUs has been done.  More
specifically:

- Introduced a new re_evaluate() function for re-evaluating the
  previously executed array expression without any checks.  This is meant
  for accelerating loops that re-evaluate the same expression
  repeatedly without changing anything other than the operands.  If
  unsure, use evaluate(), which is safer (see the sketch after this list).

- The BLOCK_SIZE1 and BLOCK_SIZE2 constants have been re-checked in
  order to find a value maximizing most of the benchmarks in bench/
  directory.  The new values (8192 and 16 respectively) give somewhat
  better results (~5%) overall.  The CPU used for fine tuning is a
  relatively new Haswell processor (E3-1240 v3).
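
A rough sketch of the intended re_evaluate() usage pattern mentioned in the
first item above:

import numpy as np
import numexpr as ne

a = np.random.rand(int(1e6))
b = np.random.rand(int(1e6))

out = ne.evaluate("2*a + 3*b")        # first call: parse, compile and run
for _ in range(10):
    a += 0.1                          # mutate the operands in place...
    out = ne.re_evaluate()            # ...and re-run the compiled expression, no checks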

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling C code that assumes SIMD aligned data.

2016-05-06 Thread Francesc Alted
2016-05-05 22:10 GMT+02:00 Øystein Schønning-Johansen <oyste...@gmail.com>:

> Thanks for your answer, Francesc. Knowing that there is no numpy solution
> saves the work of searching for this. I've not tried the solution described
> at SO, but it looks like a real performance killer. I'd rather try to
> override malloc with glibc's malloc_hooks or LD_PRELOAD tricks. Do you think
> that will do it? I'll try it and report back.
>

I don't think you need that much weaponry.  Just create an array with some
spare space for alignment.  Say that you want a 64-byte aligned double
precision array.  With that, create your desired array + 64 additional
bytes (8 doubles):

In [92]: a = np.zeros(int(1e6) + 8)

In [93]: a.ctypes.data % 64
Out[93]: 16

and compute the elements to shift this:

In [94]: shift = (64 / a.itemsize) - (a.ctypes.data % 64) / a.itemsize

In [95]: shift
Out[95]: 6

now, create a view with the required elements less:

In [98]: b = a[shift:-((64 / a.itemsize)-shift)]

In [99]: len(b)
Out[99]: 1000000

In [100]: b.ctypes.data % 64
Out[100]: 0

and voila, b is now aligned to 64 bytes.  As the view is a copy-free
operation, this is fast, and you only wasted 64 bytes.  Pretty cheap indeed.
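
Wrapped up as a helper (same trick as above, written with Python 3 integer
division; it assumes the base allocation is itself at least itemsize-aligned):

import numpy as np

def aligned_zeros(n, dtype=np.float64, alignment=64):
    # Over-allocate by `alignment` bytes and return a view starting at the
    # first element whose address is a multiple of `alignment`.
    itemsize = np.dtype(dtype).itemsize
    buf = np.zeros(n + alignment // itemsize, dtype=dtype)
    shift = ((alignment - buf.ctypes.data % alignment) % alignment) // itemsize
    out = buf[shift:shift + n]
    assert out.ctypes.data % alignment == 0
    return out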

Francesc


>
> Thanks,
> -Øystein
>
> On Thu, May 5, 2016 at 1:55 PM, Francesc Alted <fal...@gmail.com> wrote:
>
>> 2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen <oyste...@gmail.com
>> >:
>>
>>> Hi!
>>>
>>> I've written a little bit of numpy code that does a neural network
>>> feedforward calculation:
>>>
>>> def feedforward(self, x):
>>>     for activation, w, b in zip(self.activations, self.weights, self.biases):
>>>         x = activation(np.dot(w, x) + b)
>>>
>>> This works fine when my activation functions are in Python, however I've
>>> wrapped the activation functions from a C implementation that requires the
>>> array to be memory aligned. (due to simd instructions in the C
>>> implementation.) So I need the operation np.dot( w, x) + b to return a
>>> ndarray where the data pointer is aligned. How can I do that? Is it
>>> possible at all?
>>>
>>
>> Yes.  np.dot() does accept an `out` parameter where you can pass your
>> aligned array.  The way for testing if numpy is returning you an aligned
>> array is easy:
>>
>> In [15]: x = np.arange(6).reshape(2,3)
>>
>> In [16]: x.ctypes.data % 16
>> Out[16]: 0
>>
>> but:
>>
>> In [17]: x.ctypes.data % 32
>> Out[17]: 16
>>
>> so, in this case NumPy returned a 16-byte aligned array which should be
>> enough for 128 bit SIMD (SSE family).  This kind of alignment is pretty
>> common in modern computers.  If you need 256 bit (32-byte) alignment then
>> you will need to build your container manually.  See here for an example:
>> http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays
>>
>> Francesc
>>
>>
>>>
>>> (BTW: the function works  correctly about 20% of the time I run it, and
>>> else it segfaults on the simd instruction in the the C function)
>>>
>>> Thanks,
>>> -Øystein
>>>
>>>
>>>
>>
>>
>> --
>> Francesc Alted
>>
>>
>>
>
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling C code that assumes SIMD aligned data.

2016-05-05 Thread Francesc Alted
2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen <oyste...@gmail.com>:

> Hi!
>
> I've written a little bit of numpy code that does a neural network
> feedforward calculation:
>
> def feedforward(self, x):
>     for activation, w, b in zip(self.activations, self.weights, self.biases):
>         x = activation(np.dot(w, x) + b)
>
> This works fine when my activation functions are in Python, however I've
> wrapped the activation functions from a C implementation that requires the
> array to be memory aligned. (due to simd instructions in the C
> implementation.) So I need the operation np.dot( w, x) + b to return a
> ndarray where the data pointer is aligned. How can I do that? Is it
> possible at all?
>

Yes.  np.dot() does accept an `out` parameter where you can pass your
aligned array.  The way for testing if numpy is returning you an aligned
array is easy:

In [15]: x = np.arange(6).reshape(2,3)

In [16]: x.ctypes.data % 16
Out[16]: 0

but:

In [17]: x.ctypes.data % 32
Out[17]: 16

so, in this case NumPy returned a 16-byte aligned array which should be
enough for 128 bit SIMD (SSE family).  This kind of alignment is pretty
common in modern computers.  If you need 256 bit (32-byte) alignment then
you will need to build your container manually.  See here for an example:
http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays

Francesc


>
> (BTW: the function works  correctly about 20% of the time I run it, and
> else it segfaults on the simd instruction in the the C function)
>
> Thanks,
> -Øystein
>
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 1.0.0 (final) released

2016-04-07 Thread Francesc Alted
=
Announcing bcolz 1.0.0 final
=

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
feature (just some optimizations and bug fixes), but bcolz is already 6
years old and it implements most of the capabilities that it was
designed for, so I decided to release a 1.0.0 meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  Such a format is fully described
at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that bcolz 1.x series will be based on
C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  After C-Blosc
2.x (https://github.com/Blosc/c-blosc2) would be out, a new bcolz 2.x is
expected taking advantage of shiny new features of C-Blosc2 (more
compressors, more filters, native variable length support and the
concept of super-chunks), which should be very beneficial for next bcolz
generation.

Important: this is a final release and there are no important known bugs,
so it is recommended for use in production.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparison between bcolz and other compressed data containers,
see:

https://github.com/FrancescAlted/DataContainersTutorials

specially chapters 3 (in-memory containers) and 4 (on-disk containers).

Also, if it happens that you are in Madrid during this weekend, you can
drop by my tutorial and talk:

http://pydata.org/madrid2016/schedule/

See you!


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of column.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor, are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) ,
Quantopian
(https://www.quantopian.com/) and Scikit-Allel (
https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.3.1

2016-04-07 Thread Francesc Alted
=
Announcing python-blosc 1.3.1
=

What is new?


This is an important release in terms of stability.  Now, the -O1 flag is
used for compiling the included C-Blosc sources on Linux.  This means
slower performance, but fixes the nasty issue #110.  In case maximum
speed is needed, please `compile python-blosc with an external C-Blosc
library <
https://github.com/Blosc/python-blosc#compiling-with-an-installed-blosc-library-recommended
>`_.

Also, symbols like BLOSC_MAX_BUFFERSIZE have been replaced for allowing
backward compatibility with python-blosc 1.2.x series.

For whetting your appetite, look at some benchmarks here:

https://github.com/Blosc/python-blosc#benchmarking

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized
for binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Blosc works well for compressing
numerical arrays that contains data with relatively low entropy, like
sparse data, time series, grids with regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library, with added functions (`compress_ptr()`
and `pack_array()`) for efficiently compressing NumPy arrays, minimizing
the number of memory copies during the process.  python-blosc can be
used to compress in-memory data buffers for transmission to other
machines, persistence or just as a compressed cache.
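
A quick taste of the NumPy-oriented calls mentioned above (the parameters shown
are the ones I remember; see the docs for the full signatures):

import numpy as np
import blosc

a = np.linspace(0, 100, int(1e7))

packed = blosc.pack_array(a, cname="blosclz", clevel=9)   # compress a NumPy array
print("compression ratio: %.1fx" % (a.nbytes / float(len(packed))))

b = blosc.unpack_array(packed)                            # decompress back to an array
np.testing.assert_array_equal(a, b)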

There is also a handy tool built on top of python-blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Sources repository
==

The sources and documentation are managed through github services at:

http://github.com/Blosc/python-blosc




  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.5.2 released

2016-04-07 Thread Francesc Alted
=
 Announcing Numexpr 2.5.2
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a maintenance release shaking out some remaining problems with VML
(it is nice to see how Anaconda's VML support helps surface hidden
issues).  Now conj() and abs() are actually added as VML-powered
functions, preventing the same problems that log10() had before (PR #212);
thanks to Tom Kooij.  Upgrading to this release is highly recommended.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 1.0.0 RC2 is out!

2016-03-31 Thread Francesc Alted
==
Announcing bcolz 1.0.0 RC2
==

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
feature (just some optimizations and bug fixes), but bcolz is already 6
years old and it implements most of the capabilities that it was
designed for, so I decided to release a 1.0.0 meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  Such a format is fully described
at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that bcolz 1.x series will be based on
C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  After C-Blosc
2.x (https://github.com/Blosc/c-blosc2) would be out, a new bcolz 2.x is
expected taking advantage of shiny new features of C-Blosc2 (more
compressors, more filters, native variable length support and the
concept of super-chunks), which should be very beneficial for next bcolz
generation.

Important: this is a Release Candidate, so please test it as much as you
can.  If no issues appear in a week or so, I will proceed to tag
and release 1.0.0 final.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor, are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.5.1 released

2016-03-31 Thread Francesc Alted
=
 Announcing Numexpr 2.5.1
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

Fixed a critical bug that caused wrong evaluations of log10() and
conj().  These produced wrong results when numexpr was compiled with
Intel's MKL (which is a popular build since Anaconda ships it by
default) and non-contiguous data.  This is considered a *critical* bug
and upgrading is highly recommended. Thanks to Arne de Laat and Tom
Kooij for reporting it and providing a unit test.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [ANN] bcolz 1.0.0 RC1 released

2016-03-08 Thread Francesc Alted
==
Announcing bcolz 1.0.0 RC1
==

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
feature (just some optimizations and bug fixes), but bcolz is already 6
years old and it implements most of the capabilities that it was
designed for, so I decided to release a 1.0.0 meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  Such a format is fully described
at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that bcolz 1.x series will be based on
C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  After C-Blosc
2.x (https://github.com/Blosc/c-blosc2) would be out, a new bcolz 2.x is
expected taking advantage of shiny new features of C-Blosc2 (more
compressors, more filters, native variable length support and the
concept of super-chunks), which should be very beneficial for next bcolz
generation.

Important: this is a Release Candidate, so please test it as much as you
can.  If no issues appear in a week or so, I will proceed to tag
and release 1.0.0 final.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor, are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: Numexpr-3.0 proposal

2016-02-16 Thread Francesc Alted
2016-02-16 10:04 GMT+01:00 Robert McLeod <robbmcl...@gmail.com>:

> On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
> gregor.thalham...@gmail.com> wrote:
>
>>
>> Dear Robert,
>>
>> thanks for your effort on improving numexpr. Indeed, vectorized math
>> libraries (VML) can give a large boost in performance (~5x), except for a
>> couple of basic operations (add, mul, div), which current compilers are
>> able to vectorize automatically. With recent gcc even more functions are
>> vectorized, see https://sourceware.org/glibc/wiki/libmvec But you need
>> special flags depending on the platform (SSE, AVX present?), runtime
>> detection of processor capabilities would be nice for distributing
>> binaries. Some time ago, since I lost access to Intels MKL, I patched
>> numexpr to use Accelerate/Veclib on os x, which is preinstalled on each
>> mac, see https://github.com/geggo/numexpr.git veclib_support branch.
>>
>> As you increased the opcode size, I could imagine providing a bit to
>> switch (during runtime) between internal functions and vectorized ones,
>> that would be handy for tests and benchmarks.
>>
>
> Dear Gregor,
>
> Your suggestion to separate the opcode signature from the library used to
> execute it is very clever. Based on your suggestion, I think that the
> natural evolution of the opcodes is to specify them by function signature
> and library, using a two-level dict, i.e.
>
> numexpr.interpreter.opcodes['exp_f8f8f8'][gnu] = some_enum
> numexpr.interpreter.opcodes['exp_f8f8f8'][msvc] = some_enum +1
> numexpr.interpreter.opcodes['exp_f8f8f8'][vml] = some_enum + 2
> numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum +3
>

Yes, by using a two-level dictionary you can access the functions
implementing opcodes much faster, and hence you can add many more opcodes
without too much slow-down.


>
> I want to procedurally generate opcodes.cpp and interpreter_body.cpp.  If
> I do it the way you suggested funccodes.hpp and all the many #define's
> regarding function codes in the interpreter can hopefully be removed and
> hence simplify the overall codebase. One could potentially take it a step
> further and plan (optimize) each expression, similar to what FFTW does with
> regards to matrix shape. That is, the basic way to control the library
> would be with a singleton library argument, i.e.:
>
> result = ne.evaluate( "A*log(foo**2 / bar**2", lib=vml )
>
> However, we could also permit a tuple to be passed in, where each element
> of the tuple reflects the library to use for each operation in the AST tree:
>
> result = ne.evaluate( "A*log(foo**2 / bar**2", lib=(gnu,gnu,gnu,yeppp,gnu)
> )
>
> In this case the ops are (mul,mul,div,log,mul).  The op-code picking is
> done by the Python side, and this tuple could be potentially optimized by
> numexpr rather than hand-optimized, by trying various permutations of the
> linked C math libraries. The wisdom from the planning could be pickled and
> saved in a wisdom file.  Currently Numexpr has cacheDict in util.py but
> there's no reason this can't be pickled and saved to disk. I've done a
> similar thing by creating wrappers for PyFFTW already.
>

I like the idea of having the various permutations of linked C math libraries
probed by numexpr during the initial iteration and then cached somehow.
That will probably require run-time detection of the available C math libraries
(bear in mind that a numexpr binary should be able to run on different machines
with different libraries and computing capabilities), but in exchange, it will
allow for the fastest execution paths independently of the machine that
runs the code.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.5

2016-02-06 Thread Francesc Alted
=
 Announcing Numexpr 2.5
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

In this version, a lock has been added so that numexpr can be called
from multithreaded apps.  Mind that this does not prevent numexpr
from using multiple cores internally.  Also, new min() and max()
functions have been added.  Thanks to the contributors!

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Francesc Alted
you
not only storage, but processing time too.

Francesc


2016-01-14 11:19 GMT+01:00 Nathaniel Smith <n...@pobox.com>:

> I'd try storing the data in hdf5 (probably via h5py, which is a more
> basic interface without all the bells-and-whistles that pytables
> adds), though any method you use is going to be limited by the need to
> do a seek before each read. Storing the data on SSD will probably help
> a lot if you can afford it for your data size.
>
> On Thu, Jan 14, 2016 at 1:15 AM, Ryan R. Rosario <r...@bytemining.com>
> wrote:
> > Hi,
> >
> > I have a very large dictionary that must be shared across processes and
> does not fit in RAM. I need access to this object to be fast. The key is an
> integer ID and the value is a list containing two elements, both of them
> numpy arrays (one has ints, the other has floats). The key is sequential,
> starts at 0, and there are no gaps, so the “outer” layer of this data
> structure could really just be a list with the key actually being the
> index. The lengths of each pair of arrays may differ across keys.
> >
> > For a visual:
> >
> > {
> > key=0:
> > [
> > numpy.array([1,8,15,…, 16000]),
> > numpy.array([0.1,0.1,0.1,…,0.1])
> > ],
> > key=1:
> > [
> > numpy.array([5,6]),
> > numpy.array([0.5,0.5])
> > ],
> > …
> > }
> >
> > I’ve tried:
> > -   manager proxy objects, but the object was so big that low-level
> code threw an exception due to format and monkey-patching wasn’t successful.
> > -   Redis, which was far too slow due to setting up connections and
> data conversion etc.
> > -   Numpy rec arrays + memory mapping, but there is a restriction
> that the numpy arrays in each “column” must be of fixed and same size.
> > -   I looked at PyTables, which may be a solution, but seems to have
> a very steep learning curve.
> > -   I haven’t tried SQLite3, but I am worried about the time it
> takes to query the DB for a sequential ID, and then translate byte arrays.
> >
> > Any ideas? I greatly appreciate any guidance you can provide.
> >
> > Thanks,
> > Ryan
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
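
For reference, a minimal h5py sketch of the kind of layout Nathaniel suggests
above (one group per sequential ID, two datasets per group; the file and
dataset names here are illustrative only):

import numpy as np
import h5py

data = {0: [np.array([1, 8, 15, 16000]), np.array([0.1, 0.1, 0.1, 0.1])],
        1: [np.array([5, 6]), np.array([0.5, 0.5])]}

# write: one group per ID, holding the int and float arrays
with h5py.File("store.h5", "w") as f:
    for key, (ints, floats) in data.items():
        g = f.create_group(str(key))
        g.create_dataset("ints", data=ints)
        g.create_dataset("floats", data=floats)

# read back a single key without loading the whole file into RAM
with h5py.File("store.h5", "r") as f:
    ints = f["1/ints"][:]
    floats = f["1/floats"][:]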



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-17 Thread Francesc Alted
2015-12-17 12:00 GMT+01:00 Daπid <davidmen...@gmail.com>:

> On 16 December 2015 at 18:59, Francesc Alted <fal...@gmail.com> wrote:
>
>> Probably MATLAB is shipping with Intel MKL enabled, which probably is the
>> fastest LAPACK implementation out there.  NumPy supports linking with MKL,
>> and actually Anaconda does that by default, so switching to Anaconda would
>> be a good option for you.
>
>
> A free alternative is OpenBLAS. I am getting 20 s in an i7 Haswell with 8
> cores.
>

Pretty good.  I did not know that OpenBLAS was so close in performance to
MKL.
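
For reference, a quick way to check which BLAS/LAPACK implementation a given
NumPy build is linked against (the output format varies across NumPy versions):

import numpy as np

# prints the BLAS/LAPACK libraries (MKL, OpenBLAS, ATLAS...) found at build time
np.show_config()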

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-16 Thread Francesc Alted
Sorry, I have to correct myself, as per:
http://docs.continuum.io/mkl-optimizations/index it seems that Anaconda is
not linking with MKL by default (I thought that was the case before?).
After installing MKL (conda install mkl), I am getting:

In [1]: import numpy as np
Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 30 days

In [2]: testA = np.random.randn(15000, 15000)

In [3]: testb = np.random.randn(15000)

In [4]: %time testx = np.linalg.solve(testA, testb)
CPU times: user 1min, sys: 468 ms, total: 1min 1s
Wall time: 15.3 s


So, it looks like you will need to buy an MKL license separately (which
makes sense for a commercial product).

Sorry for the confusion.
Francesc


2015-12-16 18:59 GMT+01:00 Francesc Alted <fal...@gmail.com>:

> Hi,
>
> Probably MATLAB is shipping with Intel MKL enabled, which probably is the
> fastest LAPACK implementation out there.  NumPy supports linking with MKL,
> and actually Anaconda does that by default, so switching to Anaconda would
> be a good option for you.
>
> Here you have what I am getting with Anaconda's NumPy and a machine with 8
> cores:
>
> In [1]: import numpy as np
>
> In [2]: testA = np.random.randn(15000, 15000)
>
> In [3]: testb = np.random.randn(15000)
>
> In [4]: %time testx = np.linalg.solve(testA, testb)
> CPU times: user 5min 36s, sys: 4.94 s, total: 5min 41s
> Wall time: 46.1 s
>
> This is not 20 sec, but it is not 3 min either (but of course that depends
> on your machine).
>
> Francesc
>
> 2015-12-16 18:34 GMT+01:00 Edward Richards <edwardlricha...@gmail.com>:
>
>> I recently did a conceptual experiment to estimate the computational time
>> required to solve an exact expression in contrast to an approximate
>> solution (Helmholtz vs. Helmholtz-Kirchhoff integrals). The exact solution
>> requires a matrix inversion, and in my case the matrix would contain ~15000
>> rows.
>>
>> On my machine MATLAB seems to perform this matrix inversion with random
>> matrices about 9x faster (20 sec vs 3 mins). I thought the performance
>> would be roughly the same because I presume both rely on the same LAPACK
>> solvers.
>>
>> I will not actually need to solve this problem (even at 20 sec it is
>> prohibitive for broadband simulation), but if I needed to I would
>> reluctantly choose MATLAB . I am simply wondering why there is this
>> performance gap, and if there is a better way to solve this problem in
>> numpy?
>>
>> Thank you,
>>
>> Ned
>>
>> #Python version
>>
>> import numpy as np
>>
>> testA = np.random.randn(15000, 15000)
>>
>> testb = np.random.randn(15000)
>>
>> %time testx = np.linalg.solve(testA, testb)
>>
>> %MATLAB version
>>
>> testA = randn(15000);
>>
>> testb = randn(15000, 1);
>> tic(); testx = testA \ testb; toc();
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
> Francesc Alted
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-16 Thread Francesc Alted
Hi,

Probably MATLAB is shipping with Intel MKL enabled, which probably is the
fastest LAPACK implementation out there.  NumPy supports linking with MKL,
and actually Anaconda does that by default, so switching to Anaconda would
be a good option for you.

Here you have what I am getting with Anaconda's NumPy and a machine with 8
cores:

In [1]: import numpy as np

In [2]: testA = np.random.randn(15000, 15000)

In [3]: testb = np.random.randn(15000)

In [4]: %time testx = np.linalg.solve(testA, testb)
CPU times: user 5min 36s, sys: 4.94 s, total: 5min 41s
Wall time: 46.1 s

This is not 20 sec, but it is not 3 min either (but of course that depends
on your machine).

Francesc

2015-12-16 18:34 GMT+01:00 Edward Richards <edwardlricha...@gmail.com>:

> I recently did a conceptual experiment to estimate the computational time
> required to solve an exact expression in contrast to an approximate
> solution (Helmholtz vs. Helmholtz-Kirchhoff integrals). The exact solution
> requires a matrix inversion, and in my case the matrix would contain ~15000
> rows.
>
> On my machine MATLAB seems to perform this matrix inversion with random
> matrices about 9x faster (20 sec vs 3 mins). I thought the performance
> would be roughly the same because I presume both rely on the same LAPACK
> solvers.
>
> I will not actually need to solve this problem (even at 20 sec it is
> prohibitive for broadband simulation), but if I needed to I would
> reluctantly choose MATLAB . I am simply wondering why there is this
> performance gap, and if there is a better way to solve this problem in
> numpy?
>
> Thank you,
>
> Ned
>
> #Python version
>
> import numpy as np
>
> testA = np.random.randn(15000, 15000)
>
> testb = np.random.randn(15000)
>
> %time testx = np.linalg.solve(testA, testb)
>
> %MATLAB version
>
> testA = randn(15000);
>
> testb = randn(15000, 1);
> tic(); testx = testA \ testb; toc();
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.12.0 released

2015-11-16 Thread Francesc Alted
===
Announcing bcolz 0.12.0
===

What's new
==

This release copes with some compatibility issues with NumPy 1.10.
Also, several improvements have been made to the installation procedure,
allowing for a smoother process.  Last but not least, the tutorials
have been migrated to the IPython notebook format (a huge thank you to
Francesc Elies for this!).  This will hopefully allow users to
better exercise the different features of bcolz.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz
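
A minimal usage sketch (illustrative only; see the manual below for the full
API):

import numpy as np
import bcolz

N = 1000 * 1000
# a compressed, chunked ctable built from NumPy arrays and stored on disk
ct = bcolz.ctable([np.arange(N), np.linspace(0, 1, N)],
                  names=['i', 'x'], rootdir='example.bcolz', mode='w')

# where() evaluates the condition via numexpr and iterates lazily
total = sum(row.x for row in ct.where('(i < 10) & (x < 0.5)'))
print(total)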


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.6 released

2015-11-02 Thread Francesc Alted
Hi,

This is a quick release fixing some reported problems in the 2.4.5 version
that I announced a few hours ago.  Hope I have fixed the main issues now.
Now, the official announcement:

=
 Announcing Numexpr 2.4.6
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a quick maintenance version that offers better handling of
MSVC symbols (#168, Francesc Alted), as well as fixing some
UserWarnings in Solaris (#189, Graham Jones).

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.5 released

2015-11-02 Thread Francesc Alted
=
 Announcing Numexpr 2.4.5
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a maintenance release where an important bug in multithreading
code has been fixed (#185 Benedikt Reinartz, Francesc Alted).  Also,
many harmless warnings (overflow/underflow, divide by zero and others)
in the test suite have been silenced  (#183, Francesc Alted).

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.11.3 released!

2015-10-05 Thread Francesc Alted
===
Announcing bcolz 0.11.3
===

What's new
==

Implemented new feature (#255): bcolz.zeros() can create new ctables
too, either empty or filled with zeros. (#256 @FrancescElies
@FrancescAlted).
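
A minimal sketch of the new behaviour (assuming the API described above: a
plain dtype still gives a carray, while a structured dtype now gives a ctable):

import bcolz

ca = bcolz.zeros(10, dtype='i4')                        # a carray, as before
ct = bcolz.zeros(10, dtype=[('i', 'i4'), ('x', 'f8')])  # new: a zero-filled ctable

print(type(ca), type(ct))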

Also, in the previous, unannounced versions (0.11.1 and 0.11.2), new
dependencies were added and other fixes were made too.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Governance model request

2015-09-23 Thread Francesc Alted
, famous and powerful as Travis may be,
> he's still our colleague, a member of our community, and *a human being*,
> so let's remember that as well...
>
>
> 2. Conflicts of interest are a fact of life, in fact, I would argue that
> every healthy and sufficiently interconnected community eventually *should*
> have conflicts of interest. They are a sign that there is activity across
> multiple centers of interest, and individuals with connections in multiple
> areas of the community.  And we *want* folks who are engaged enough
> precisely to have such interests!
>
> For conflict of interest management, we don't need to reinvent the wheel,
> this is actually something where our beloved institutions, blessed be their
> bureaucratic souls, have tons of training materials that happen to be not
> completely useless.  Most universities and the national labs have
> information on COIs that provides guidelines, and Numpy could include in
> its governance model more explicit language about COIs if desired.
>
> So, the issue is not to view COIs as something evil or undesirable, but
> rather as the very real consequence of operating in an interconnected set
> of institutions.  And once you take that stance, you deal with that
> rationally and realistically.
>
> For example, you accept that companies aren't the only ones with potential
> COIs: *all* entities have them. As Ryan May aptly pointed out, the notion
> that academic institutions are somehow immune to hidden agendas or other
> interests is naive at best... And I say that as someone who has happily
> stayed in academia, resisting multiple overtures from industry over the
> years, but not out of some quaint notion that academia is a pristine haven
> of principled purity. Quite the opposite: in building large and complex
> projects, I've seen painfully close how the university/government research
> world has its own flavor of the same power, financial and political
> ugliness that we attribute to the commercial side.
>
>
> 3. Commercial actors.  Following up on the last paragraph, we should
> accept that *all* institutions have agendas, not just companies.  We live
> in a world with companies, and I think it's naive to take a knee-jerk
> anti-commercial stance: our community has had a productive and successful
> history of interaction with industry in the past, and hopefully that will
> continue in the future.
>
> What is true, however, is that community projects should maintain the
> "seat of power" in the community, and *not* in any single company.  In
> fact, this is important even to ensure that many companies feel comfortable
> engaging the projects, precisely so they know that the technology is driven
> in an open and neutral way even if some of their competitors participate.
>
> That's why a governance model that is anchored in neutral ground is so
> important.  We've worked hard to make Numfocus the legal entity that can
> play that role (that's why it's a 501(c)3), and that's why we've framed our
> governance model for Jupyter in a way that makes all the institutions
> (including Berkeley and Cal Poly) simply 'partners' that contribute by
> virtue of supporting employees.  But the owners of the decisions are the
> *individuals* who do the work and form the community, not the
> companies/institutions.
>
>
> If we accept these premises, then hopefully we can have a rational
> conversation about how to build a community, where at any point in time,
> any of us should be judged on the merit of our actions, not the
> hypotheticals of our intentions or our affiliations (commercial,
> government, academic, etc).
>
>
> Sorry for the long wall of text, I rarely post on this list anymore.  But
> I was saddened to see the turn of this thread, and I hope I can contribute
> some perspective (and not make things worse :)
>
>
> Cheers,
>
> --
> Fernando Perez (@fperez_org; http://fperez.org)
> fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
> fernando.perez-at-berkeley: contact me here for any direct mail
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: Numexpr 2.4.4 is out

2015-09-18 Thread Francesc Alted
=
 Announcing Numexpr 2.4.4
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a maintenance release which contains several bug fixes, like
better testing on the Python 3 platform and a fix for a harmless data race.
Among the enhancements, AppVeyor support is here and OMP_NUM_THREADS is
now honored as a fallback in case NUMEXPR_NUM_THREADS is not set.
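
A minimal way to check which setting was picked up, using only the
long-standing set_num_threads() call (which returns the previous value):

import numexpr as ne

# set_num_threads() returns the setting that was in effect, i.e. whatever
# numexpr derived from NUMEXPR_NUM_THREADS (or, as a fallback, OMP_NUM_THREADS)
previous = ne.set_num_threads(1)
print("numexpr was configured for", previous, "threads")
ne.set_num_threads(previous)   # restore the original setting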

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst


Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.8 released

2015-09-18 Thread Francesc Alted
=
Announcing python-blosc 1.2.8
=

What is new?


This is a maintenance release.  Internal C-Blosc has been upgraded to
1.7.0 (although new bitshuffle support has not been made public, as it
seems not ready for production yet).

Also, there is support for bytes-like objects that support the buffer
interface as input to ``compress`` and ``decompress``. On Python 2.x
this includes unicode, on Python 3.x it doesn't.  Thanks to Valentin
Haenel.

Finally, a memory leak in ``decompress`` has been hunted down and fixed, and
new tests have been added to catch possible leaks in the future.  Thanks
to Santi Villalba.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contains data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you must omit the 'python-' prefix
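
A minimal round-trip sketch with a NumPy array (the typesize should match the
element size so that the shuffle filter can do its job):

import numpy as np
import blosc

a = np.linspace(0, 100, 1000 * 1000)
packed = blosc.compress(a.tobytes(), typesize=a.itemsize)   # bytes in, bytes out
restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)

print(len(packed) / float(a.nbytes))   # compression ratio
assert np.array_equal(a, restored)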


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.11.0 released

2015-09-09 Thread Francesc Alted
===
Announcing bcolz 0.11.0
===

What's new
==

Although this is mostly a maintenance release that fixes some bugs, the
setup.py is now entirely based on setuptools and has been greatly
modernized to use a new versioning system.  Just this deserves a bump in
the minor version.  Thanks to Gabi Davar (@mindw) for such a nice
improvement.

Also, many improvements to the Continuous Integration setup (and hence
not directly visible to users), among other things, have been made by Francesc
Elies (@FrancescElies).  Thanks for his quiet but effective work.

And last but not least, I would like to announce that Valentin Haenel
(@esc) just stepped down as release manager.  Thanks Valentin for all
the hard work that you put in making bcolz a better piece of software!


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-28 Thread Francesc Alted
 and deeper in
 (technical) debt.

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
 de dominación mundial.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-26 Thread Francesc Alted
Hi,

Thanks Nathaniel and others for sparking this discussion as I think it is
very timely.

2015-08-25 12:03 GMT+02:00 Nathaniel Smith n...@pobox.com:

   Let's focus on evolving numpy as far as we can without major
   break-the-world changes (no numpy 2.0, at least in the foreseeable
   future).

   And, as a target for that evolution, let's change our focus from
   numpy as "NumPy is the library that gives you the np.ndarray object
   (plus some attached infrastructure)", to "NumPy provides the
   standard framework for working with arrays and array-like objects in
   Python"


Sorry to disagree here, but in my opinion NumPy *already* provides the
standard framework for working with arrays and array-like objects in Python
as its huge popularity shows.  If what you mean is that there are too many
efforts trying to provide other, specialized data containers (things like
DataFrame in pandas, DataArray/Dataset in xarray or carray/ctable in bcolz
just to mention a few), then let me say that I am of the opinion that there
can't be a silver bullet for tackling all the problems that the PyData
community is facing.

The libraries using specialized data containers (pandas, xray, bcolz...)
may have more or less machinery on top of them so that conversion to NumPy
does not necessarily happen internally (many times we don't want conversions
for efficiency), but it is the capability of producing NumPy arrays out of
them (or parts of them) that makes these specialized containers so much
more useful to users, because they can use NumPy to fill the
missing gaps, or just use NumPy as an intermediate container that acts as
input for other libraries.

On the subject of why I don't think a universal data container is feasible
for PyData, you just have to look at how many data structures Python
provides in the language itself (tuples, lists, dicts, sets...), and
how many are added in the standard library (like those in the collections
sub-package).  Every data container is designed to do a couple of things
(maybe three) well, but for other use cases it is the responsibility of the
user to choose the most appropriate one depending on her needs.  In the same
vein, I also think that it makes little sense to try to come up with a
standard solution that is going to satisfy everyone's needs.  IMHO, and
despite all efforts, neither NumPy, NumPy 2.0, DyND, bcolz nor any other is
going to offer the universal data container.

Instead of that, let me summarize what users/developers like me need from
NumPy for continue creating more specialized data containers:

1) Keep NumPy simple. NumPy is the true cornerstone of PyData right now,
and it will be for the foreseeable future, so please keep it usable and
*minimal*.  Before adding any more features, the increase in complexity
should be carefully weighed.

2) Make NumPy more flexible. Any rewrite that allows arrays or dtypes to be
subclassed and extended more easily will be a huge win.  *But* if in order
to allow flexibility you have to make NumPy much more complex, then point
1) should prevail.

3) Make NumPy a sustainable project. Historically NumPy depended on
heroic efforts of individuals to make it what it is now: *an industry
standard*.  But individual efforts, while laudable, are not enough, so
please, please, please continue the effort of constituting a governance
team that ensures the future of NumPy (and with it, the whole PyData
community).

Finally, the question of whether NumPy 2.0 or projects like DyND should be
chosen instead for implementing new features is still legitimate, and while
I have my own opinions (favourable to DyND), I still see (such is the price
of technological debt) a distant future where we will find NumPy as we know
it, allowing more innovation to happen in the Python Data space.

Again, thanks to all those brave people who are allowing others to build on
top of NumPy's shoulders.

--
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] UTC-based datetime64

2015-08-26 Thread Francesc Alted
Hi,

We've found that NumPy uses the local TZ for printing datetime64 timestamps:

In [22]: t = datetime.utcnow()

In [23]: print t
2015-08-26 11:52:10.662745

In [24]: np.array([t], dtype="datetime64[s]")
Out[24]: array(['2015-08-26T13:52:10+0200'], dtype='datetime64[s]')

Googling for a way to print UTC out of the box, the best thing I could find
is:

In [40]: [str(i.item()) for i in np.array([t], dtype="datetime64[s]")]
Out[40]: ['2015-08-26 11:52:10']

Now, is there a better way to specify that I want the datetimes printed
always in UTC?
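
(One possibility, assuming a NumPy recent enough to expose
np.datetime_as_string(), which accepts a timezone argument; a minimal sketch,
not an endorsed answer from the thread:)

import numpy as np
from datetime import datetime

t = datetime.utcnow()
arr = np.array([t], dtype="datetime64[s]")
# force UTC on output instead of the local timezone
print(np.datetime_as_string(arr, timezone='UTC'))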

Thanks,
-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about unaligned access

2015-07-06 Thread Francesc Alted
2015-07-06 18:04 GMT+02:00 Jaime Fernández del Río jaime.f...@gmail.com:

 On Mon, Jul 6, 2015 at 10:18 AM, Francesc Alted fal...@gmail.com wrote:

 Hi,

 I have stumbled into this:

 In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
 dtype=[('f0', np.int64), ('f1', np.int32)])

 In [63]: %timeit sa['f0'].sum()
 100 loops, best of 3: 4.52 ms per loop

 In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
 dtype=[('f0', np.int64), ('f1', np.int64)])

 In [65]: %timeit sa['f0'].sum()
 1000 loops, best of 3: 896 µs per loop

 The first structured array is made of 12-byte records, while the second
 is made by 16-byte records, but the latter performs 5x faster.  Also, using
 an structured array that is made of 8-byte records is the fastest
 (expected):

 In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
 np.int64)])

 In [67]: %timeit sa['f0'].sum()
 1000 loops, best of 3: 567 µs per loop

 Now, my laptop has a Ivy Bridge processor (i5-3380M) that should perform
 quite well on unaligned data:


 http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

 So, if 4 years-old Intel architectures do not have a penalty for
 unaligned access, why I am seeing that in NumPy?  That strikes like a quite
 strange thing to me.


 I believe that the way numpy is setup, it never does unaligned access,
 regardless of the platform, in case it gets run on one that would go up in
 flames if you tried to. So my guess would be that you are seeing chunked
 copies into a buffer, as opposed to bulk copying or no copying at all, and
 that would explain your timing differences. But Julian or Sebastian can
 probably give you a more informed answer.


Yes, my guess is that you are right.  I suppose that it is possible to
improve the numpy codebase to accelerate this particular access pattern on
Intel platforms, but given that structured arrays are not that widely used
(pandas is probably leading this use case by far, and as far as I know,
they are not using structured arrays internally in DataFrames), maybe
it is not worth worrying about this too much.
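
For anyone who wants to see the effect directly, an aligned dtype pads the
12-byte record up to 16 bytes (a minimal sketch; the exact timings will of
course vary across machines):

import numpy as np

n = 1000 * 1000
packed = np.dtype([('f0', np.int64), ('f1', np.int32)])               # 12-byte records
aligned = np.dtype([('f0', np.int64), ('f1', np.int32)], align=True)  # padded to 16 bytes

sa_packed = np.fromiter(((i, i) for i in range(n)), dtype=packed)
sa_aligned = np.fromiter(((i, i) for i in range(n)), dtype=aligned)

print(packed.itemsize, aligned.itemsize)    # 12 vs 16
print(sa_packed['f0'].flags['ALIGNED'], sa_aligned['f0'].flags['ALIGNED'])
# compare: %timeit sa_packed['f0'].sum()   vs   %timeit sa_aligned['f0'].sum()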

Thanks anyway,
Francesc



 Jaime



 Thanks,
 Francesc

 --
 Francesc Alted

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




 --
 (\__/)
 ( O.o)
 (  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
 de dominación mundial.

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about unaligned access

2015-07-06 Thread Francesc Alted
Oops, forgot to mention my NumPy version:

In [72]: np.__version__
Out[72]: '1.9.2'

Francesc

2015-07-06 17:18 GMT+02:00 Francesc Alted fal...@gmail.com:

 Hi,

 I have stumbled into this:

 In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
 np.int64), ('f1', np.int32)])

 In [63]: %timeit sa['f0'].sum()
 100 loops, best of 3: 4.52 ms per loop

 In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
 np.int64), ('f1', np.int64)])

 In [65]: %timeit sa['f0'].sum()
 1000 loops, best of 3: 896 µs per loop

 The first structured array is made of 12-byte records, while the second is
 made by 16-byte records, but the latter performs 5x faster.  Also, using an
 structured array that is made of 8-byte records is the fastest (expected):

 In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
 np.int64)])

 In [67]: %timeit sa['f0'].sum()
 1000 loops, best of 3: 567 µs per loop

 Now, my laptop has a Ivy Bridge processor (i5-3380M) that should perform
 quite well on unaligned data:


 http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

 So, if 4 years-old Intel architectures do not have a penalty for unaligned
 access, why I am seeing that in NumPy?  That strikes like a quite strange
 thing to me.

 Thanks,
 Francesc

 --
 Francesc Alted




-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Question about unaligned access

2015-07-06 Thread Francesc Alted
Hi,

I have stumbled into this:

In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
np.int64), ('f1', np.int32)])

In [63]: %timeit sa['f0'].sum()
100 loops, best of 3: 4.52 ms per loop

In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
np.int64), ('f1', np.int64)])

In [65]: %timeit sa['f0'].sum()
1000 loops, best of 3: 896 µs per loop

The first structured array is made of 12-byte records, while the second is
made of 16-byte records, but the latter performs 5x faster.  Also, using a
structured array that is made of 8-byte records is the fastest (expected):

In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
np.int64)])

In [67]: %timeit sa['f0'].sum()
1000 loops, best of 3: 567 µs per loop

Now, my laptop has an Ivy Bridge processor (i5-3380M) that should perform
quite well on unaligned data:

http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

So, if 4-year-old Intel architectures do not have a penalty for unaligned
access, why am I seeing that in NumPy?  That strikes me as quite strange.

Thanks,
Francesc

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.7 released

2015-05-06 Thread Francesc Alted
=
Announcing python-blosc 1.2.7
=

What is new?


Updated to use c-blosc v1.6.1.  Although this supports AVX2, it is
not enabled in python-blosc because we still need to devise a way to
detect AVX2 in the underlying platform.

At any rate, c-blosc 1.6.1 fixed an important enough bug in the blosclz
codec that a new release was deemed necessary.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contains data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: PyTables 3.2.0 (final) released!

2015-05-06 Thread Francesc Alted
===
 Announcing PyTables 3.2.0
===

We are happy to announce PyTables 3.2.0.

***
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going.  Please
read the next thread as it contains important information about the
future (or the lack of it) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
***


What's new
==

This is a major release of PyTables and it is the result of more than a
year of accumulated patches, but most specially it fixes a couple of
nasty problems with indexed queries not returning the correct results in
some scenarios.  There are many usability and performance improvements
too.

In case you want to know more in detail what has changed in this
version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated
PDF and HTML docs from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0

For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html


What it is?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and the NumPy package for achieving maximum throughput and convenient
use.  PyTables includes OPSI, a new indexing technology that allows
performing data lookups in tables exceeding 10 gigarows (10**10 rows)
in less than a tenth of a second.
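
A minimal sketch of an indexed, in-kernel query (the table layout and numbers
here are illustrative only):

import tables

class Particle(tables.IsDescription):
    id = tables.Int64Col()
    value = tables.Float64Col()

with tables.open_file("demo.h5", "w") as f:
    t = f.create_table("/", "particles", Particle)
    t.append([(i, i * 0.5) for i in range(1000)])
    t.flush()
    t.cols.value.create_index()                    # build an OPSI index on 'value'
    hits = [row["id"] for row in t.where("value < 10")]
    print(len(hits))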


Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: PyTables 3.2.0 RC2 is out

2015-05-01 Thread Francesc Alted
===
 Announcing PyTables 3.2.0rc2
===

We are happy to announce PyTables 3.2.0rc2.

***
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going.  Please
read the next thread as it contains important information about the
future (or lack of it) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
***


What's new
==

This is a major release of PyTables and it is the result of more than a
year of accumulated patches, but most specially it fixes a couple of
nasty problems with indexed queries not returning the correct results in
some scenarios (mainly pandas users).  There are many usability and
performance improvements too.

In case you want to know more in detail what has changed in this
version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated
PDF and HTML docs from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0rc2

For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html


What it is?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and the NumPy package for achieving maximum throughput and convenient
use.  PyTables includes OPSI, a new indexing technology that allows
performing data lookups in tables exceeding 10 gigarows (10**10 rows)
in less than a tenth of a second.


Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-28 Thread Francesc Alted
2015-04-28 4:59 GMT+02:00 Neil Girdhar mistersh...@gmail.com:

 I don't think I'm asking for so much.  Somewhere inside numexpr it builds
 an AST of its own, which it converts into the optimized code.   It would be
 more useful to me if that AST were in the same format as the one returned
 by Python's ast module.  This way, I could glue in the bits of numexpr that
 I like with my code.  For my purpose, this would have been the more ideal
 design.


I don't think implementing this for numexpr would be that complex. So for
example, one could add a new numexpr.eval_ast(ast_expr) function.  Pull
requests are welcome.

At any rate, which is your use case?  I am curious.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Francesc Alted
 Announcing Numexpr 2.4.3
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This is a maintenance release to cope with an old bug affecting
comparisons with empty strings.  Fixes #121 and PyTables #184.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: PyTables 3.2.0 release candidate 1 is out

2015-04-21 Thread Francesc Alted
===
 Announcing PyTables 3.2.0rc1
===

We are happy to announce PyTables 3.2.0rc1.

***
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going.  Please
read the next thread as it contains important information about the
future of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
***


What's new
==

This is a major release of PyTables and it is the result of more than a
year of accumulated patches, but most specially it fixes a nasty problem
with indexed queries not returning the correct results in some
scenarios.  There are many usability and performance improvements too.

In case you want to know more in detail what has changed in this
version, please refer to: http://pytables.github.io/release_notes.html

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0rc1

For an online version of the manual, visit:
http://pytables.github.io/usersguide/index.html


What it is?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and the NumPy package for achieving maximum throughput and convenient
use.  PyTables includes OPSI, a new indexing technology that allows
performing data lookups in tables exceeding 10 gigarows (10**10 rows)
in less than a tenth of a second.


Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.1 released

2015-04-14 Thread Francesc Alted
=
 Announcing Numexpr 2.4.1
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

This version features improved support for newer MKL libraries, as well
as other minor improvements, and is meant for production.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Introductory mail and GSoc Project Vector math library integration

2015-03-11 Thread Francesc Alted
2015-03-08 21:47 GMT+01:00 Dp Docs sdpa...@gmail.com:

 Hi all,
 I am a CS 3rd-year undergrad student from an Indian Institute (IIIT). I
 believe I am good at programming languages like C/C++ and Python, as I
 have already done some projects using these languages as part of my
 academics. I really like coding (competitive as well as development).
 I really want to get involved in the NumPy development project and want to
 take "Vector math library integration" as part of my project. I
 want to hear any ideas from your side for this project.
 Thanks for your time reading this email and responding back.


As Sturla and Gregor suggested, there are quite a few attempts to solve
this shortcoming in NumPy.  In particular, Gregor integrated MKL/VML support
in numexpr quite a long time ago, and when combined with my own
implementation of pooled threads (which behaves better than Intel's
implementation in VML), the thing literally flies:

 https://github.com/pydata/numexpr/wiki/NumexprMKL

numba is another interesting option, and it shows much better compile
times than the integrated compiler in numexpr.  You can see a quick
comparison of the expected performance of numexpr and numba here:

http://nbviewer.ipython.org/gist/anonymous/4117896

In general, numba wins for small arrays, but numexpr can achieve very good
performance for larger ones.  I think there are interesting things to
discover in both projects, for example how they manage memory in order
to avoid temporaries or how they deal with unaligned data efficiently.  I
would also advise looking at the existing docs and presentations that
explain these things in more detail.
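
Just to make the numexpr side of that comparison concrete, here is a
minimal sketch (the array size is illustrative); with a VML-enabled
build, transcendental calls like sin() and cos() below are the ones that
get dispatched to MKL's vector math routines, otherwise numexpr falls
back to its own multi-threaded loops:

import numpy as np
import numexpr as ne

x = np.linspace(0, 10, 10**7)
y = ne.evaluate("sin(x)**2 + cos(x)**2")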

All in all, I would really love to see such vector math library support
integrated in NumPy because, frankly, I don't have the bandwidth to maintain
numexpr anymore (and I am afraid that nobody else would jump on this ship
;).

Good luck!

Francesc



 My IRCnickname: dp

 Real Name: Durgesh Pandey.
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Vectorizing computation

2015-02-13 Thread Francesc Alted
Hi,

I would like to vectorize the next computation:

import numpy as np

nx, ny, nz = 720, 180, 3
outheight = np.arange(nz) * 3
oro = np.arange(nx * ny).reshape((nx, ny))

def compute1(outheight, oro):
    result = np.zeros((nx, ny, nz))
    for ix in range(nx):
        for iz in range(nz):
            result[ix, :, iz] = outheight[iz] + oro[ix, :]
    return result

I think this should be possible with an advanced use of broadcasting in
numpy.  Anyone willing to post a solution?

Thanks,
-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorizing computation

2015-02-13 Thread Francesc Alted
2015-02-13 12:51 GMT+01:00 Julian Taylor jtaylor.deb...@googlemail.com:

 On 02/13/2015 11:51 AM, Francesc Alted wrote:
  Hi,
 
  I would like to vectorize the next computation:
 
  nx, ny, nz = 720, 180, 3
  outheight = np.arange(nz) * 3
  oro = np.arange(nx * ny).reshape((nx, ny))
 
  def compute1(outheight, oro):
      result = np.zeros((nx, ny, nz))
      for ix in range(nx):
          for iz in range(nz):
              result[ix, :, iz] = outheight[iz] + oro[ix, :]
      return result
 
  I think this should be possible by using an advanced use of broadcasting
  in numpy.  Anyone willing to post a solution?


 result = outheight + oro.reshape(nx, ny, 1)


And 4x faster for my case.  Oh my, I am afraid that I will never even
scratch the surface of all the amazing possibilities that broadcasting offers :)

Thank you very much for such an elegant solution!

Francesc
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorizing computation

2015-02-13 Thread Francesc Alted
2015-02-13 13:25 GMT+01:00 Julian Taylor jtaylor.deb...@googlemail.com:

 On 02/13/2015 01:03 PM, Francesc Alted wrote:
   2015-02-13 12:51 GMT+01:00 Julian Taylor jtaylor.deb...@googlemail.com:
 
  On 02/13/2015 11:51 AM, Francesc Alted wrote:
   Hi,
  
   I would like to vectorize the next computation:
  
   nx, ny, nz = 720, 180, 3
   outheight = np.arange(nz) * 3
   oro = np.arange(nx * ny).reshape((nx, ny))
  
    def compute1(outheight, oro):
        result = np.zeros((nx, ny, nz))
        for ix in range(nx):
            for iz in range(nz):
                result[ix, :, iz] = outheight[iz] + oro[ix, :]
        return result
  
   I think this should be possible by using an advanced use of
  broadcasting
   in numpy.  Anyone willing to post a solution?
 
 
  result = outheight + oro.reshape(nx, ny, 1)
 
 
  And 4x faster for my case.  Oh my, I am afraid that my mind will never
  scratch all the amazing possibilities that broadcasting is offering :)
 
  Thank you very much for such an elegant solution!
 


 if speed is a concern this is faster as it has a better data layout for
  numpy during the computation, but the result may be worse laid out for
 further processing

 result = outheight.reshape(nz, 1, 1) + oro
 return np.rollaxis(result, 0, 3)


Holy cow, this makes for another 4x speed improvement!  I don't think I
need that much in my scenario, so I will stick with the first one (more
readable and with the expected data layout), but thanks a lot!
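
For the archives, a small self-contained check that both broadcasting
solutions reproduce the original loop (shapes as in the first post):

import numpy as np

nx, ny, nz = 720, 180, 3
outheight = np.arange(nz) * 3
oro = np.arange(nx * ny).reshape((nx, ny))

def compute1(outheight, oro):
    result = np.zeros((nx, ny, nz))
    for ix in range(nx):
        for iz in range(nz):
            result[ix, :, iz] = outheight[iz] + oro[ix, :]
    return result

# Solution 1: broadcast against a trailing axis of length 1.
res1 = outheight + oro.reshape(nx, ny, 1)

# Solution 2: better memory layout during the computation, then move the
# new axis to the end.
res2 = np.rollaxis(outheight.reshape(nz, 1, 1) + oro, 0, 3)

assert np.allclose(compute1(outheight, oro), res1)
assert np.allclose(res1, res2)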

Francesc
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.7.1 released

2014-07-30 Thread Francesc Alted
==
Announcing bcolz 0.7.1
==

What's new
==

This is a maintenance release, where bcolz got rid of the nose dependency
for Python 2.6 (only unittest2 should be required).  Also, some small
fixes to the test suite, especially on 32-bit platforms, have been made.
Thanks to Ilan Schnell for pointing out the problems and for suggesting fixes.

``bcolz`` is a renaming of the ``carray`` project.  The new goals for
the project are to create simple, yet flexible compressed containers
that can live either on-disk or in-memory, with some
high-performance iterators (like `iter()`, `where()`) for querying them.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

For more detailed info, see the release notes in:
https://github.com/Blosc/bcolz/wiki/Release-Notes


What it is
==

bcolz provides columnar and compressed data containers.  Column storage
allows for efficiently querying tables with a large number of columns.
It also allows for cheap addition and removal of columns.  In addition,
bcolz objects are compressed by default for reducing memory/disk I/O
needs.  The compression process is carried out internally by Blosc, a
high-performance compressor that is optimized for binary data.

bcolz can use numexpr internally to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, the carray/ctable
containers can be disk-based, and it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.
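
As a quick illustration of the above, a hedged sketch of a compressed
ctable and a numexpr-powered query (the column names and sizes are made
up for the example):

import numpy as np
import bcolz

N = 1000 * 1000
# A compressed, column-wise table built from two NumPy arrays.
ct = bcolz.ctable((np.arange(N), np.linspace(0, 1, N)), names=['i', 'x'])
print(ct.nbytes, ct.cbytes)   # uncompressed vs compressed sizes

# where() evaluates the boolean expression (via numexpr, when available)
# and yields only the matching rows.
for row in ct.where('(i < 5) & (x < 0.5)'):
    print(row)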


Installing
==

bcolz is in the PyPI repository, so installing it is easy:

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt




   **Enjoy data!**

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.7.0 released

2014-07-22 Thread Francesc Alted
==
Announcing bcolz 0.7.0
==

What's new
==

In this release, support for Python 3 has been added, as well as Pandas
and HDF5/PyTables conversion, support for different compressors via the
latest release of Blosc, and a new `iterblocks()` iterator.

Also, intensive benchmarking has led to an important tuning of the buffer
size parameters, so that compression and evaluation go faster than
ever.  Together, bcolz and the Blosc compressor are finally fulfilling
the promise of accelerating memory I/O, at least for some real
scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

``bcolz`` is a renaming of the ``carray`` project.  The new goals for
the project are to create simple, yet flexible compressed containers
that can live either on-disk or in-memory, with some
high-performance iterators (like `iter()`, `where()`) for querying them.

For more detailed info, see the release notes in:
https://github.com/Blosc/bcolz/wiki/Release-Notes


What it is
==

bcolz provides columnar and compressed data containers.  Column storage
allows for efficiently querying tables with a large number of columns.
It also allows for cheap addition and removal of columns.  In addition,
bcolz objects are compressed by default for reducing memory/disk I/O
needs.  The compression process is carried out internally by Blosc, a
high-performance compressor that is optimized for binary data.

bcolz can use numexpr internally to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, the carray/ctable
containers can be disk-based, and it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.


Installing
==

bcolz is in the PyPI repository, so installing it is easy:

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt




   **Enjoy data!**

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.7 released

2014-07-07 Thread Francesc Alted
=
Announcing python-blosc 1.2.4
=

What is new?


This is a maintenance release, where the included c-blosc sources have been
updated to 1.4.0.  This adds support for non-Intel architectures,
especially those not supporting unaligned access.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a plain memcpy() call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy command line and Python library for Blosc called
Bloscpack (https://github.com/Blosc/bloscpack) that allows you to
compress large binary datafiles on-disk.
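
For reference, a minimal sketch of round-tripping a NumPy array through
the wrapper (sizes are illustrative; pack_array()/unpack_array() are the
convenience helpers for ndarrays):

import numpy as np
import blosc

a = np.linspace(0, 100, 10**7)
packed = blosc.pack_array(a)      # compress (Blosc picks sensible defaults)
print(len(packed), a.nbytes)      # compressed vs raw size, in bytes
b = blosc.unpack_array(packed)    # decompress back into a NumPy array
assert np.array_equal(a, b)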


Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



   **Enjoy data!**

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [CORRECTION] python-blosc 1.2.4 released (Was: ANN: python-blosc 1.2.7 released)

2014-07-07 Thread Francesc Alted
Indeed, the version just released was 1.2.4 and not 1.2.7.  Sorry for 
the typo!

Francesc

On 7/7/14, 8:20 PM, Francesc Alted wrote:
 =
 Announcing python-blosc 1.2.4
 =

 What is new?
 

 This is a maintenance release, where included c-blosc sources have been
 updated to 1.4.0.  This adds support for non-Intel architectures, most
 specially those not supporting unaligned access.

 For more info, you can have a look at the release notes in:

 https://github.com/Blosc/python-blosc/wiki/Release-notes

 More docs and examples are available in the documentation site:

 http://python-blosc.blosc.org


 What is it?
 ===

 Blosc (http://www.blosc.org) is a high performance compressor
 optimized for binary data.  It has been designed to transmit data to
 the processor cache faster than the traditional, non-compressed,
 direct memory fetch approach via a memcpy() OS call.

 Blosc is the first compressor that is meant not only to reduce the size
 of large datasets on-disk or in-memory, but also to accelerate object
 manipulations that are memory-bound
 (http://www.blosc.org/docs/StarvingCPUs.pdf).  See
 http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
 how much speed it can achieve in some datasets.

 Blosc works well for compressing numerical arrays that contains data
 with relatively low entropy, like sparse data, time series, grids with
 regular-spaced values, etc.

 python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
 the Blosc compression library.

 There is also a handy command line and Python library for Blosc called
 Bloscpack (https://github.com/Blosc/bloscpack) that allows you to
 compress large binary datafiles on-disk.


 Installing
 ==

 python-blosc is in PyPI repository, so installing it is easy:

 $ pip install -U blosc  # yes, you should omit the python- prefix


 Download sources
 

 The sources are managed through github services at:

 http://github.com/Blosc/python-blosc


 Documentation
 =

 There is Sphinx-based documentation site at:

 http://python-blosc.blosc.org/


 Mailing list
 

 There is an official mailing list for Blosc at:

 bl...@googlegroups.com
 http://groups.google.es/group/blosc


 Licenses
 

 Both Blosc and its Python wrapper are distributed using the MIT license.
 See:

 https://github.com/Blosc/python-blosc/blob/master/LICENSES

 for more details.

 

   **Enjoy data!**



-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] IDL vs Python parallel computing

2014-05-05 Thread Francesc Alted
On 5/3/14, 11:56 PM, Siegfried Gonzi wrote:
 Hi all

 I noticed IDL uses at least 400% (4 processors or cores) out of the box
 for simple things like reading and processing files, calculating the
 mean etc.

 I have never seen this happening with numpy except for the linalgebra
 stuff (e.g lapack).

Well, this might be because it is the place where using several 
processes makes more sense.  Normally, when you are reading files, the 
bottleneck is the I/O subsystem (at least if you don't have to convert 
from text to numbers), and for calculating the mean, normally the 
bottleneck is memory throughput.

Having said this, there are several packages that work on top of NumPy 
and can use multiple cores when performing numpy operations, like 
numexpr (https://github.com/pydata/numexpr) or Theano 
(http://deeplearning.net/software/theano/tutorial/multi_cores.html).
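
For instance, a minimal numexpr sketch (the thread count and the
expression are just illustrative):

import numpy as np
import numexpr as ne

x = np.random.rand(10**7)
ne.set_num_threads(4)              # use 4 cores for numexpr's virtual machine
y = ne.evaluate("exp(-x) * x**2")  # element-wise kernels run on all threads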

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-19 Thread Francesc Alted
On 18/04/14 13:39, Francesc Alted wrote:
 So, sqrt in numpy has about the same speed as the one in MKL. 
 Again, I wonder why :)

So by peeking into the code I have seen that you implemented sqrt using 
SSE2 intrinsics.  Cool!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-18 Thread Francesc Alted
On 17/04/14 21:19, Julian Taylor wrote:
 On 17.04.2014 20:30, Francesc Alted wrote:
 On 17/04/14 19:28, Julian Taylor wrote:
 On 17.04.2014 18:06, Francesc Alted wrote:

 In [4]: x_unaligned = np.zeros(shape,
 dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']
 on arrays of this size you won't see alignment issues you are dominated
 by memory bandwidth. If at all you will only see it if the data fits
 into the cache.
 Its also about unaligned to simd vectors not unaligned to basic types.
 But it doesn't matter anymore on modern x86 cpus. I guess for array data
 cache line splits should also not be a big concern.
 Yes, that was my point, that in x86 CPUs this is not such a big
 problem.  But still a factor of 2 is significant, even for CPU-intensive
 tasks.  For example, computing sin() is affected similarly (sin() is
 using SIMD, right?):

 In [6]: shape = (1, 1)

 In [7]: x_aligned = np.zeros(shape,
 dtype=[('x',np.float64),('y',np.int64)])['x']

 In [8]: x_unaligned = np.zeros(shape,
 dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']

 In [9]: %timeit res = np.sin(x_aligned)
 1 loops, best of 3: 654 ms per loop

 In [10]: %timeit res = np.sin(x_unaligned)
 1 loops, best of 3: 1.08 s per loop

 and again, numexpr can deal with that pretty well (using 8 physical
 cores here):

 In [6]: %timeit res = ne.evaluate('sin(x_aligned)')
 10 loops, best of 3: 149 ms per loop

 In [7]: %timeit res = ne.evaluate('sin(x_unaligned)')
 10 loops, best of 3: 151 ms per loop
 in this case the unaligned triggers a strided memcpy calling loop to
 copy the data into a aligned buffer which is terrible for performance,
 even compared to the expensive sin call.
 numexpr handles this well as it allows the compiler to replace the
 memcpy with inline assembly (a mov instruction).
 We could fix that in numpy, though I don't consider it very important,
 you usually always have base type aligned memory.

Well, that *could* be important for evaluating conditions in structured 
arrays, as it is pretty easy to get unaligned 'columns'.  But apparently 
this does not affect numpy very much:

In [23]: na_aligned = np.fromiter(((, i, i*2) for i in xrange(N)), 
dtype=S16,i4,i8)

In [24]: na_unaligned = np.fromiter(((, i, i*2) for i in xrange(N)), 
dtype=S15,i4,i8)

In [25]: %time sum((r['f1'] for r in na_aligned[na_aligned['f2']  10]))
CPU times: user 10.2 s, sys: 93 ms, total: 10.3 s
Wall time: 10.3 s
Out[25]: 499485

In [26]: %time sum((r['f1'] for r in na_unaligned[na_unaligned['f2']  10]))
CPU times: user 10.2 s, sys: 82 ms, total: 10.3 s
Wall time: 10.3 s
Out[26]: 499485

probably because the bottleneck is somewhere else.  So yeah, probably 
not worth worrying about that.


 (sin is not a SIMD using function unless you use a vector math library
 not supported by numpy directly yet)

Ah, so MKL is making use of SIMD for computing the sin(), but not in 
general.  But you later said that numpy's sqrt *is* making use of SIMD.  
I wonder why.



 Aligned allocators are not the only allocator which might be useful in
 numpy. Modern CPUs also support larger pages than 4K (huge pages up to
 1GB in size) which reduces TLB cache misses. Memory of this type
 typically needs to be allocated with special mmap flags, though newer
 kernel versions can now also provide this memory to transparent
 anonymous pages (normal non-file mmaps).
 That's interesting.  In which scenarios do you think that could improve
 performance?
 it might improve all numpy operations dealing with big arrays.
 big arrays trigger many large temporaries meaning glibc uses mmap
 meaning lots of moving of address space between the kernel and userspace.
 but I haven't benchmarked it, so it could also be completely irrelevant.

I was curious about this, and apparently the speedup that large pages 
typically bring is around 5%:

http://stackoverflow.com/questions/14275170/performance-degradation-with-large-pages

not a big deal, but it is something.


 Also memory fragments really fast, so after a few hours of operation you
 often can't allocate any huge pages anymore, so you need to reserve
 space for them which requires special setup of machines.

 Another possibility for special allocators are numa allocators that
 ensure you get memory local to a specific compute node regardless of the
 system numa policy.
 But again its probably not very important as python has poor thread
 scalability anyway, these are just examples for keeping flexibility of
 our allocators in numpy and not binding us to what python does.

Agreed.

 That's smart.  Yeah, I don't see a reason why numexpr would be
 performing badly on Ubuntu.  But I am not getting your performance for
 blocked_thread on my AMI linux vbox:

 http://nbviewer.ipython.org/gist/anonymous/11000524

 my numexpr amd64 package does not make use of SIMD e.g. sqrt which is
 vectorized in numpy:

 numexpr:
1.30 │ 4638:   sqrtss (%r14),%xmm0
0.01 │ ucomis

Re: [Numpy-discussion] About the npz format

2014-04-18 Thread Francesc Alted
On 18/04/14 13:01, Valentin Haenel wrote:
 Hi again,

 * onefire onefire.mys...@gmail.com [2014-04-18]:
 I think your workaround might help, but a better solution would be to not
 use Python's zipfile module at all. This would make it possible to, say,
 let the user choose the checksum algorithm or to turn that off.
 Or maybe the compression stuff makes this route too complicated to be worth
 the trouble? (after all, the zip format is not that hard to understand)
 Just to give you an idea of what my aforementioned Bloscpack library can
 do in the case of linspace:

 In [1]: import numpy as np

 In [2]: import bloscpack as bp

 In [3]: import bloscpack.sysutil as bps

 In [4]: x = np.linspace(1, 10, 5000)

 In [5]: %timeit np.save("x.npy", x) ; bps.sync()
 1 loops, best of 3: 2.12 s per loop

 In [6]: %timeit bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
 1 loops, best of 3: 627 ms per loop

 In [7]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
 3 loops, best of 3: 1.92 s per loop

 In [8]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
 3 loops, best of 3: 564 ms per loop

 In [9]: ls -lah x.npy x.blp
 -rw-r--r-- 1 root root  49M Apr 18 12:53 x.blp
 -rw-r--r-- 1 root root 382M Apr 18 12:52 x.npy

 However, this is a bit of special case, since Blosc does extremely well
 -- both speed and size wise -- on the linspace data, your milage may
 vary.

Exactly, and besides, Blosc can use different codecs inside it.  Just for 
completeness, here is a small benchmark of what you can expect from 
them (my laptop does not have an SSD, so my figures are a bit slow 
compared with Valentin's):

In [50]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
3 loops, best of 3: 5.7 s per loop

In [51]: cargs = bp.args.DEFAULT_BLOSC_ARGS

In [52]: cargs['cname'] = 'blosclz'

In [53]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-blosclz.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.12 s per loop

In [54]: cargs['cname'] = 'lz4'

In [55]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-lz4.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 985 ms per loop

In [56]: cargs['cname'] = 'lz4hc'

In [57]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-lz4hc.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.95 s per loop

In [58]: cargs['cname'] = 'snappy'

In [59]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-snappy.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.11 s per loop

In [60]: cargs['cname'] = 'zlib'

In [61]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-zlib.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 3.12 s per loop

so all the codecs can make the storage go faster than a pure np.save(), 
most especially blosclz, lz4 and snappy.  However, lz4hc and zlib 
achieve the best compression ratios:

In [62]: ls -lht x*.*
-rw-r--r-- 1 faltet users 7,0M 18 abr 13:49 x-zlib.blp
-rw-r--r-- 1 faltet users  54M 18 abr 13:48 x-snappy.blp
-rw-r--r-- 1 faltet users 7,0M 18 abr 13:48 x-lz4hc.blp
-rw-r--r-- 1 faltet users  48M 18 abr 13:47 x-lz4.blp
-rw-r--r-- 1 faltet users  49M 18 abr 13:47 x-blosclz.blp
-rw-r--r-- 1 faltet users 382M 18 abr 13:42 x.npy

But again, we are talking about an especially nice compression case.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Francesc Alted
Uh, 15x slower for unaligned access is quite a lot.  But Intel (and AMD) 
architectures are much more tolerant in this respect (and improving).  
For example, with a Xeon(R) CPU E5-2670 (2 years old) I get:


In [1]: import numpy as np

In [2]: shape = (1, 1)

In [3]: x_aligned = np.zeros(shape, 
dtype=[('x',np.float64),('y',np.int64)])['x']


In [4]: x_unaligned = np.zeros(shape, 
dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']


In [5]: %timeit res = x_aligned ** 2
1 loops, best of 3: 289 ms per loop

In [6]: %timeit res = x_unaligned ** 2
1 loops, best of 3: 664 ms per loop

so the added cost in this case is just a bit more than 2x.  But you can 
also alleviate this overhead if you do a copy that fits in cache prior 
to doing the computations.  numexpr does this:


https://github.com/pydata/numexpr/blob/master/numexpr/interp_body.cpp#L203

and the results are pretty good:

In [8]: import numexpr as ne

In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
10 loops, best of 3: 133 ms per loop

In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
10 loops, best of 3: 134 ms per loop

i.e. there is not a significant difference between aligned and unaligned 
access to data.


I wonder if the same technique could be applied to NumPy.

Francesc


On 17/04/14 16:26, Aron Ahmadia wrote:

Hmnn, I wasn't being clear :)

The default malloc on BlueGene/Q only returns 8 byte alignment, but 
the SIMD units need 32-byte alignment for loads, stores, and 
operations or performance suffers.  On the /P the required alignment 
was 16-bytes, but malloc only gave you 8, and trying to perform 
vectorized loads/stores generated alignment exceptions on unaligned 
memory.


See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and 
https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slides 
14 for overview, 15 for the effective performance difference between 
the unaligned/aligned code) for some notes on this.


A




On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith n...@pobox.com wrote:


  On 17 Apr 2014 15:09, Aron Ahmadia a...@ahmadia.net wrote:

  On the one hand it would be nice to actually know whether
posix_memalign is important, before making api decisions on this
basis.

 FWIW: On the lightweight IBM cores that the extremely popular
BlueGene machines were based on, accessing unaligned memory raised
system faults.  The default behavior of these machines was to
terminate the program if more than 1000 such errors occurred on a
given process, and an environment variable allowed you to
terminate the program if *any* unaligned memory access occurred.
 This is because unaligned memory accesses were 15x (or more)
slower than aligned memory access.

 The newer /Q chips seem to be a little more forgiving of this,
but I think one can in general expect allocated memory alignment
to be an important performance technique for future high
performance computing architectures.

Right, this much is true on lots of architectures, and so malloc
is careful to always return values with sufficient alignment (e.g.
8 bytes) to make sure that any standard operation can succeed.

The question here is whether it will be important to have *even
more* alignment than malloc gives us by default. A 16 or 32 byte
wide SIMD instruction might prefer that data have 16 or 32 byte
alignment, even if normal memory access for the types being
operated on only requires 4 or 8 byte alignment.

-n


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Francesc Alted
On 17/04/14 19:28, Julian Taylor wrote:
 On 17.04.2014 18:06, Francesc Alted wrote:

 In [4]: x_unaligned = np.zeros(shape,
 dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']
 on arrays of this size you won't see alignment issues you are dominated
 by memory bandwidth. If at all you will only see it if the data fits
 into the cache.
 Its also about unaligned to simd vectors not unaligned to basic types.
 But it doesn't matter anymore on modern x86 cpus. I guess for array data
 cache line splits should also not be a big concern.

Yes, that was my point, that in x86 CPUs this is not such a big 
problem.  But still a factor of 2 is significant, even for CPU-intensive 
tasks.  For example, computing sin() is affected similarly (sin() is 
using SIMD, right?):

In [6]: shape = (1, 1)

In [7]: x_aligned = np.zeros(shape, 
dtype=[('x',np.float64),('y',np.int64)])['x']

In [8]: x_unaligned = np.zeros(shape, 
dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']

In [9]: %timeit res = np.sin(x_aligned)
1 loops, best of 3: 654 ms per loop

In [10]: %timeit res = np.sin(x_unaligned)
1 loops, best of 3: 1.08 s per loop

and again, numexpr can deal with that pretty well (using 8 physical 
cores here):

In [6]: %timeit res = ne.evaluate('sin(x_aligned)')
10 loops, best of 3: 149 ms per loop

In [7]: %timeit res = ne.evaluate('sin(x_unaligned)')
10 loops, best of 3: 151 ms per loop


 Aligned allocators are not the only allocator which might be useful in
 numpy. Modern CPUs also support larger pages than 4K (huge pages up to
 1GB in size) which reduces TLB cache misses. Memory of this type
 typically needs to be allocated with special mmap flags, though newer
 kernel versions can now also provide this memory to transparent
 anonymous pages (normal non-file mmaps).

That's interesting.  In which scenarios do you think that could improve 
performance?

 In [8]: import numexpr as ne

 In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
 10 loops, best of 3: 133 ms per loop

 In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
 10 loops, best of 3: 134 ms per loop

 i.e. there is not a significant difference between aligned and unaligned
 access to data.

 I wonder if the same technique could be applied to NumPy.

 you already can do so with relatively simple means:
 http://nbviewer.ipython.org/gist/anonymous/10942132

 If you change the blocking function to get a function as input and use
 inplace operations numpy can even beat numexpr (though I used the
 numexpr Ubuntu package which might not be compiled optimally)
 This type of transformation can probably be applied on the AST quite easily.

That's smart.  Yeah, I don't see a reason why numexpr would be 
performing badly on Ubuntu.  But I am not getting your performance for 
blocked_thread on my AMI linux vbox:

http://nbviewer.ipython.org/gist/anonymous/11000524

oh well, threads are always tricky beasts.  By the way, apparently the 
optimal block size for my machine is something like 1 MB, not 128 KB, 
although the difference is not big:

http://nbviewer.ipython.org/gist/anonymous/11002751

(thanks to Stefan Van der Walt for the script).

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4 is out

2014-04-13 Thread Francesc Alted

  Announcing Numexpr 2.4


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in
strings.  Only plain strings (bytes) are supported for now (see ticket
#142).  Thanks to Marcin Krol.

You can have a glimpse on how `contains()` works in this notebook:

http://nbviewer.ipython.org/gist/FrancescAlted/10595974

where it can be seen that this can make substring searches more
than 10x faster than with regular Python.

You can find the source for the notebook here:

https://github.com/FrancescAlted/ngrams
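
For the record, a minimal sketch of what a contains() call looks like
(byte strings only, as noted above; the sample data is made up):

import numpy as np
import numexpr as ne

s = np.array([b'some apples', b'a banana', b'apple pie'])
mask = ne.evaluate("contains(s, b'apple')")
# mask -> array([ True, False,  True], dtype=bool)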

Also, there is a new version of setup.py that allows better management
of the NumPy dependency during pip installs.  Thanks to Aleks Bunin.

Windows related bugs have been addressed and (hopefully) squashed.
Thanks to Christoph Gohlke.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PEP 465 has been accepted / volunteers needed

2014-04-10 Thread Francesc Alted
On 4/9/14, 10:46 PM, Chris Barker wrote:
 On Tue, Apr 8, 2014 at 11:14 AM, Nathaniel Smith n...@pobox.com wrote:

 Thank you! Though I suspect that the most important part of my
 contribution may have just been my high tolerance for writing emails
 ;-).


 no -- it's your high tolerance for _reading_ emails...

 Far too many of us have a high tolerance for writing them!

Ha ha, very true!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4 RC2

2014-04-07 Thread Francesc Alted
===
  Announcing Numexpr 2.4 RC2
===

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in
strings.  Only plain strings (bytes) are supported for now (see ticket
#142).  Thanks to Marcin Krol.

Also, there is a new version of setup.py that allows better management
of the NumPy dependency during pip installs.  Thanks to Aleks Bunin.

Windows related bugs have been addressed and (hopefully) squashed.
Thanks to Christoph Gohlke.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4 RC1

2014-04-06 Thread Francesc Alted
===
  Announcing Numexpr 2.4 RC1
===

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in
strings.  Thanks to Marcin Krol.

Also, there is a new version of setup.py that allows better management
of the NumPy dependency during pip installs.  Thanks to Aleks Bunin.

This is the first release candidate before 2.4 final would be out,
so please give it a go and report back any problems you may have.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1

2014-02-28 Thread Francesc Alted
Hi Julian,

Any chance that NPY_MAXARGS could be increased to something more than 
the current value of 32?  There is a discussion about this in:

https://github.com/numpy/numpy/pull/226

but I think that, as Charles was suggesting, just increasing NPY_MAXARGS 
to something more reasonable (say 256) should be enough for a long while.

This issue limits quite a bit the number of operands in numexpr 
expressions, and hence affects other projects that depend on it, like 
PyTables or pandas.  See for example this bug report:

https://github.com/PyTables/PyTables/issues/286
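
To make the limitation concrete, a hedged sketch of how it surfaces from
Python (the variable names are made up; the exact exception message
depends on the numexpr/NumPy versions):

import numpy as np
import numexpr as ne

n_operands = 40   # more than the NPY_MAXARGS = 32 compile-time limit
arrays = {'a%d' % i: np.ones(10) for i in range(n_operands)}
expr = ' + '.join(sorted(arrays))   # "a0 + a1 + a10 + ..."

try:
    ne.evaluate(expr, local_dict=arrays)
except Exception as exc:   # typically complains about too many operands
    print(exc)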

Thanks,
Francesc

On 2/27/14, 9:05 PM, Julian Taylor wrote:
 hi,

 We want to start preparing the release candidate for the bugfix release
 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow.

 So if you want a certain issue fixed please scream now or better create
 a pull request/patch on the maintenance/1.8.x branch.
 Please only consider bugfixes, no enhancements (unless they are really
 really simple), new features or invasive changes.

 I just finished my list of issues I want backported to numpy 1.8
 (gh-4390, gh-4388). Please check if its already included in these PRs.
 I'm probably still going to add gh-4284 after some though tomorrow.

 Cheers,
 Julian
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1

2014-02-28 Thread Francesc Alted
Well, what numexpr is using is basically NpyIter_AdvancedNew:

https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178

and nothing else.  If NPY_MAXARGS could be increased just for that, and 
without breaking the ABI, then fine.  If not, we will have to wait until 
1.9, I am afraid.

On the other hand, increasing the temporary arrays in nditer from 32kb 
to 128kb is a bit worrying, but probably we should do some benchmarks 
and see how much performance would be compromised (if any).

Francesc

On 2/28/14, 1:09 PM, Julian Taylor wrote:
 hm increasing it for PyArrayMapIterObject would break the public ABI.
 While nobody should be using this part of the ABI its not appropriate
 for a bugfix release.
 Note that as it currently stands in numpy 1.9.dev we will break this ABI
 for the indexing improvements.

 Though for nditer and some other functions we could change it if thats
 enough.
 It would bump some temporary arrays of nditer from 32kb to 128kb, I
 think that would still be fine, but getting to the point where we should
 move them onto the heap.

 On 28.02.2014 12:41, Francesc Alted wrote:
 Hi Julian,

 Any chance that NPY_MAXARGS could be increased to something more than
 the current value of 32?  There is a discussion about this in:

 https://github.com/numpy/numpy/pull/226

 but I think that, as Charles was suggesting, just increasing NPY_MAXARGS
 to something more reasonable (say 256) should be enough for a long while.

 This issue limits quite a bit the number of operands in numexpr
 expressions, and hence, to other projects that depends on it, like
 PyTables or pandas.  See for example this bug report:

 https://github.com/PyTables/PyTables/issues/286

 Thanks,
 Francesc

 On 2/27/14, 9:05 PM, Julian Taylor wrote:
 hi,

 We want to start preparing the release candidate for the bugfix release
 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow.

 So if you want a certain issue fixed please scream now or better create
 a pull request/patch on the maintenance/1.8.x branch.
 Please only consider bugfixes, no enhancements (unless they are really
 really simple), new features or invasive changes.

 I just finished my list of issues I want backported to numpy 1.8
 (gh-4390, gh-4388). Please check if its already included in these PRs.
 I'm probably still going to add gh-4284 after some though tomorrow.

 Cheers,
 Julian
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1

2014-02-28 Thread Francesc Alted
On 2/28/14, 3:00 PM, Charles R Harris wrote:



 On Fri, Feb 28, 2014 at 5:52 AM, Julian Taylor 
  jtaylor.deb...@googlemail.com 
 wrote:

 performance should not be impacted as long as we stay on the stack, it
 just increases offset of a stack pointer a bit more.
 E.g. nditer and einsum use temporary stack arrays of this type for its
 initialization:
 op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS]; // both 32 currently
 The resulting nditer structure is then in heap space and dependent on
 the real amount of arguments it got.
 So I'm more worried about running out of stack space, though the limit
 is usually 8mb so taking 128kb for a short while should be ok.

 On 28.02.2014 13:32, Francesc Alted wrote:
  Well, what numexpr is using is basically NpyIter_AdvancedNew:
 
 
 
 https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178
 
  and nothing else.  If NPY_MAXARGS could be increased just for
 that, and
  without ABI breaking, then fine.  If not, we should have to wait
 until
  1.9 I am afraid.
 
  On the other hand, increasing the temporary arrays in nditer
 from 32kb
  to 128kb is a bit worrying, but probably we should do some
 benchmarks
  and see how much performance would be compromised (if any).
 
  Francesc
 
  On 2/28/14, 1:09 PM, Julian Taylor wrote:
  hm increasing it for PyArrayMapIterObject would break the
 public ABI.
  While nobody should be using this part of the ABI its not
 appropriate
  for a bugfix release.
  Note that as it currently stands in numpy 1.9.dev we will break
 this ABI
  for the indexing improvements.
 
  Though for nditer and some other functions we could change it
 if thats
  enough.
  It would bump some temporary arrays of nditer from 32kb to 128kb, I
  think that would still be fine, but getting to the point where
 we should
  move them onto the heap.


 These sort of changes can have subtle side effects and need lots of 
 testing in a release cycle. Bugfix release cycles are kept short by 
 restricting changes to those that look simple and safe.

Agreed.  I have just opened a ticket so we keep this in mind for NumPy 1.9:

https://github.com/numpy/numpy/issues/4398

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.3.1 released

2014-02-18 Thread Francesc Alted
==
  Announcing Numexpr 2.3.1
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.

What's new
==

* Added support for shift-left (<<) and shift-right (>>) binary operators.
   See PR #131. Thanks to fish2000!

* Removed the rpath flag for the GCC linker, because it is probably
   not necessary and it chokes clang.
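
A quick hedged example of the first item above (integer operand arrays
assumed; the names are just illustrative):

import numpy as np
import numexpr as ne

a = np.arange(10, dtype=np.int64)
left = ne.evaluate("a << 2")    # same result as a * 4
right = ne.evaluate("a >> 1")   # same result as a // 2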

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-17 Thread Francesc Alted
On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
 On Sun, Feb 16, 2014 at 6:12 PM, Daπid davidmen...@gmail.com wrote:
 On 16 February 2014 23:43, josef.p...@gmail.com wrote:
 What's the fastest argsort for a 1d array with around 28 Million
 elements, roughly uniformly distributed, random order?

 On numpy latest version:

 for kind in ['quicksort', 'mergesort', 'heapsort']:
  print kind
  %timeit np.sort(data, kind=kind)
  %timeit np.argsort(data, kind=kind)


 quicksort
 1 loops, best of 3: 3.55 s per loop
 1 loops, best of 3: 10.3 s per loop
 mergesort
 1 loops, best of 3: 4.84 s per loop
 1 loops, best of 3: 9.49 s per loop
 heapsort
 1 loops, best of 3: 12.1 s per loop
 1 loops, best of 3: 39.3 s per loop


 It looks quicksort is quicker sorting, but mergesort is marginally faster
 sorting args. The diference is slim, but upon repetition, it remains
 significant.

 Why is that? Probably part of the reason is what Eelco said, and part is
 that in sort comparison are done accessing the array elements directly, but
 in argsort you have to index the array, introducing some overhead.
 Thanks, both.

 I also gain a second with mergesort.

 matlab would be nicer in my case, it returns both.
 I still need to use the argsort to index into the array to also get
 the sorted array.

Many years ago I needed something similar, so I made some functions for 
sorting and argsorting in one single shot.  Maybe you want to reuse 
them.  Here is an example of the C implementation:

https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619

and here the Cython wrapper for all of them:

https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
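
For those staying at the pure NumPy level, the usual two-pass equivalent
(one argsort plus a fancy-indexing step) is simply:

import numpy as np

data = np.random.randint(0, 10**6, size=28 * 10**6)
idx = np.argsort(data, kind='mergesort')   # indices that sort the array
sorted_data = data[idx]                    # apply the permutation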

Francesc


 Josef


 I seem unable to find the code for ndarray.sort, so I can't check. I have
 tried to grep it tring all possible combinations of def ndarray,
 self.sort, etc. Where is it?


 /David.


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.3 (final) released

2014-01-25 Thread Francesc Alted
==
  Announcing Numexpr 2.3
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.
Numexpr is already being used in a series of packages (PyTables, pandas,
BLZ...) to help do computations faster.


What's new
==

The repository has been migrated to https://github.com/pydata/numexpr.
All new tickets and PRs should be directed there.

Also, a `conj()` function for computing the conjugate of complex arrays 
has been added.
Thanks to David Menéndez.  See PR #125.
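
A minimal hedged example of the new function (the sample values are made
up):

import numpy as np
import numexpr as ne

z = np.array([1 + 2j, 3 - 4j])
zc = ne.evaluate("conj(z)")   # -> array([ 1.-2.j,  3.+4.j])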

Finally, we fixed a DeprecationWarning derived from using ``oa_ndim ==
0`` and ``op_axes == NULL`` with `NpyIter_AdvancedNew()` and
NumPy 1.8.  Thanks to Mark Wiebe for advice on how to fix this
properly.

Many thanks to Christoph Gohlke and Ilan Schnell for their help during
the testing of this release in all kinds of possible combinations of
platforms and MKL.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted on GitHub at:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.0 released

2014-01-25 Thread Francesc Alted
The sources are managed through github services at:

http://github.com/ContinuumIO/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/ContinuumIO/python-blosc/blob/master/LICENSES

for more details.

--
Francesc Alted
Continuum Analytics, Inc.


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: BLZ 0.6.1 has been released

2014-01-25 Thread Francesc Alted
Announcing BLZ 0.6 series
=

What it is
--

BLZ is a chunked container for numerical data.  Chunking allows for
efficient enlarging/shrinking of the data container.  In addition, it can
also be compressed for reducing memory/disk needs.  The compression
process is carried out internally by Blosc, a high-performance
compressor that is optimized for binary data.

The main objects in BLZ are `barray` and `btable`.  `barray` is meant
for storing multidimensional homogeneous datasets efficiently.
`barray` objects provide the foundations for building `btable`
objects, where each column is made of a single `barray`.  Facilities
are provided for iterating, filtering and querying `btables` in an
efficient way.  You can find more info about `barray` and `btable` in
the tutorial:

http://blz.pydata.org/blz-manual/tutorial.html

BLZ can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too)
either from memory or from disk.  In the future, it is planned to use
Numba as the computational kernel and to provide better Blaze
(http://blaze.pydata.org) integration.


What's new
--

BLZ has been branched off from the Blaze project
(http://blaze.pydata.org).  BLZ was meant as a persistent format and
library for I/O in Blaze.  BLZ in Blaze is based on previous carray
0.5 and this is why this new version is labeled 0.6.

BLZ supports completely transparent storage on-disk in addition to
memory.  That means that *everything* that can be done with the
in-memory container can be done using the disk as well.

The advantage of a disk-based container is that the addressable space
is much larger than just your available memory.  Also, as BLZ uses
a chunked and compressed data layout built on the super-fast Blosc
compression library, data access speed is very good.

The format chosen for the persistence layer is based on the
'bloscpack' library and described in the Persistent format for BLZ
chapter of the user manual ('docs/source/persistence-format.rst').
More about Bloscpack here: https://github.com/esc/bloscpack

You may want to know more about BLZ in this blog entry:
http://continuum.io/blog/blz-format

In this version, support for Blosc 1.3 has been added, meaning that
a new `cname` parameter has been added to the `bparams` class, so
that you can select your preferred compressor from 'blosclz', 'lz4',
'lz4hc', 'snappy' and 'zlib'.
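
A hypothetical usage sketch of the new parameter (only `bparams` and
`cname` are taken from the notes above; the `blz` module name, the
`clevel` keyword and the `bparams=` argument name are assumptions on my
part, so check the docs):

import numpy as np
import blz   # assumed import name for the BLZ package

a = np.arange(1e7)
# select the LZ4 codec through the new `cname` parameter
bp = blz.bparams(clevel=5, cname='lz4')
ca = blz.barray(a, bparams=bp)
print(ca)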

Also, many bugs have been fixed, providing a much smoother experience.

CAVEAT: The BLZ/bloscpack format is still evolving, so don't rely on
forward compatibility of the format, at least until 1.0, when the
internal format will be declared frozen.


Resources
-

Visit the main BLZ site repository at:
http://github.com/ContinuumIO/blz

Read the online docs at:
http://blz.pydata.org/blz-manual/index.html

Home of Blosc compressor:
http://www.blosc.org

User's mail list:
blaze-...@continuum.io



Enjoy!

Francesc Alted
Continuum Analytics, Inc.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Francesc Alted

Yeah, numexpr is pretty cool for avoiding temporaries in an easy way:

https://github.com/pydata/numexpr
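
For instance, something along these lines (a minimal sketch; the array
names are arbitrary, and `out=` reuses an existing buffer for the result):

import numpy as np
import numexpr as ne

a, b, c = [np.arange(1e7) for _ in range(3)]
result = np.empty_like(a)
# evaluated block-wise, so no array-sized temporaries are created
ne.evaluate("a + 2*b + c", out=result)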

Francesc

On 24/01/14 16:30, Nathaniel Smith wrote:


There is no reliable way to predict how much memory an arbitrary numpy 
operation will need, no. However, in most cases the main memory cost 
will be simply the need to store the input and output arrays; for 
large arrays, all other allocations should be negligible.


The most effective way to avoid running out of memory, therefore, is 
to avoid creating temporary arrays, by using only in-place operations.


E.g., if a and b each require N bytes of ram, then memory requirements 
(roughly).


c = a + b: 3N
c = a + 2*b: 4N
a += b: 2N
np.add(a, b, out=a): 2N
b *= 2; a += b: 2N

Note that simply loading a and b requires 2N memory, so the latter 
code samples are near-optimal.


Of course some calculations do require the use of temporary storage 
space...


-n

On 24 Jan 2014 15:19, Dinesh Vadhia dineshbvad...@hotmail.com 
mailto:dineshbvad...@hotmail.com wrote:


I want to write a general exception handler to warn if too much
data is being loaded for the ram size in a machine for a
successful numpy array operation to take place.  For example, the
program multiplies two floating point arrays A and B which are
populated with loadtext.  While the data is being loaded, want to
continuously check that the data volume doesn't pass a threshold
that will cause on out-of-memory error during the A*B operation.
The known variables are the amount of memory available, data type
(floats in this case) and the numpy array operation to be
performed. It seems this requires knowledge of the internal memory
requirements of each numpy operation.  For sake of simplicity, can
ignore other memory needs of program.  Is this possible?

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] -ffast-math

2013-12-03 Thread Francesc Alted
On 12/2/13, 12:14 AM, Dan Goodman wrote:
 Dan Goodman dg.gmane at thesamovar.net writes:
 ...
 I got around 5x slower. Using numexpr 'dumbly' (i.e. just putting the
 expression in directly) was slower than the function above, but doing a
 hybrid between the two approaches worked well:

 def timefunc_numexpr_smart():
  _sin_term = sin(2.0*freq*pi*t)
  _exp_term = exp(-dt/tau)
  _a_term = (_sin_term-_sin_term*_exp_term)
  _const_term = -b*_exp_term + b
  v[:] = numexpr.evaluate('a*_a_term+v*_exp_term+_const_term')
  #numexpr.evaluate('a*_a_term+v*_exp_term+_const_term', out=v)

 This was about 3.5x slower than weave. If I used the commented out final
 line then it was only 1.5x slower than weave, but it also gives wrong
 results. I reported this as a bug in numexpr a long time ago but I guess it
 hasn't been fixed yet (or maybe I didn't upgrade my version recently).
 I just upgraded numexpr to 2.2 where they did fix this bug, and now the
 'smart' numexpr version runs exactly as fast as weave (so I guess there were
 some performance enhancements in numexpr as well).

Err, no, there have not been performance improvements in numexpr since
2.0 (that I am aware of).  Maybe you are running on a multi-core machine
now and seeing better speedup because of this?  Also, your
expressions are made of transcendental functions, so linking numexpr 
with MKL could accelerate computations a good deal too.
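
If in doubt, it is easy to check how numexpr was built and how many
cores it sees (a small sketch; as far as I know these introspection
helpers exist in recent numexpr versions, but treat them as assumptions):

import numexpr as ne

print(ne.__version__)
print(ne.use_vml)                   # True if numexpr was linked against Intel VML/MKL
print(ne.detect_number_of_cores())  # number of cores numexpr detected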

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [ANN] numexpr 2.2 released

2013-08-31 Thread Francesc Alted
==
 Announcing Numexpr 2.2
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
VML library (included in Intel MKL), which allows an extremely fast
evaluation of transcendental functions (sin, cos, tan, exp, log...)
while squeezing the last drop of performance out of your multi-core
processors.

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational kernel for projects that
don't want to adopt other solutions that require heavier
dependencies.

What's new
==

This release is mainly meant to fix a problem with the license of the
numexpr/win32/pthread.{c,h} files emulating pthreads on Windows.  Now
that permission from the original authors has been granted, these files
adopt the MIT license and can be redistributed without problems.  See
issue #109 for details
(https://code.google.com/p/numexpr/issues/detail?id=110).

Another important improvement is the new algorithm to decide the initial
number of threads to be used.  This was necessary because, by default,
numexpr was using a number of threads equal to the detected number of
cores, and this can be just too much for modern systems, where this
number can be too high (and counterproductive for performance in many
cases).  Now, the 'NUMEXPR_NUM_THREADS' environment variable is
honored, and in case it is not present, a maximum of *8* threads
are set up initially.  The new algorithm is fully described in
the Users Guide, in the note of the 'General routines' section:
https://code.google.com/p/numexpr/wiki/UsersGuide#General_routines.
Closes #110.
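
For example (a small sketch; the environment variable must be set before
numexpr is imported):

import os
os.environ['NUMEXPR_NUM_THREADS'] = '4'   # honored at import time

import numexpr as ne
prev = ne.set_num_threads(4)   # can also be changed at runtime; returns the old value
print(prev)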

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] RAM problem during code execution - Numpya arrays

2013-08-23 Thread Francesc Alted
(std_dev_size_medio_nuevo)/numero_experimentos)

 tiempos=np.append(tiempos, time.clock()-empieza)

 componente_y=np.append(componente_y, sum(comp_y)/numero_experimentos)
 componente_x=np.append(componente_x, sum(comp_x)/numero_experimentos)

 anisotropia_macroscopica_porcentual=100*(1-(componente_y/componente_x))

 I tryed with gc and gc.collect() and 'del'command for deleting arrays
 after his use and nothing work!

 What am I doing wrong? Why the memory becomes full while running (starts
 with 10% of RAM used and in 1-2hour is totally full used)?

 Please help me, I'm totally stuck!
 Thanks a lot!

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.1 (final) released

2013-05-24 Thread Francesc Alted
===
Announcing python-blosc 1.1
===

What is it?
===

python-blosc (http://blosc.pydata.org/) is a Python wrapper for the
Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for
binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Whether this is achieved or not
depends on the data compressibility, the number of cores in the system,
and other factors.  See a series of benchmarks conducted for many
different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

There is also a handy command-line tool for Blosc called Bloscpack
(https://github.com/esc/bloscpack) that allows you to compress large
binary datafiles on-disk.  Although the format for Bloscpack has not
stabilized yet, it allows you to effectively use Blosc from your
favorite shell.


What is new?


- Added new `compress_ptr` and `decompress_ptr` functions that allow
   compressing and decompressing from/to a data pointer, avoiding an
   intermediate copy for maximum speed.  Be careful, as these are low
   level calls, and the user must make sure that the pointer data area
   is safe (see the sketch after this list).

- Since Blosc (the C library) can already be installed as a
   standalone library (via cmake), it is also possible to link
   python-blosc against a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of
   recent Blosc library supporting this at C level).

- Many checks on types and ranges of values have been added.  Most of
   the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available
   now.
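
A small sketch of the byte-string interface together with the new
pointer-level calls (the `compress_ptr` signature shown in the comment
is an assumption, so check the docstrings):

import numpy as np
import blosc

a = np.arange(1e6)

# high-level interface: works on byte strings (makes one intermediate copy)
packed = blosc.compress(a.tobytes(), typesize=a.itemsize)
b = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)
assert (a == b).all()

# pointer-level interface, avoiding the intermediate copy (signature assumed):
# packed = blosc.compress_ptr(a.__array_interface__['data'][0], a.size, a.itemsize)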

Many thanks to Valentin Hänel for his impressive work for this release.

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://blosc.pydata.org


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/FrancescAlted/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES

for more details.

Enjoy!

--
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.1 RC1 available for testing

2013-05-17 Thread Francesc Alted

Announcing python-blosc 1.1 RC1


What is it?
===

python-blosc (http://blosc.pydata.org) is a Python wrapper for the
Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for
binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Whether this is achieved or not
depends on the data compressibility, the number of cores in the system,
and other factors.  See a series of benchmarks conducted for many
different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

There is also a handy command-line tool for Blosc called Bloscpack
(https://github.com/esc/bloscpack) that allows you to compress large
binary datafiles on-disk.  Although the format for Bloscpack has not
stabilized yet, it allows you to effectively use Blosc from your
favorite shell.


What is new?


- Added new `compress_ptr` and `decompress_ptr` functions that allow
   compressing and decompressing from/to a data pointer.  These are low level
   calls and the user must make sure that the pointer data area is safe.

- Since Blosc (the C library) can already be installed as a
   standalone library (via cmake), it is also possible to link
   python-blosc against a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of
   recent Blosc library supporting this at C level).

- Many checks on types and ranges of values have been added.  Most of
   the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available
   now.

Many thanks to Valentin Hänel for his impressive work for this release.

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://blosc.pydata.org


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/FrancescAlted/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES

for more details.

--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Profiling (was GSoC : Performance parity between numpy arrays and Python scalars)

2013-05-02 Thread Francesc Alted
On 5/2/13 3:58 PM, Nathaniel Smith wrote:
 callgrind has the *fabulous* kcachegrind front-end, but it only
 measures memory access performance on a simulated machine, which is
 very useful sometimes (if you're trying to optimize cache locality),
 but there's no guarantee that the bottlenecks on its simulated machine
 are the same as the bottlenecks on your real machine.

Agreed, there is no guarantee, but my experience is that kcachegrind 
normally gives you a pretty decent view of cache faults and hence it can
make pretty good predictions of how these affect your computations.  I
have used this feature extensively for optimizing parts of the Blosc
compressor, and I could not be happier (to the point that, if it were
not for Valgrind, I could not have figured out many interesting memory access
optimizations).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.1 RC1

2013-04-14 Thread Francesc Alted


 Announcing Numexpr 2.1RC1


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like 3*a+4*b) are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
VML library, which allows for squeezing the last drop of performance
out of your multi-core processors.

What's new
==

This version adds compatibility with Python 3.  A bunch of thanks to
Antonio Valentino for his excellent work on this.  I apologize for taking
so long to release his contributions.


In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

This is release candidate 1, so it will not be available on the PyPI
repository.  I'll post it there when the final version is released.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Francesc Alted
On 4/4/13 1:52 AM, Chris Barker - NOAA Federal wrote:
 Thanks all for taking an interest. I need to think a bot more about
 the options before commenting more, but:

 while we're at it:

 It seems very odd to me that datetime64 supports different units
 (right down to  attosecond) but not different epochs. How can it
 possible be useful to use nanoseconds, etc, but only right around
 1970? For that matter, why all the units at all? I can see the need
 for nanosecond resolution, but not without changing the epoch -- so if
 the epoch is fixed, why bother with different units?
snip

When Ivan and I were discussing that, I remember us deciding that such
small units would be useful mainly for the timedelta datatype, which
is a relative, not absolute, time.  We did not want to fall short for
very precise time measurements, and this is why we decided to go with
attoseconds.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Francesc Alted
On 4/4/13 1:54 PM, Nathaniel Smith wrote:
 On Thu, Apr 4, 2013 at 12:52 AM, Chris Barker - NOAA Federal
 chris.bar...@noaa.gov wrote:
 Thanks all for taking an interest. I need to think a bot more about
 the options before commenting more, but:

 while we're at it:

 It seems very odd to me that datetime64 supports different units
 (right down to  attosecond) but not different epochs. How can it
 possible be useful to use nanoseconds, etc, but only right around
 1970? For that matter, why all the units at all? I can see the need
 for nanosecond resolution, but not without changing the epoch -- so if
 the epoch is fixed, why bother with different units? Using days (for
 instance) rather than seconds doesn't save memory, as we're always
 using 64 bits. It can't be common to need more than 2.9e12 years (OK,
 that's not quite as old as the universe, so some cosmologists may need
 it...)
 Another reason why it might be interesting to support different epochs
 is that many timeseries (e.g., the ones I work with) aren't linked to
 absolute time, but are instead milliseconds since we turned on the
 recording equipment. You can reasonably represent these as timedeltas
 of course, but it'd be even more elegant to be able to be able to
 represent them as absolute times against an opaque epoch. In
 particular, when you have multiple recording tracks, only those which
 were recorded against the same epoch are actually commensurable --
 trying to do
recording1_times[10] - recording2_times[10]
 is meaningless and should be an error.

I remember discussing this in some depth 5 years ago on this
list, when we asked people about the convenience of including a
user-defined 'epoch'.  We were calling it 'origin'.  But apparently it
was decided that this was not needed because timestamp+timedelta would
be enough.  The NEP still reflects this discussion:

https://github.com/numpy/numpy/blob/master/doc/neps/datetime-proposal.rst#why-the-origin-metadata-disappeared

This is just a historical note, not that we can't change that again.
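
In the meantime, the timestamp+timedelta idiom already covers the
"recording equipment" use case reasonably well (a small sketch; the
recording start is hypothetical):

import numpy as np

origin = np.datetime64('2013-04-04T12:00:00')   # hypothetical recording start
rel = np.arange(5) * np.timedelta64(1, 'ms')    # "ms since the equipment was switched on"
abs_times = origin + rel
print(abs_times)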

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Francesc Alted
On 4/4/13 8:56 PM, Chris Barker - NOAA Federal wrote:
 On Thu, Apr 4, 2013 at 10:54 AM, Francesc Alted franc...@continuum.io wrote:

 That makes a difference.  This can be specially important for creating
 user-defined time origins:

 In []: np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1,
 dtype='timedelta64[ns]')
 Out[]: numpy.datetime64('2017-07-14T04:40:00.1+0200')
 but that's worthless if you try it higher-resolution:

 In [40]: np.array(int(1.5e9), dtype='datetime64[s]')
 Out[40]: array(datetime.datetime(2017, 7, 14, 2, 40), dtype='datetime64[s]')

 # Start at 2017

 # add a picosecond:
 In [41]: np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1,
 dtype='timedelta64[ps]')
 Out[41]: numpy.datetime64('1970-03-08T22:55:30.029526319105-0800')

 # get 1970???

This is clearly a bug.  Could you file a ticket please?

Also, using attoseconds gives weird behavior:

In []: np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1, 
dtype='timedelta64[as]')
---
OverflowError Traceback (most recent call last)
ipython-input-42-acd66c465bef in module()
 1 np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1, 
dtype='timedelta64[as]')

OverflowError: Integer overflow getting a common metadata divisor for 
NumPy datetime metadata [s] and [as]

I would expect the attosecond to be happily ignored and nothing to be
added.


 And even with nanoseconds, given the leap-second issues, etc, you
 really wouldn't want to do this anyway -- rather, keep your epoch
 close by.

 Now that I think about it -- being able to set your epoch could lessen
 the impact of leap-seconds for second-resolution as well.

Probably this is the way to go, yes.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Francesc Alted
On 3/13/13 2:45 PM, Andrea Cimatoribus wrote:
 Hi everybody, I hope this has not been discussed before, I couldn't find a 
 solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do this. 
 Since the files are huge, and would make me run out of memory, I need to read 
 data skipping some records (I am reading data recorded at high frequency, so 
 basically I want to read subsampling).
[clip]

You can do a fid.seek(offset) prior to np.fromfile() and then it will
read from that offset.  See the docstring for `file.seek()` on how to use it.
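
A small sketch of that idiom (the file name, record dtype and
subsampling step are hypothetical):

import numpy as np

record_dtype = np.dtype('f8')   # one record = one float64 (assumption)
step = 10                       # keep 1 record out of every 10

samples = []
with open('data.bin', 'rb') as fid:
    while True:
        rec = np.fromfile(fid, dtype=record_dtype, count=1)
        if rec.size == 0:       # end of file
            break
        samples.append(rec[0])
        fid.seek((step - 1) * record_dtype.itemsize, 1)  # skip the next step-1 records

result = np.array(samples)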

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Francesc Alted
On 3/13/13 3:53 PM, Francesc Alted wrote:
 On 3/13/13 2:45 PM, Andrea Cimatoribus wrote:
 Hi everybody, I hope this has not been discussed before, I couldn't 
 find a solution elsewhere.
 I need to read some binary data, and I am using numpy.fromfile to do 
 this. Since the files are huge, and would make me run out of memory, 
 I need to read data skipping some records (I am reading data recorded 
 at high frequency, so basically I want to read subsampling).
 [clip]

 You can do a fid.seek(offset) prior to np.fromfile() and the it will 
 read from offset.  See the docstrings for `file.seek()` on how to use it.


Oops, you were already using file.seek().  Please disregard.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)

2013-03-07 Thread Francesc Alted
On 3/6/13 7:42 PM, Kurt Smith wrote:
 And regarding performance, doing simple timings shows a 30%-ish
 slowdown for unaligned operations:

 In [36]: %timeit packed_arr['b']**2
 100 loops, best of 3: 2.48 ms per loop

 In [37]: %timeit aligned_arr['b']**2
 1000 loops, best of 3: 1.9 ms per loop

Hmm, that clearly depends on the architecture.  On my machine:

In [1]: import numpy as np

In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)

In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)

In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt)

In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt)

In [6]: baligned = aligned_arr['b']

In [7]: bpacked = packed_arr['b']

In [8]: %timeit baligned**2
1000 loops, best of 3: 1.96 ms per loop

In [9]: %timeit bpacked**2
100 loops, best of 3: 7.84 ms per loop

That is, the unaligned column is 4x slower (!).  numexpr yields somewhat 
better results:

In [11]: %timeit numexpr.evaluate('baligned**2')
1000 loops, best of 3: 1.13 ms per loop

In [12]: %timeit numexpr.evaluate('bpacked**2')
1000 loops, best of 3: 865 us per loop

Yes, in this case, the unaligned array goes faster (as much as 30%).  I 
think the reason is that numexpr optimizes the unaligned access by doing 
a copy of the different chunks into internal buffers that fit in L1 
cache.  Apparently this is very beneficial in this case (not sure why, 
though).


 Whereas summing shows just a 10%-ish slowdown:

 In [38]: %timeit packed_arr['b'].sum()
 1000 loops, best of 3: 1.29 ms per loop

 In [39]: %timeit aligned_arr['b'].sum()
 1000 loops, best of 3: 1.14 ms per loop

On my machine:

In [14]: %timeit baligned.sum()
1000 loops, best of 3: 1.03 ms per loop

In [15]: %timeit bpacked.sum()
100 loops, best of 3: 3.79 ms per loop

Again, the 4x slowdown is here.  Using numexpr:

In [16]: %timeit numexpr.evaluate('sum(baligned)')
100 loops, best of 3: 2.16 ms per loop

In [17]: %timeit numexpr.evaluate('sum(bpacked)')
100 loops, best of 3: 2.08 ms per loop

Again, the unaligned case is slightly better.  In this case numexpr is 
a bit slower than NumPy because sum() is not parallelized internally.  
Hmm, given that, I'm wondering if some internal copies to L1 in NumPy 
could help improve unaligned performance.  Worth a try?

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] aligned / unaligned structured dtype behavior

2013-03-07 Thread Francesc Alted
On 3/7/13 6:47 PM, Francesc Alted wrote:
 On 3/6/13 7:42 PM, Kurt Smith wrote:
 And regarding performance, doing simple timings shows a 30%-ish
 slowdown for unaligned operations:

 In [36]: %timeit packed_arr['b']**2
 100 loops, best of 3: 2.48 ms per loop

 In [37]: %timeit aligned_arr['b']**2
 1000 loops, best of 3: 1.9 ms per loop

 Hmm, that clearly depends on the architecture.  On my machine:

 In [1]: import numpy as np

 In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)

 In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)

 In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt)

 In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt)

 In [6]: baligned = aligned_arr['b']

 In [7]: bpacked = packed_arr['b']

 In [8]: %timeit baligned**2
 1000 loops, best of 3: 1.96 ms per loop

 In [9]: %timeit bpacked**2
 100 loops, best of 3: 7.84 ms per loop

 That is, the unaligned column is 4x slower (!).  numexpr allows 
 somewhat better results:

 In [11]: %timeit numexpr.evaluate('baligned**2')
 1000 loops, best of 3: 1.13 ms per loop

 In [12]: %timeit numexpr.evaluate('bpacked**2')
 1000 loops, best of 3: 865 us per loop

Just for completeness, here it is what Theano gets:

In [18]: import theano

In [20]: a = theano.tensor.vector()

In [22]: f = theano.function([a], a**2)

In [23]: %timeit f(baligned)
100 loops, best of 3: 7.74 ms per loop

In [24]: %timeit f(bpacked)
100 loops, best of 3: 12.6 ms per loop

So yeah, Theano is also slower for the unaligned case (but less than 2x 
in this case).


 Yes, in this case, the unaligned array goes faster (as much as 30%).  
 I think the reason is that numexpr optimizes the unaligned access by 
 doing a copy of the different chunks in internal buffers that fits in 
 L1 cache.  Apparently this is very beneficial in this case (not sure 
 why, though).


 Whereas summing shows just a 10%-ish slowdown:

 In [38]: %timeit packed_arr['b'].sum()
 1000 loops, best of 3: 1.29 ms per loop

 In [39]: %timeit aligned_arr['b'].sum()
 1000 loops, best of 3: 1.14 ms per loop

 On my machine:

 In [14]: %timeit baligned.sum()
 1000 loops, best of 3: 1.03 ms per loop

 In [15]: %timeit bpacked.sum()
 100 loops, best of 3: 3.79 ms per loop

 Again, the 4x slowdown is here.  Using numexpr:

 In [16]: %timeit numexpr.evaluate('sum(baligned)')
 100 loops, best of 3: 2.16 ms per loop

 In [17]: %timeit numexpr.evaluate('sum(bpacked)')
 100 loops, best of 3: 2.08 ms per loop

And with Theano:

In [26]: f2 = theano.function([a], a.sum())

In [27]: %timeit f2(baligned)
100 loops, best of 3: 2.52 ms per loop

In [28]: %timeit f2(bpacked)
100 loops, best of 3: 7.43 ms per loop

Again, the unaligned case is significantly slower (as much as 3x here!).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSOC 2013

2013-03-06 Thread Francesc Alted
On 3/5/13 7:14 PM, Kurt Smith wrote:
 On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing efir...@hawaii.edu wrote:
 On 2013/03/04 9:01 PM, Nicolas Rougier wrote:
 This made me think of a serious performance limitation of structured 
 dtypes: a
 structured dtype is always packed, which may lead to terrible byte 
 alignment
 for common types.  For instance, `dtype([('a', 'u1'), ('b',
 'u8')]).itemsize == 9`,
 meaning that the 8-byte integer is not aligned as an equivalent C-struct's
 would be, leading to all sorts of horrors at the cache and register level.
 Doesn't the align kwarg of np.dtype do what you want?

 In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']),
 align=True)

 In [3]: dt.itemsize
 Out[3]: 16
 Thanks!  That's what I get for not checking before posting.

 Consider this my vote to make `aligned=True` the default.

I would not rush too much.  The example above takes 9 bytes to host the 
structure, while `align=True` will take 16 bytes.  I'd rather leave 
the default as it is, and in case performance is critical, you can 
always copy the unaligned field to a new (homogeneous) array.
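
That is, something along these lines (a minimal sketch):

import numpy as np

packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)
packed_arr = np.ones(10**6, dtype=packed_dt)

# copy the field of interest into a contiguous, aligned homogeneous array
b = packed_arr['b'].copy()
# ... use `b` in the performance-critical part, and write it back if needed
packed_arr['b'] = b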

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pip install numpy throwing a lot of output.

2013-02-12 Thread Francesc Alted
On 2/12/13 1:37 PM, Daπid wrote:
 I have just upgraded numpy with pip on Linux 64 bits with Python 2.7,
 and I got *a lot* of output, so much it doesn't fit in the terminal.
 Most of it are gcc commands, but there are many different errors
 thrown by the compiler. Is this expected?

Yes, I think that's expected. Just to make sure, can you send some 
excerpts of the errors that you are getting?


 I am not too worried as the test suite passes, but pip is supposed to
 give only meaningful output (or at least, this is what the creators
 intended).

Well, pip needs to compile the libraries prior to installing them, so 
compile messages are meaningful.  Another question would be whether to 
reduce the amount of compiler output by default in NumPy, but I don't 
think this is realistic (and probably not even desirable).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pip install numpy throwing a lot of output.

2013-02-12 Thread Francesc Alted
On 2/12/13 3:18 PM, Daπid wrote:
 On 12 February 2013 14:58, Francesc Alted franc...@continuum.io wrote:
 Yes, I think that's expected. Just to make sure, can you send some
 excerpts of the errors that you are getting?
 Actually the errors are at the beginning of the process, so they are
 out of the reach of my terminal right now. Seems like pip doesn't keep
 a log in case of success.

Well, I think these errors are part of the auto-discovery process of 
the functions supported by the libraries in the host OS (a kind of 
`autoconf` for Python), so they can be considered 'normal'.


 The ones I can see are mostly warnings of unused variables and
 functions, maybe this is the expected behaviour for a library? This
 errors come from a complete reinstall instead of the original upgrade
 (the cat closed the terminal, worst excuse ever!):
[clip]

These are not errors, but warnings.  While it would be desirable to 
avoid any warnings during the compilation process, not many libraries 
achieve this (but patches for removing them are accepted).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Byte aligned arrays

2012-12-21 Thread Francesc Alted
On 12/20/12 7:35 PM, Henry Gomersall wrote:
 On Thu, 2012-12-20 at 15:23 +0100, Francesc Alted wrote:
 On 12/20/12 9:53 AM, Henry Gomersall wrote:
 On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
 The only scenario that I see that this would create unaligned
 arrays
 is
 for machines having AVX.  But provided that the Intel architecture
 is
 making great strides in fetching unaligned data, I'd be surprised
 that
 the difference in performance would be even noticeable.

 Can you tell us which difference in performance are you seeing for
 an
 AVX-aligned array and other that is not AVX-aligned?  Just curious.
 Further to this point, from an Intel article...


 http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors
 Aligning data to vector length is always recommended. When using
 Intel
 SSE and Intel SSE2 instructions, loaded data should be aligned to 16
 bytes. Similarly, to achieve best results use Intel AVX instructions
 on
 32-byte vectors that are 32-byte aligned. The use of Intel AVX
 instructions on unaligned 32-byte vectors means that every second
 load
 will be across a cache-line split, since the cache line is 64 bytes.
 This doubles the cache line split rate compared to Intel SSE code
 that
 uses 16-byte vectors. A high cache-line split rate in
 memory-intensive
 code is extremely likely to cause performance degradation. For that
 reason, it is highly recommended to align the data to 32 bytes for
 use
 with Intel AVX.

 Though it would be nice to put together a little example of this!
 Indeed, an example is what I was looking for.  So provided that I
 have
 access to an AVX capable machine (having 6 physical cores), and that
 MKL
 10.3 has support for AVX, I have made some comparisons using the
 Anaconda Python distribution (it ships with most packages linked
 against
 MKL 10.3).
 snip

 All in all, it is not clear that AVX alignment would have an
 advantage,
 even for memory-bounded problems.  But of course, if Intel people are
 saying that AVX alignment is important is because they have use cases
 for asserting this.  It is just that I'm having a difficult time to
 find
 these cases.
 Thanks for those examples, they were very interesting. I managed to
 temporarily get my hands on a machine with AVX and I have shown some
 speed-up with aligned arrays.

 FFT (using my wrappers) gives about a 15% speedup.

 Also this convolution code:
 https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c

 Shows a small but repeatable speed-up (a few %) when using some aligned
 loads (as many as I can work out to use!).

Okay, so 15% is significant, yes.  I'm still wondering why I did not 
get any speedup at all using MKL, but probably the reason is that it 
handles the unaligned corners of the datasets first, and then uses 
aligned access for the rest of the data (just guessing here).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Byte aligned arrays

2012-12-21 Thread Francesc Alted
On 12/21/12 11:58 AM, Henry Gomersall wrote:
 On Fri, 2012-12-21 at 11:34 +0100, Francesc Alted wrote:
 Also this convolution code:
 https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c

 Shows a small but repeatable speed-up (a few %) when using some
 aligned
 loads (as many as I can work out to use!).
 Okay, so a 15% is significant, yes.  I'm still wondering why I did
 not
 get any speedup at all using MKL, but probably the reason is that it
 manages the unaligned corners of the datasets first, and then uses an
 aligned access for the rest of the data (but just guessing here).
 With SSE in that convolution code example above (in which all alignments
 need be considered for each output element), I note a significant
 speedup by creating 4 copies of the float input array using memcopy,
 each shifted by 1 float (so the 5th element is aligned again). Despite
 all the extra copies its still quicker than using an unaligned load.
 However, when one tries the same trick with 8 copies for AVX it's
 actually slower than the SSE case.

 The fastest AVX (and any) implementation I have so far is with
 16-aligned arrays (made with 4 copies as with SSE), with alternate
 aligned and unaligned loads (which is always at worst 16-byte aligned).

 Fascinating stuff!

Yes, to say the least.  And it supports the fact that, when fine-tuning 
memory access performance, there is no replacement for experimentation 
(often in weird ways :)

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Byte aligned arrays

2012-12-21 Thread Francesc Alted
On 12/21/12 1:35 PM, Dag Sverre Seljebotn wrote:
 On 12/20/2012 03:23 PM, Francesc Alted wrote:
 On 12/20/12 9:53 AM, Henry Gomersall wrote:
 On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
 The only scenario that I see that this would create unaligned arrays
 is
 for machines having AVX.  But provided that the Intel architecture is
 making great strides in fetching unaligned data, I'd be surprised
 that
 the difference in performance would be even noticeable.

 Can you tell us which difference in performance are you seeing for an
 AVX-aligned array and other that is not AVX-aligned?  Just curious.
 Further to this point, from an Intel article...

 http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors

 Aligning data to vector length is always recommended. When using Intel
 SSE and Intel SSE2 instructions, loaded data should be aligned to 16
 bytes. Similarly, to achieve best results use Intel AVX instructions on
 32-byte vectors that are 32-byte aligned. The use of Intel AVX
 instructions on unaligned 32-byte vectors means that every second load
 will be across a cache-line split, since the cache line is 64 bytes.
 This doubles the cache line split rate compared to Intel SSE code that
 uses 16-byte vectors. A high cache-line split rate in memory-intensive
 code is extremely likely to cause performance degradation. For that
 reason, it is highly recommended to align the data to 32 bytes for use
 with Intel AVX.

 Though it would be nice to put together a little example of this!
 Indeed, an example is what I was looking for.  So provided that I have
 access to an AVX capable machine (having 6 physical cores), and that MKL
 10.3 has support for AVX, I have made some comparisons using the
 Anaconda Python distribution (it ships with most packages linked against
 MKL 10.3).

 Here it is a first example using a DGEMM operation.  First using a NumPy
 that is not turbo-loaded with MKL:

 In [34]: a = np.linspace(0,1,1e7)

 In [35]: b = a.reshape(1000, 1)

 In [36]: c = a.reshape(1, 1000)

 In [37]: time d = np.dot(b,c)
 CPU times: user 7.56 s, sys: 0.03 s, total: 7.59 s
 Wall time: 7.63 s

 In [38]: time d = np.dot(c,b)
 CPU times: user 78.52 s, sys: 0.18 s, total: 78.70 s
 Wall time: 78.89 s

 This is getting around 2.6 GFlop/s.  Now, with a MKL 10.3 NumPy and
 AVX-unaligned data:

 In [7]: p = ctypes.create_string_buffer(int(8e7)); hex(ctypes.addressof(p))
 Out[7]: '0x7fcdef3b4010'  # 16 bytes alignment

 In [8]: a = np.ndarray(1e7, f8, p)

 In [9]: a[:] = np.linspace(0,1,1e7)

 In [10]: b = a.reshape(1000, 1)

 In [11]: c = a.reshape(1, 1000)

 In [37]: %timeit d = np.dot(b,c)
 10 loops, best of 3: 164 ms per loop

 In [38]: %timeit d = np.dot(c,b)
 1 loops, best of 3: 1.65 s per loop

 That is around 120 GFlop/s (i.e. almost 50x faster than without MKL/AVX).

 Now, using MKL 10.3 and AVX-aligned data:

 In [21]: p2 = ctypes.create_string_buffer(int(8e7+16));
 hex(ctypes.addressof(p))
 Out[21]: '0x7f8cb9598010'

 In [22]: a2 = np.ndarray(1e7+2, f8, p2)[2:]  # skip the first 16 bytes
 (now is 32-bytes aligned)

 In [23]: a2[:] = np.linspace(0,1,1e7)

 In [24]: b2 = a2.reshape(1000, 1)

 In [25]: c2 = a2.reshape(1, 1000)

 In [35]: %timeit d2 = np.dot(b2,c2)
 10 loops, best of 3: 163 ms per loop

 In [36]: %timeit d2 = np.dot(c2,b2)
 1 loops, best of 3: 1.67 s per loop

 So, again, around 120 GFlop/s, and the difference wrt to unaligned AVX
 data is negligible.

 One may argue that DGEMM is CPU-bounded and that memory access plays
 little role here, and that is certainly true.  So, let's go with a more
 memory-bounded problem, like computing a transcendental function with
 numexpr.  First with a with NumPy and numexpr with no MKL support:

 In [8]: a = np.linspace(0,1,1e8)

 In [9]: %time b = np.sin(a)
 CPU times: user 1.20 s, sys: 0.22 s, total: 1.42 s
 Wall time: 1.42 s

 In [10]: import numexpr as ne

 In [12]: %time b = ne.evaluate(sin(a))
 CPU times: user 1.42 s, sys: 0.27 s, total: 1.69 s
 Wall time: 0.37 s

 This time is around 4x faster than regular 'sin' in libc, and about the
 same speed than a memcpy():

 In [13]: %time c = a.copy()
 CPU times: user 0.19 s, sys: 0.20 s, total: 0.39 s
 Wall time: 0.39 s

 Now, with a MKL-aware numexpr and non-AVX alignment:

 In [8]: p = ctypes.create_string_buffer(int(8e8)); hex(ctypes.addressof(p))
 Out[8]: '0x7fce435da010'  # 16 bytes alignment

 In [9]: a = np.ndarray(1e8, f8, p)

 In [10]: a[:] = np.linspace(0,1,1e8)

 In [11]: %time b = ne.evaluate(sin(a))
 CPU times: user 0.44 s, sys: 0.27 s, total: 0.71 s
 Wall time: 0.15 s

 That is, more than 2x faster than a memcpy() in this system, meaning
 that the problem is truly memory-bounded.  So now, with an AVX aligned
 buffer:

 In [14]: a2 = a[2:]  # skip the first 16 bytes

 In [15]: %time b = ne.evaluate(sin(a2))
 CPU times: user 0.40 s, sys: 0.28 s, total: 0.69 s
 Wall time: 0.16 s

 Again, times are very close.  Just

Re: [Numpy-discussion] Byte aligned arrays

2012-12-20 Thread Francesc Alted
On 12/20/12 9:53 AM, Henry Gomersall wrote:
 On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
 The only scenario that I see that this would create unaligned arrays
 is
 for machines having AVX.  But provided that the Intel architecture is
 making great strides in fetching unaligned data, I'd be surprised
 that
 the difference in performance would be even noticeable.

 Can you tell us which difference in performance are you seeing for an
 AVX-aligned array and other that is not AVX-aligned?  Just curious.
 Further to this point, from an Intel article...

 http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors

 Aligning data to vector length is always recommended. When using Intel
 SSE and Intel SSE2 instructions, loaded data should be aligned to 16
 bytes. Similarly, to achieve best results use Intel AVX instructions on
 32-byte vectors that are 32-byte aligned. The use of Intel AVX
 instructions on unaligned 32-byte vectors means that every second load
 will be across a cache-line split, since the cache line is 64 bytes.
 This doubles the cache line split rate compared to Intel SSE code that
 uses 16-byte vectors. A high cache-line split rate in memory-intensive
 code is extremely likely to cause performance degradation. For that
 reason, it is highly recommended to align the data to 32 bytes for use
 with Intel AVX.

 Though it would be nice to put together a little example of this!

Indeed, an example is what I was looking for.  So provided that I have 
access to an AVX capable machine (having 6 physical cores), and that MKL 
10.3 has support for AVX, I have made some comparisons using the 
Anaconda Python distribution (it ships with most packages linked against 
MKL 10.3).

Here is a first example using a DGEMM operation.  First using a NumPy 
that is not turbo-loaded with MKL:

In [34]: a = np.linspace(0,1,1e7)

In [35]: b = a.reshape(1000, 1)

In [36]: c = a.reshape(1, 1000)

In [37]: time d = np.dot(b,c)
CPU times: user 7.56 s, sys: 0.03 s, total: 7.59 s
Wall time: 7.63 s

In [38]: time d = np.dot(c,b)
CPU times: user 78.52 s, sys: 0.18 s, total: 78.70 s
Wall time: 78.89 s

This is getting around 2.6 GFlop/s.  Now, with an MKL 10.3 NumPy and 
AVX-unaligned data:

In [7]: p = ctypes.create_string_buffer(int(8e7)); hex(ctypes.addressof(p))
Out[7]: '0x7fcdef3b4010'  # 16 bytes alignment

In [8]: a = np.ndarray(1e7, "f8", p)

In [9]: a[:] = np.linspace(0,1,1e7)

In [10]: b = a.reshape(1000, 1)

In [11]: c = a.reshape(1, 1000)

In [37]: %timeit d = np.dot(b,c)
10 loops, best of 3: 164 ms per loop

In [38]: %timeit d = np.dot(c,b)
1 loops, best of 3: 1.65 s per loop

That is around 120 GFlop/s (i.e. almost 50x faster than without MKL/AVX).

Now, using MKL 10.3 and AVX-aligned data:

In [21]: p2 = ctypes.create_string_buffer(int(8e7+16)); 
hex(ctypes.addressof(p))
Out[21]: '0x7f8cb9598010'

In [22]: a2 = np.ndarray(1e7+2, "f8", p2)[2:]  # skip the first 16 bytes 
(now it is 32-byte aligned)

In [23]: a2[:] = np.linspace(0,1,1e7)

In [24]: b2 = a2.reshape(1000, 1)

In [25]: c2 = a2.reshape(1, 1000)

In [35]: %timeit d2 = np.dot(b2,c2)
10 loops, best of 3: 163 ms per loop

In [36]: %timeit d2 = np.dot(c2,b2)
1 loops, best of 3: 1.67 s per loop

So, again, around 120 GFlop/s, and the difference wrt unaligned AVX 
data is negligible.

One may argue that DGEMM is CPU-bounded and that memory access plays 
little role here, and that is certainly true.  So, let's go with a more 
memory-bounded problem, like computing a transcendental function with 
numexpr.  First with NumPy and numexpr with no MKL support:

In [8]: a = np.linspace(0,1,1e8)

In [9]: %time b = np.sin(a)
CPU times: user 1.20 s, sys: 0.22 s, total: 1.42 s
Wall time: 1.42 s

In [10]: import numexpr as ne

In [12]: %time b = ne.evaluate("sin(a)")
CPU times: user 1.42 s, sys: 0.27 s, total: 1.69 s
Wall time: 0.37 s

This time is around 4x faster than regular 'sin' in libc, and about the 
same speed as a memcpy():

In [13]: %time c = a.copy()
CPU times: user 0.19 s, sys: 0.20 s, total: 0.39 s
Wall time: 0.39 s

Now, with a MKL-aware numexpr and non-AVX alignment:

In [8]: p = ctypes.create_string_buffer(int(8e8)); hex(ctypes.addressof(p))
Out[8]: '0x7fce435da010'  # 16 bytes alignment

In [9]: a = np.ndarray(1e8, "f8", p)

In [10]: a[:] = np.linspace(0,1,1e8)

In [11]: %time b = ne.evaluate("sin(a)")
CPU times: user 0.44 s, sys: 0.27 s, total: 0.71 s
Wall time: 0.15 s

That is, more than 2x faster than a memcpy() in this system, meaning 
that the problem is truly memory-bounded.  So now, with an AVX-aligned 
buffer:

In [14]: a2 = a[2:]  # skip the first 16 bytes

In [15]: %time b = ne.evaluate("sin(a2)")
CPU times: user 0.40 s, sys: 0.28 s, total: 0.69 s
Wall time: 0.16 s

Again, times are very close.  Just to make sure, let's use the timeit magic:

In [16]: %timeit b = ne.evaluate("sin(a)")
10 loops, best of 3: 159 ms per loop   # unaligned

In [17]: %timeit b

Re: [Numpy-discussion] Byte aligned arrays

2012-12-19 Thread Francesc Alted
On 12/19/12 5:47 PM, Henry Gomersall wrote:
 On Wed, 2012-12-19 at 15:57 +, Nathaniel Smith wrote:
 Not sure which interface is more useful to users. On the one hand,
 using funny dtypes makes regular non-SIMD access more cumbersome, and
 it forces your array size to be a multiple of the SIMD word size,
 which might be inconvenient if your code is smart enough to handle
 arbitrary-sized arrays with partial SIMD acceleration (i.e., using
 SIMD for most of the array, and then a slow path to handle any partial
 word at the end). OTOH, if your code *is* that smart, you should
 probably just make it smart enough to handle a partial word at the
 beginning as well and then you won't need any special alignment in the
 first place, and representing each SIMD word as a single numpy scalar
 is an intuitively appealing model of how SIMD works. OTOOH, just
 adding a single argument np.array() is a much simpler to explain than
 some elaborate scheme involving the creation of special custom dtypes.
 If it helps, my use-case is in wrapping the FFTW library. This _is_
 smart enough to deal with unaligned arrays, but it just results in a
 performance penalty. In the case of an FFT, there are clearly going to
 be issues with the powers of two indices in the array not lying on a
 suitable n-byte boundary (which would be the case with a misaligned
 array), but I imagine it's not unique.

 The other point is that it's easy to create a suitable power of two
 array that should always bypass any special case unaligned code (e.g.
 with floats, any multiple of 4 array length will fill every 16-byte
 word).

 Finally, I think there is significant value in auto-aligning the array
 based on an appropriate inspection of the cpu capabilities (or
 alternatively, a function that reports back the appropriate SIMD
 alignment). Again, this makes it easier to wrap libraries that may
 function with any alignment, but benefit from optimum alignment.

Hmm, NumPy seems to return data blocks that are aligned to 16 bytes on 
my systems (Linux and Mac OSX):

In []: np.empty(1).data
Out[]: read-write buffer for 0x102b97b60, size 8, offset 0 at 0x102e7c130

In []: np.empty(1).data
Out[]: read-write buffer for 0x102ba64e0, size 8, offset 0 at 0x102e7c430

In []: np.empty(1).data
Out[]: read-write buffer for 0x102b86700, size 8, offset 0 at 0x102e7c5b0

In []: np.empty(1).data
Out[]: read-write buffer for 0x102b981d0, size 8, offset 0 at 0x102e7c5f0

[Check that the last digit in the addresses above is always 0]
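
A more direct check of the same thing (a small sketch; the addresses
will of course differ on your machine):

import numpy as np

for _ in range(4):
    addr = np.empty(1).__array_interface__['data'][0]
    print(hex(addr), addr % 16, addr % 32)   # 0 means 16-/32-byte aligned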

The only scenario that I see where this would create unaligned arrays is 
for machines having AVX.  But given that the Intel architecture is 
making great strides in fetching unaligned data, I'd be surprised if 
the difference in performance were even noticeable.

Can you tell us what difference in performance you are seeing between an 
AVX-aligned array and one that is not AVX-aligned?  Just curious.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] the difference between + and np.add?

2012-11-28 Thread Francesc Alted
On 11/23/12 8:00 PM, Chris Barker - NOAA Federal wrote:
 On Thu, Nov 22, 2012 at 6:20 AM, Francesc Alted franc...@continuum.io wrote:
 As Nathaniel said, there is not a difference in terms of *what* is
 computed.  However, the methods that you suggested actually differ on
 *how* they are computed, and that has dramatic effects on the time
 used.  For example:

 In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]

 In []: %time arr1 + arr2 + arr3 + arr4 + arr5
 CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
 Wall time: 0.15 s
 There are also ways to minimize the size of temporaries, and numexpr is
 one of the simplests:
 but you can also use np.add (and friends) to reduce the number of
 temporaries. It can make a difference:

 In [11]: def add_5_arrays(arr1, arr2, arr3, arr4, arr5):
 : result = arr1 + arr2
 : np.add(result, arr3, out=result)
 : np.add(result, arr4, out=result)
 : np.add(result, arr5, out=result)

 In [13]: timeit arr1 + arr2 + arr3 + arr4 + arr5
 1 loops, best of 3: 528 ms per loop

 In [17]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
 1 loops, best of 3: 293 ms per loop

 (don't have numexpr on this machine for a comparison)

Yes, you are right.  However, numexpr can still beat this:

In [8]: timeit arr1 + arr2 + arr3 + arr4 + arr5
10 loops, best of 3: 138 ms per loop

In [9]: timeit add_5_arrays(arr1, arr2, arr3, arr4, arr5)
10 loops, best of 3: 74.3 ms per loop

In [10]: timeit ne.evaluate("arr1 + arr2 + arr3 + arr4 + arr5")
10 loops, best of 3: 20.8 ms per loop

The reason is that numexpr is multithreaded (using 6 cores above), and 
for memory-bounded problems like this one, fetching data in different 
threads is more efficient than using a single thread:

In [12]: timeit arr1.copy()
10 loops, best of 3: 41 ms per loop

In [13]: ne.set_num_threads(1)
Out[13]: 6

In [14]: timeit ne.evaluate("arr1")
10 loops, best of 3: 30.7 ms per loop

In [15]: ne.set_num_threads(6)
Out[15]: 1

In [16]: timeit ne.evaluate("arr1")
100 loops, best of 3: 13.4 ms per loop

I.e., the joy of multi-threading is that it not only buys you CPU speed, 
but can also bring your data from memory faster.  So yeah, modern 
applications *do* need multi-threading for getting good performance.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Conditional update of recarray field

2012-11-28 Thread Francesc Alted
On 11/28/12 1:47 PM, Bartosz wrote:
 Hi,

 I try to update values in a single field of numpy record array based on
 a condition defined in another array. I found that that the result
 depends on the order in which I apply the boolean indices/field names.

 For example:

 cond = np.zeros(5, dtype=np.bool)
 cond[2:] = True
 X = np.rec.fromarrays([np.arange(5)], names='a')
 X[cond]['a'] = -1
 print X

 returns:  [(0,) (1,) (2,) (3,) (4,)] (the values were not updated)

 X['a'][cond] = -1
 print X

 returns: [(0,) (1,) (-1,) (-1,) (-1,)] (it worked this time).

 I find this behaviour very confusing. Is it expected?

Yes, it is.  In the first idiom, X[cond] is a fancy indexing operation 
and the result is not a view, so what you are doing is basically 
modifying the temporary object that results from the indexing.  In the 
second idiom, X['a'] is returning a *view* of the original object, so 
this is why it works.

   Would it be
 possible to emit a warning message in the case of faulty assignments?

The only solution that I can see for this would be for fancy indexing 
to return a view, and not a different object, but NumPy containers 
are not prepared for this.
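
One way to see the difference (a small sketch):

import numpy as np

cond = np.zeros(5, dtype=bool)
cond[2:] = True
X = np.rec.fromarrays([np.arange(5)], names='a')

print(X[cond].flags.owndata)   # True: fancy indexing returns a copy
print(X['a'].flags.owndata)    # False: field access returns a view

X['a'][cond] = -1              # writes through the view, so X is updated
print(X)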

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Conditional update of recarray field

2012-11-28 Thread Francesc Alted
Hey Bartosz,

On 11/28/12 3:26 PM, Bartosz wrote:
 Thanks for answer, Francesc.

 I understand now that fancy indexing returns a copy of a recarray. Is
 it also true for standard ndarrays? If so, I do not understand why
 X['a'][cond]=-1 should work.

Yes, that's a good question.  No, in this case the boolean array `cond` 
is passed to the __setitem__() of the view returned by X['a'], which is 
why this works.  The first idiom chains the fancy indexing with another 
indexing operation, and NumPy needs to create a temporary to execute 
this, so the second indexing operation acts on a copy, not a view.

And yes, fancy indexing returning a copy is standard for all ndarrays.

Hope it is clearer now (although admittedly it is a bit strange at first 
sight),

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] the difference between + and np.add?

2012-11-22 Thread Francesc Alted
On 11/22/12 1:41 PM, Chao YUE wrote:
 Dear all,

 if I have two ndarray arr1 and arr2 (with the same shape), is there 
 some difference when I do:

 arr = arr1 + arr2

 and

 arr = np.add(arr1, arr2),

 and then if I have more than 2 arrays: arr1, arr2, arr3, arr4, arr5, 
 then I cannot use np.add anymore as it only recieves 2 arguments.
 then what's the best practice to add these arrays? should I do

 arr = arr1 + arr2 + arr3 + arr4 + arr5

 or I do

 arr = np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)?

 because I just noticed recently that there are functions like np.add, 
 np.divide, np.substract... before I am using all like directly 
 arr1/arr2, rather than np.divide(arr1,arr2).

As Nathaniel said, there is no difference in terms of *what* is 
computed.  However, the methods that you suggested actually differ in 
*how* they are computed, and that has dramatic effects on the time 
used.  For example:

In []: arr1, arr2, arr3, arr4, arr5 = [np.arange(1e7) for x in range(5)]

In []: %time arr1 + arr2 + arr3 + arr4 + arr5
CPU times: user 0.05 s, sys: 0.10 s, total: 0.14 s
Wall time: 0.15 s
Out[]:
array([  0.e+00,   5.e+00,   1.e+01, ...,
  4.9850e+07,   4.9900e+07,   4.9950e+07])

In []: %time np.sum(np.array([arr1, arr2, arr3, arr4, arr5]), axis=0)
CPU times: user 2.98 s, sys: 0.15 s, total: 3.13 s
Wall time: 3.14 s
Out[]:
array([  0.e+00,   5.e+00,   1.e+01, ...,
  4.9850e+07,   4.9900e+07,   4.9950e+07])

The difference is how memory is used.  In the first case, the additional 
memory was just a temporary the size of the operands, while in the 
second case a big temporary has to be created, so the difference in 
speed is pretty large.

There are also ways to minimize the size of temporaries, and numexpr is 
one of the simplest:

In []: import numexpr as ne

In []: %time ne.evaluate('arr1 + arr2 + arr3 + arr4 + arr5')
CPU times: user 0.04 s, sys: 0.04 s, total: 0.08 s
Wall time: 0.04 s
Out[]:
array([  0.e+00,   5.e+00,   1.e+01, ...,
  4.9850e+07,   4.9900e+07,   4.9950e+07])

Again, the computations are the same, but how you manage memory is critical.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

