Re: [Numpy-discussion] PyData Barcelona this May

2017-03-17 Thread Francesc Alted
2017-03-17 12:37 GMT+01:00 Jaime Fernández del Río :

> Last night I gave a short talk to the PyData Zürich meetup on Julian's
> temporary elision PR, and Pauli's overlapping memory one. My learnings from
> that experiment are:
>
>- there is no way to talk about both things in a 30 minute talk: I
>barely scraped the surface and ended up needing 25 minutes.
>- many people that use numpy in their daily work don't know what
>strides are, this was a BIG surprise for me.
>
> Based on that experience, I was thinking that maybe a good topic for a
> workshop would be NumPy's memory model: views, reshaping, strides, some
> hints of buffering in the iterator...
>

Yeah, I think that workshop would provide very valuable insight to
many people using NumPy.
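For instance, a tiny taste (just a sketch) of the kind of thing such a
workshop could cover: reshape() and slicing give views that reinterpret the
very same buffer through different strides, with no data copied:

import numpy as np

a = np.arange(12, dtype=np.int64)   # one contiguous buffer of 12 int64
b = a.reshape(3, 4)                 # a view: same memory, new strides
print(b.strides)                    # (32, 8): 4*8 bytes to jump one row
c = b[:, ::2]                       # still a view, just a larger stride
c[0, 0] = 99                        # writing through the view...
print(a[0])                         # ...shows up in the original: 99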


>
> And Julian's temporary work lends itself to a very nice talk, more on
> Python internals than on NumPy, but it's a very cool subject nonetheless.
>
> So my thinking is that I am going to propose those two, as a workshop and
> a talk. Thoughts?
>

+1



>
> Jaime
>
> On Thu, Mar 9, 2017 at 8:29 PM, Sebastian Berg wrote:
>
>> On Thu, 2017-03-09 at 15:45 +0100, Jaime Fernández del Río wrote:
>> > There will be a PyData conference in Barcelona this May:
>> >
>> > http://pydata.org/barcelona2017/
>> >
>> > I am planning on attending, and was thinking of maybe proposing to
>> > organize a numpy-themed workshop or tutorial.
>> >
>> > My personal inclination would be to look at some advanced topic that
>> > I know well, like writing gufuncs in Cython, but wouldn't mind doing
>> > a more run of the mill thing. Anyone has any thoughts or experiences
>> > on what has worked well in similar situations? Any specific topic you
>> > always wanted to attend a workshop on, but were afraid to ask?
>> >
>> > Alternatively, or on top of the workshop, I could propose to do a
>> > talk: talking last year at PyData Madrid about the new indexing was a
>> > lot of fun! Thing is, I have been quite disconnected from the project
>> > this past year, and can't really think of any worthwhile topic. Is
>> > there any message that we as a project would like to get out to the
>> > larger community?
>> >
>>
>> Francesc already pointed out the temporary optimization. From what I
>> remember, my personal highlight would probably be Pauli's work on the
>> memory overlap detection. Though both are rather passive improvements I
>> guess (you don't really have to learn them to use them), it's very cool!
>> And if it's about highlighting new stuff, these can probably easily fill
>> a talk.
>>
>> > And if you are planning on attending, please give me a shout.
>> >
>>
>> Barcelona :). Maybe I should think about it, but probably not.
>>
>>
>> > Thanks,
>> >
>> > Jaime
>> >
>> > --
>> > (\__/)
>> > ( O.o)
>> > ( > <) This is Bunny. Copy Bunny into your signature and help him with
>> > his plans for world domination.
>> > ___
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion@scipy.org
>> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him with his
> plans for world domination.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] caching large allocations on gnu/linux

2017-03-13 Thread Francesc Alted
2017-03-13 18:11 GMT+01:00 Julian Taylor :

> On 13.03.2017 16:21, Anne Archibald wrote:
> >
> >
> > On Mon, Mar 13, 2017 at 12:21 PM Julian Taylor wrote:
> >
> > Should it be agreed that caching is worthwhile I would propose a very
> > simple implementation. We only really need to cache a small handful of
> > array data pointers for the fast allocate/deallocate cycles that appear
> > in common numpy usage.
> > For example a small list of maybe 4 pointers storing the 4 largest
> > recent deallocations. New allocations just pick the first memory block
> > of sufficient size.
> > The cache would only be active on systems that support MADV_FREE (which
> > is linux 4.5 and probably BSD too).
> >
> > So what do you think of this idea?
> >
> >
> > This is an interesting thought, and potentially a nontrivial speedup
> > with zero user effort. But coming up with an appropriate caching policy
> > is going to be tricky. The thing is, for each array, numpy grabs a block
> > "the right size", and that size can easily vary by orders of magnitude,
> > even within the temporaries of a single expression as a result of
> > broadcasting. So simply giving each new array the smallest cached block
> > that will fit could easily result in small arrays in giant allocated
> > blocks, wasting non-reclaimable memory.  So really you want to recycle
> > blocks of the same size, or nearly, which argues for a fairly large
> > cache, with smart indexing of some kind.
> >
>
> The nice thing about MADV_FREE is that we don't need any clever cache.
> The same process that marked the pages free can reclaim them in another
> allocation, at least that is what my testing indicates it allows.
> So a small allocation getting a huge memory block does not waste memory
> as the top unused part will get reclaimed when needed, either by numpy
> itself doing another allocation or a different program on the system.
>

Well, what you say makes a lot of sense to me, so if you have tested that,
then I'd say this is worth a PR, so we can see how it works on different
workloads.
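To make the policy concrete, here is a pure-Python sketch of the tiny cache
you describe (illustrative only; the real thing would live in numpy's C
allocator and rely on MADV_FREE, and _os_alloc/_os_free below are just
stand-ins for the actual malloc/free calls):

CACHE_SLOTS = 4
_cache = []        # (size, block) pairs for the largest recent deallocations

def _os_alloc(size):     # stand-in for the real allocation (malloc + madvise)
    return bytearray(size)

def _os_free(block):     # stand-in for really giving the memory back
    pass

def cached_free(block, size):
    # keep at most CACHE_SLOTS blocks, preferring the largest ones
    if len(_cache) < CACHE_SLOTS:
        _cache.append((size, block))
        return
    i = min(range(CACHE_SLOTS), key=lambda k: _cache[k][0])
    if _cache[i][0] < size:
        _os_free(_cache[i][1])
        _cache[i] = (size, block)
    else:
        _os_free(block)

def cached_alloc(size):
    # new allocations just pick the first cached block of sufficient size
    for i, (sz, block) in enumerate(_cache):
        if sz >= size:
            del _cache[i]
            return block
    return _os_alloc(size)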


>
> An issue that does arise though is that this memory is not available for
> the page cache used for caching on disk data. A too large cache might
> then be detrimental for IO heavy workloads that rely on the page cache.
>

Yeah.  Also, memory-mapped arrays use the page cache intensively, so we
should test this use case and see how the caching affects memory map
performance.


> So we might want to cap it to some max size, provide an explicit on/off
> switch and/or have numpy IO functions clear the cache.
>

Definitely, dynamically allowing this feature to be disabled would be
desirable.  That would provide an easy path for testing how it affects
performance.  Would that be feasible?


Francesc
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PyData Barcelona this May

2017-03-09 Thread Francesc Alted
Hola Jaime!

2017-03-09 15:45 GMT+01:00 Jaime Fernández del Río :

> There will be a PyData conference in Barcelona this May:
>
> http://pydata.org/barcelona2017/
>
> I am planning on attending, and was thinking of maybe proposing to
> organize a numpy-themed workshop or tutorial.
>
> My personal inclination would be to look at some advanced topic that I
> know well, like writing gufuncs in Cython, but wouldn't mind doing a more
> run of the mill thing. Anyone has any thoughts or experiences on what has
> worked well in similar situations? Any specific topic you always wanted to
> attend a workshop on, but were afraid to ask?
>

Writing gufuncs in Cython seems quite an advanced topic for a workshop,
but an interesting one indeed.  Numba also supports creating gufuncs (
http://numba.pydata.org/numba-doc/dev/reference/numpysupported.html), so
this may work as a first approach before going deeper into Cython.
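For instance, a minimal Numba gufunc looks more or less like this (a sketch
from memory, so double-check against the Numba docs):

import numpy as np
from numba import guvectorize

# core signature "(n),()->(n)": add a scalar to each row of the input
@guvectorize(["void(float64[:], float64, float64[:])"], "(n),()->(n)")
def add_scalar(x, y, out):
    for i in range(x.shape[0]):
        out[i] = x[i] + y

a = np.arange(6, dtype=np.float64).reshape(2, 3)
print(add_scalar(a, 10.0))   # broadcasts over the leading dimension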


>
> Alternatively, or on top of the workshop, I could propose to do a talk:
> talking last year at PyData Madrid about the new indexing was a lot of fun!
> Thing is, I have been quite disconnected from the project this past year,
> and can't really think of any worthwhile topic. Is there any message that
> we as a project would like to get out to the larger community?
>

Not a message in particular, but perhaps it would be nice to talk about
the removal of temporaries in expressions that Julian implemented recently (
https://github.com/numpy/numpy/pull/7997) and that is to be released in
1.13.  It is a really cool (and somewhat scary) patch ;)
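Just to illustrate the kind of expression it targets (my own toy example,
not code from the PR):

import numpy as np

a, b, c = (np.random.rand(10**6) for _ in range(3))

# Normally "a*b + c" first materializes the temporary t = a*b and then t + c.
# Roughly speaking, the patch detects that t cannot be referenced anywhere
# else and reuses its buffer in place for the addition, saving one big
# allocation per intermediate.
d = a * b + c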


>
> And if you are planning on attending, please give me a shout.
>

It would be nice to attend and see you again, but unfortunately I am quite
swamped.  We'll see.

Have fun in Barcelona!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread Francesc Alted
2017-02-22 16:30 GMT+01:00 Kiko :

>
>
> 2017-02-22 16:23 GMT+01:00 Alex Rogozhnikov :
>
>> Hi Francesc,
>> thanks a lot for you reply and for your impressive job on bcolz!
>>
>> Bcolz seems to put the stress on compression, which is not of much interest
>> to me, but the *ctable* and chunked operations look very appropriate
>> to me now. (Of course, I'll need to test it a lot more before I can say this
>> for sure; that's my current impression).
>>
>
You can also disable compression by default in bcolz:

http://bcolz.blosc.org/en/latest/defaults.html#list-of-default-values
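For example (a quick sketch; see the link above for the defaults machinery):

import numpy as np
import bcolz

a = np.random.rand(10**6)
# per-container: compression level 0 means no compression at all
ca = bcolz.carray(a, cparams=bcolz.cparams(clevel=0))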



>
>> The strongest concern with bcolz so far is that it seems to be completely
>> non-trivial to install on Windows systems, while pip provides binaries of
>> numpy for most (or all?) OSes.
>> I didn't build pip binary wheels myself, but is it hard / impossible to
>> cook pip-installable binaries?
>>
>
> http://www.lfd.uci.edu/~gohlke/pythonlibs/#bcolz
> Check if the link solves the issue with installing.
>

Yeah.  Also, there are binaries for conda:

http://bcolz.blosc.org/en/latest/install.html#installing-from-conda-forge



>
>> You can change shapes of numpy arrays, but that usually involves copies
>> of the whole container.
>>
>> sure, but this is ok for me, as I plan to organize column editing in
>> 'batches', so this should seldom require copying.
>> It would be nice to see an example to understand how deep I need to go
>> inside numpy.
>>
>
Well, if copying is not a problem for you, then you can just create a new
numpy container and do the copy yourself.

Francesc


>
>> Cheers,
>> Alex.
>>
>>
>>
>>
>> On 22 Feb 2017, at 17:03, Francesc Alted wrote:
>>
>> Hi Alex,
>>
>> 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov :
>>
>>> Hi Nathaniel,
>>>
>>>
>>> pandas
>>>
>>>
>>> yup, the idea was to have minimal pandas.DataFrame-like storage (which I
>>> was using for a long time),
>>> but without irritating problems with its row indexing and some other
>>> problems like interaction with matplotlib.
>>>
>>> A dict of arrays?
>>>
>>>
>>> that's what I've started from and implemented, but at some point I
>>> decided that I'm reinventing the wheel and numpy has something already. In
>>> principle, I can ignore this 'column-oriented' storage requirement, but
>>> potentially it may turn out to be quite slow-ish if dtype's size is large.
>>>
>>> Suggestions are welcome.
>>>
>>
>> You may want to try bcolz:
>>
>> https://github.com/Blosc/bcolz
>>
>> bcolz is a columnar storage, basically as you require, but data is
>> compressed by default even when stored in-memory (although you can disable
>> compression if you want to).
>>
>>
>>
>>>
>>> Another strange question:
>>> in general, it is considered that once a numpy.array is created, its
>>> shape does not change.
>>> But if I want to keep the same recarray and change its dtype and/or
>>> shape, is there a way to do this?
>>>
>>
>> You can change shapes of numpy arrays, but that usually involves copies
>> of the whole container.  With bcolz you can change the length and add/del
>> columns without copies.  If your containers are large, it is better to
>> inform bcolz about their final estimated size.  See:
>>
>> http://bcolz.blosc.org/en/latest/opt-tips.html
>>
>> Francesc
>>
>>
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>>
>>> On 22 Feb 2017, at 3:53, Nathaniel Smith wrote:
>>>
>>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" 
>>> wrote:
>>>
>>> Ah, got it. Thanks, Chris!
>>> I thought recarray can be only one-dimensional (like tables with named
>>> columns).
>>>
>>> Maybe it's better to ask directly what I was looking for:
>>> something that works like a table with named columns (but no labelling
>>> for rows), and keeps data (of different dtypes) in a column-by-column way
>>> (and this is numpy, not pandas).
>>>
>>> Is there such a magic thing?
>>>
>>>
>>> Well, that's what pandas is for...
>>>
>>> A dict of arrays?
>>>
>>> -n
>>> ___
>>> NumPy-Discussion mailing list
>>

Re: [Numpy-discussion] Fortran order in recarray.

2017-02-22 Thread Francesc Alted
Hi Alex,

2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov :

> Hi Nathaniel,
>
>
> pandas
>
>
> yup, the idea was to have minimal pandas.DataFrame-like storage (which I
> was using for a long time),
> but without irritating problems with its row indexing and some other
> problems like interaction with matplotlib.
>
> A dict of arrays?
>
>
> that's what I've started from and implemented, but at some point I decided
> that I'm reinventing the wheel and numpy has something already. In
> principle, I can ignore this 'column-oriented' storage requirement, but
> potentially it may turn out to be quite slow-ish if dtype's size is large.
>
> Suggestions are welcome.
>

You may want to try bcolz:

https://github.com/Blosc/bcolz

bcolz is a columnar storage, basically as you require, but data is
compressed by default even when stored in-memory (although you can disable
compression if you want to).
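A minimal sketch of what I mean (API details from memory, so please
double-check with the bcolz docs):

import numpy as np
import bcolz

N = 1000 * 1000
ct = bcolz.ctable(columns=[np.arange(N), np.random.rand(N)],
                  names=['id', 'value'])
ct.addcol(np.zeros(N), name='extra')   # adding a column does not copy the rest
print(ct[:3])                          # rows come back as a structured array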



>
> Another strange question:
> in general, it is considered that once a numpy.array is created, its shape
> does not change.
> But if I want to keep the same recarray and change its dtype and/or
> shape, is there a way to do this?
>

You can change shapes of numpy arrays, but that usually involves copies of
the whole container.  With bcolz you can change the length and add/del columns
without copies.  If your containers are large, it is better to inform
bcolz about their final estimated size.  See:

http://bcolz.blosc.org/en/latest/opt-tips.html

Francesc


>
> Thanks,
> Alex.
>
>
>
> On 22 Feb 2017, at 3:53, Nathaniel Smith wrote:
>
> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" 
> wrote:
>
> Ah, got it. Thanks, Chris!
> I thought recarray can be only one-dimensional (like tables with named
> columns).
>
> Maybe it's better to ask directly what I was looking for:
> something that works like a table with named columns (but no labelling for
> rows), and keeps data (of different dtypes) in a column-by-column way (and
> this is numpy, not pandas).
>
> Is there such a magic thing?
>
>
> Well, that's what pandas is for...
>
> A dict of arrays?
>
> -n
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-21 Thread Francesc Alted
Yes, Julian is doing amazing work on getting rid of temporaries inside
NumPy.  However, NumExpr still has the advantage of using multi-threading
right out of the box, as well as integration with Intel VML.  Hopefully
these features will eventually arrive in NumPy, but meanwhile there is
still value in pushing NumExpr.
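For those who have not tried it, the basic usage remains as simple as this
(just a sketch):

import numpy as np
import numexpr as ne

a = np.random.rand(10**7)
b = np.random.rand(10**7)

ne.set_num_threads(4)             # threads are used right out of the box
c = ne.evaluate("3*a + 4*b")      # evaluated in one multi-threaded pass
# and if numexpr is built against VML/MKL, transcendental functions in the
# expression (exp, log, sin...) are dispatched to it too:
d = ne.evaluate("exp(a) + log1p(b)")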

Francesc

2017-02-19 18:21 GMT+01:00 Marten van Kerkwijk :

> Hi All,
>
> Just a side note that at a smaller scale some of the benefits of
> numexpr are coming to numpy: Julian Taylor has been working on
> identifying temporary arrays in
> https://github.com/numpy/numpy/pull/7997. Julian also commented
> (https://github.com/numpy/numpy/pull/7997#issuecomment-246118772) that
> with PEP 523 in python 3.6, this should indeed become a lot easier.
>
> All the best,
>
> Marten
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumExpr3 Alpha

2017-02-17 Thread Francesc Alted
> What's TODO compared to NE2?
> --
>
> * strided complex functions
> * Intel VML support (less necessary now with gcc auto-vectorization)
> * bytes and unicode support
> * reductions (mean, sum, prod, std)
>
>
> What I'm looking for feedback on
> 
>
> * String arrays: How do you use them?  How would unicode differ from bytes
> strings?
> * Interface: We now have a more object-oriented interface underneath the
> familiar
>   evaluate() interface. How would you like to use this interface?
> Francesc suggested
>   generator support, as currently it's more difficult to use NumExpr
> within a loop than
>   it should be.
>
>
> Ideas for the future
> -
>
> * vectorize real functions (such as exp, sqrt, log) similar to the
> complex_functions.hpp vectorization.
> * Add a keyword (likely 'yield') to indicate that a token is intended to
> be changed by a generator inside a loop with each call to NumExpr.run()
>
> If you have any thoughts or find any issues please don't hesitate to open
> an issue at the Github repo. Although unit tests have been run over the
> operation space there are undoubtedly a number of bugs to squash.
>
> Sincerely,
>
> Robert
>
> --
> Robert McLeod, Ph.D.
> Center for Cellular Imaging and Nano Analytics (C-CINA)
> Biozentrum der Universität Basel
> Mattenstrasse 26, 4058 Basel
> Work: +41.061.387.3225
> robert.mcl...@unibas.ch
> robert.mcl...@bsse.ethz.ch 
> robbmcl...@gmail.com
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.6.2 released!

2017-01-29 Thread Francesc Alted
=

 Announcing Numexpr 2.6.2

=


What's new

==


This is a maintenance release that fixes several issues, with special
emphasis on keeping compatibility with newer NumPy versions.  Also,
initial support for POWER processors is here.  Thanks to Oleksandr
Pavlyk, Alexander Shadchin, Breno Leitao, Fernando Seiti Furusato and
Antonio Valentino for their nice contributions.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst


What's Numexpr

==


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.


Where can I find Numexpr?

=


The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr


Share your experience

=


Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Francesc Alted
2016-11-04 14:36 GMT+01:00 Neal Becker :

> Francesc Alted wrote:
>
> > 2016-11-04 13:06 GMT+01:00 Neal Becker :
> >
> >> I find I often write:
> >> np.array ([some list comprehension])
> >>
> >> mainly because list comprehensions are just so sweet.
> >>
> >> But I imagine this isn't particularly efficient.
> >>
> >
> > Right.  Using a generator and np.fromiter() will avoid the creation of
> the
> > intermediate list.  Something like:
> >
> > np.fromiter((i for i in range(x)), dtype=float)  # use xrange for Python 2
> >
> >
> Does this generalize to >1 dimensions?
>

A reshape() is not enough?  What do you want to do exactly?
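For instance (a sketch):

import numpy as np

# np.fromiter() only builds 1-D arrays, but a reshape() on top gives you the
# multidimensional result without materializing an intermediate list:
n, m = 3, 4
a = np.fromiter((i * j for i in range(n) for j in range(m)),
                dtype=np.float64, count=n * m).reshape(n, m)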


>
> _______
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Francesc Alted
2016-11-04 13:06 GMT+01:00 Neal Becker :

> I find I often write:
> np.array ([some list comprehension])
>
> mainly because list comprehensions are just so sweet.
>
> But I imagine this isn't particularly efficient.
>

Right.  Using a generator and np.fromiter() will avoid the creation of the
intermediate list.  Something like:

np.fromiter((i for i in range(x)), dtype=float)  # use xrange for Python 2


>
> I wonder if numpy has a "better" way, and if not, maybe it would be a nice
> addition?
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.6.1 released

2016-07-17 Thread Francesc Alted
=
 Announcing Numexpr 2.6.1
=

What's new
==

This is a maintenance release that fixes a performance regression in
some situations. More specifically, the BLOCK_SIZE1 constant has been
set to 1024 (down from 8192). This allows for better cache utilization
when there are many operands and with VML.  Fixes #221.

Also, support for NetBSD has been added.  Thanks to Thomas Klausner.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst


What's Numexpr
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.6.0 released

2016-06-01 Thread Francesc Alted
=
 Announcing Numexpr 2.6.0
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a minor version bump because it introduces a new function.
Also some minor fine tuning for recent CPUs has been done.  More
specifically:

- Introduced a new re_evaluate() function for re-evaluating the
  previous executed array expression without any check.  This is meant
  for accelerating loops that are re-evaluating the same expression
  repeatedly without changing anything else than the operands.  If
  unsure, use evaluate() which is safer.

- The BLOCK_SIZE1 and BLOCK_SIZE2 constants have been re-checked in
  order to find a value maximizing most of the benchmarks in bench/
  directory.  The new values (8192 and 16 respectively) give somewhat
  better results (~5%) overall.  The CPU used for fine tuning is a
  relatively new Haswell processor (E3-1240 v3).

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling C code that assumes SIMD aligned data.

2016-05-06 Thread Francesc Alted
2016-05-05 22:10 GMT+02:00 Øystein Schønning-Johansen :

> Thanks for your answer, Francesc. Knowing that there is no numpy solution
> saves the work of searching for this. I've not tried the solution described
> at SO, but it looks like a real performance killer. I'll rather try to
> override malloc with glibs malloc_hooks or LD_PRELOAD tricks. Do you think
> that will do it? I'll try it and report back.
>

I don't think you need that much weaponry.  Just create an array with some
spare space for alignment.  Realize that you want a 64-byte aligned double
precision array.  With that, create your desired array + 64 additional
bytes (8 doubles):

In [92]: a = np.zeros(int(1e6) + 8)

In [93]: a.ctypes.data % 64
Out[93]: 16

and compute the elements to shift this:

In [94]: shift = (64 / a.itemsize) - (a.ctypes.data % 64) / a.itemsize

In [95]: shift
Out[95]: 6

now, create a view with the required elements less:

In [98]: b = a[shift:-((64 / a.itemsize)-shift)]

In [99]: len(b)
Out[99]: 1000000

In [100]: b.ctypes.data % 64
Out[100]: 0

and voila, b is now aligned to 64 bytes.  As the view is a copy-free
operation, this is fast, and you only wasted 64 bytes.  Pretty cheap indeed.
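Putting the recipe above into a little helper (purely illustrative, not a
NumPy API):

import numpy as np

def empty_aligned(n, align=64, dtype=np.float64):
    # Allocate n items plus some spare, then return a view whose data
    # pointer is aligned to `align` bytes (the view keeps the buffer alive).
    itemsize = np.dtype(dtype).itemsize
    extra = align // itemsize
    buf = np.zeros(n + extra, dtype=dtype)
    shift = (-buf.ctypes.data % align) // itemsize
    out = buf[shift:shift + n]
    assert out.ctypes.data % align == 0
    return out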

Francesc


>
> Thanks,
> -Øystein
>
> On Thu, May 5, 2016 at 1:55 PM, Francesc Alted  wrote:
>
>> 2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen > >:
>>
>>> Hi!
>>>
>>> I've written a little code of numpy code that does a neural network
>>> feedforward calculation:
>>>
>>> def feedforward(self,x):
>>> for activation, w, b in zip( self.activations, self.weights,
>>> self.biases ):
>>> x = activation( np.dot(w, x) + b)
>>>
>>> This works fine when my activation functions are in Python, however I've
>>> wrapped the activation functions from a C implementation that requires the
>>> array to be memory aligned. (due to simd instructions in the C
>>> implementation.) So I need the operation np.dot( w, x) + b to return a
>>> ndarray where the data pointer is aligned. How can I do that? Is it
>>> possible at all?
>>>
>>
>> Yes.  np.dot() does accept an `out` parameter where you can pass your
>> aligned array.  The way for testing if numpy is returning you an aligned
>> array is easy:
>>
>> In [15]: x = np.arange(6).reshape(2,3)
>>
>> In [16]: x.ctypes.data % 16
>> Out[16]: 0
>>
>> but:
>>
>> In [17]: x.ctypes.data % 32
>> Out[17]: 16
>>
>> so, in this case NumPy returned a 16-byte aligned array which should be
>> enough for 128 bit SIMD (SSE family).  This kind of alignment is pretty
>> common in modern computers.  If you need 256 bit (32-byte) alignment then
>> you will need to build your container manually.  See here for an example:
>> http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays
>>
>> Francesc
>>
>>
>>>
>>> (BTW: the function works  correctly about 20% of the time I run it, and
>>> else it segfaults on the simd instruction in the the C function)
>>>
>>> Thanks,
>>> -Øystein
>>>
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>>
>> --
>> Francesc Alted
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Calling C code that assumes SIMD aligned data.

2016-05-05 Thread Francesc Alted
2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen :

> Hi!
>
> I've written a little code of numpy code that does a neural network
> feedforward calculation:
>
> def feedforward(self,x):
> for activation, w, b in zip( self.activations, self.weights,
> self.biases ):
> x = activation( np.dot(w, x) + b)
>
> This works fine when my activation functions are in Python, however I've
> wrapped the activation functions from a C implementation that requires the
> array to be memory aligned. (due to simd instructions in the C
> implementation.) So I need the operation np.dot( w, x) + b to return a
> ndarray where the data pointer is aligned. How can I do that? Is it
> possible at all?
>

Yes.  np.dot() does accept an `out` parameter where you can pass your
aligned array.  The way for testing if numpy is returning you an aligned
array is easy:

In [15]: x = np.arange(6).reshape(2,3)

In [16]: x.ctypes.data % 16
Out[16]: 0

but:

In [17]: x.ctypes.data % 32
Out[17]: 16

so, in this case NumPy returned a 16-byte aligned array which should be
enough for 128 bit SIMD (SSE family).  This kind of alignment is pretty
common in modern computers.  If you need 256 bit (32-byte) alignment then
you will need to build your container manually.  See here for an example:
http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays
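So, schematically, the feedforward step could preallocate its output buffers
once (with whatever aligned allocator you end up using) and let np.dot()
write into them (just a sketch, untested against your C wrappers):

import numpy as np

w = np.random.rand(128, 64)
x = np.random.rand(64)
b = np.random.rand(128)

out = np.empty(128)        # in practice, build this with your aligned allocator
np.dot(w, x, out=out)      # the result is written into the preallocated buffer
out += b                   # stays in the same (aligned) memory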

Francesc


>
> (BTW: the function works  correctly about 20% of the time I run it, and
> else it segfaults on the simd instruction in the the C function)
>
> Thanks,
> -Øystein
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 1.0.0 (final) released

2016-04-07 Thread Francesc Alted
=
Announcing bcolz 1.0.0 final
=

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
feature (just some optimizations and bug fixes), but bcolz is already 6
years old and it implements most of the capabilities that it was
designed for, so I decided to release a 1.0.0 meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  Such a format is fully described
at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that bcolz 1.x series will be based on
C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  After C-Blosc
2.x (https://github.com/Blosc/c-blosc2) would be out, a new bcolz 2.x is
expected taking advantage of shiny new features of C-Blosc2 (more
compressors, more filters, native variable length support and the
concept of super-chunks), which should be very beneficial for next bcolz
generation.

Important: this is a final release and there are no important known bugs
in it, so it is recommended for use in production.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

For some comparison between bcolz and other compressed data containers,
see:

https://github.com/FrancescAlted/DataContainersTutorials

specially chapters 3 (in-memory containers) and 4 (on-disk containers).

Also, if it happens that you are in Madrid during this weekend, you can
drop by my tutorial and talk:

http://pydata.org/madrid2016/schedule/

See you!


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) ,
Quantopian
(https://www.quantopian.com/) and Scikit-Allel (
https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.3.1

2016-04-07 Thread Francesc Alted
=
Announcing python-blosc 1.3.1
=

What is new?


This is an important release in terms of stability.  The -O1 flag is now
used for compiling the included C-Blosc sources on Linux.  This means
slower performance, but fixes the nasty issue #110.  In case maximum
speed is needed, please `compile python-blosc with an external C-Blosc
library <
https://github.com/Blosc/python-blosc#compiling-with-an-installed-blosc-library-recommended
>`_.

Also, symbols like BLOSC_MAX_BUFFERSIZE have been replaced for allowing
backward compatibility with python-blosc 1.2.x series.

For whetting your appetite, look at some benchmarks here:

https://github.com/Blosc/python-blosc#benchmarking

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor optimized
for binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Blosc works well for compressing
numerical arrays that contains data with relatively low entropy, like
sparse data, time series, grids with regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library, with added functions (`compress_ptr()`
and `pack_array()`) for efficiently compressing NumPy arrays, minimizing
the number of memory copies during the process.  python-blosc can be
used to compress in-memory data buffers for transmission to other
machines, persistence or just as a compressed cache.

There is also a handy tool built on top of python-blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary data files on disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Sources repository
==

The sources and documentation are managed through github services at:

http://github.com/Blosc/python-blosc




  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.5.2 released

2016-04-07 Thread Francesc Alted
=
 Announcing Numexpr 2.5.2
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a maintenance release shaking out some remaining problems with VML
(it is nice to see how Anaconda's VML support helps in raising hidden
issues).  Now conj() and abs() are actually added as VML-powered
functions, preventing the same problems as with log10() before (PR #212);
thanks to Tom Kooij.  Upgrading to this release is highly recommended.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 1.0.0 RC2 is out!

2016-03-31 Thread Francesc Alted
==
Announcing bcolz 1.0.0 RC2
==

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
feature (just some optimizations and bug fixes), but bcolz is already 6
years old and it implements most of the capabilities that it was
designed for, so I decided to release a 1.0.0 meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  Such a format is fully described
at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that bcolz 1.x series will be based on
C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  After C-Blosc
2.x (https://github.com/Blosc/c-blosc2) would be out, a new bcolz 2.x is
expected taking advantage of shiny new features of C-Blosc2 (more
compressors, more filters, native variable length support and the
concept of super-chunks), which should be very beneficial for next bcolz
generation.

Important: this is a Release Candidate, so please test it as much as you
can.  If no issues appear in a week or so, I will proceed to tag
and release 1.0.0 final.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.5.1 released

2016-03-31 Thread Francesc Alted
=
 Announcing Numexpr 2.5.1
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

Fixed a critical bug that caused wrong evaluations of log10() and
conj().  These produced wrong results when numexpr was compiled with
Intel's MKL (which is a popular build since Anaconda ships it by
default) and non-contiguous data.  This is considered a *critical* bug
and upgrading is highly recommended. Thanks to Arne de Laat and Tom
Kooij for reporting and providing a test unit.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [ANN] bcolz 1.0.0 RC1 released

2016-03-08 Thread Francesc Alted
==
Announcing bcolz 1.0.0 RC1
==

What's new
==

Yeah, 1.0.0 is finally here.  We are not introducing any exciting new
feature (just some optimizations and bug fixes), but bcolz is already 6
years old and it implements most of the capabilities that it was
designed for, so I decided to release a 1.0.0 meaning that the format is
declared stable and that people can be assured that future bcolz
releases will be able to read bcolz 1.0 data files (and probably much
earlier ones too) for a long while.  Such a format is fully described
at:

https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst

Also, a 1.0.0 release means that bcolz 1.x series will be based on
C-Blosc 1.x series (https://github.com/Blosc/c-blosc).  After C-Blosc
2.x (https://github.com/Blosc/c-blosc2) would be out, a new bcolz 2.x is
expected taking advantage of shiny new features of C-Blosc2 (more
compressors, more filters, native variable length support and the
concept of super-chunks), which should be very beneficial for next bcolz
generation.

Important: this is a Release Candidate, so please test it as much as you
can.  If no issues appear in a week or so, I will proceed to tag
and release 1.0.0 final.  Enjoy!

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fwd: Numexpr-3.0 proposal

2016-02-16 Thread Francesc Alted
2016-02-16 10:04 GMT+01:00 Robert McLeod :

> On Mon, Feb 15, 2016 at 10:43 AM, Gregor Thalhammer <
> gregor.thalham...@gmail.com> wrote:
>
>>
>> Dear Robert,
>>
>> thanks for your effort on improving numexpr. Indeed, vectorized math
>> libraries (VML) can give a large boost in performance (~5x), except for a
>> couple of basic operations (add, mul, div), which current compilers are
>> able to vectorize automatically. With recent gcc even more functions are
>> vectorized, see https://sourceware.org/glibc/wiki/libmvec But you need
>> special flags depending on the platform (SSE, AVX present?), runtime
>> detection of processor capabilities would be nice for distributing
>> binaries. Some time ago, since I lost access to Intels MKL, I patched
>> numexpr to use Accelerate/Veclib on os x, which is preinstalled on each
>> mac, see https://github.com/geggo/numexpr.git veclib_support branch.
>>
>> As you increased the opcode size, I could imagine providing a bit to
>> switch (during runtime) between internal functions and vectorized ones,
>> that would be handy for tests and benchmarks.
>>
>
> Dear Gregor,
>
> Your suggestion to separate the opcode signature from the library used to
> execute it is very clever. Based on your suggestion, I think that the
> natural evolution of the opcodes is to specify them by function signature
> and library, using a two-level dict, i.e.
>
> numexpr.interpreter.opcodes['exp_f8f8f8'][gnu] = some_enum
> numexpr.interpreter.opcodes['exp_f8f8f8'][msvc] = some_enum +1
> numexpr.interpreter.opcodes['exp_f8f8f8'][vml] = some_enum + 2
> numexpr.interpreter.opcodes['exp_f8f8f8'][yeppp] = some_enum +3
>

Yes, by using a two-level dictionary you can access the functions
implementing the opcodes much faster, and hence you can add many more opcodes
without too much slow-down.
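Schematically, something like this (names purely illustrative):

# signature -> library -> opcode enum
opcodes = {
    "exp_f8f8f8": {"gnu": 40, "msvc": 41, "vml": 42, "yeppp": 43},
    "mul_f8f8f8": {"gnu": 44, "vml": 45},
}

def resolve(signature, lib="gnu"):
    # two dict lookups, independent of how many opcodes/libraries exist,
    # so growing the opcode table stays cheap
    return opcodes[signature][lib]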


>
> I want to procedurally generate opcodes.cpp and interpreter_body.cpp.  If
> I do it the way you suggested funccodes.hpp and all the many #define's
> regarding function codes in the interpreter can hopefully be removed and
> hence simplify the overall codebase. One could potentially take it a step
> further and plan (optimize) each expression, similar to what FFTW does with
> regards to matrix shape. That is, the basic way to control the library
> would be with a singleton library argument, i.e.:
>
> result = ne.evaluate( "A*log(foo**2 / bar**2", lib=vml )
>
> However, we could also permit a tuple to be passed in, where each element
> of the tuple reflects the library to use for each operation in the AST tree:
>
> result = ne.evaluate( "A*log(foo**2 / bar**2", lib=(gnu,gnu,gnu,yeppp,gnu)
> )
>
> In this case the ops are (mul,mul,div,log,mul).  The op-code picking is
> done by the Python side, and this tuple could be potentially optimized by
> numexpr rather than hand-optimized, by trying various permutations of the
> linked C math libraries. The wisdom from the planning could be pickled and
> saved in a wisdom file.  Currently Numexpr has cacheDict in util.py but
> there's no reason this can't be pickled and saved to disk. I've done a
> similar thing by creating wrappers for PyFFTW already.
>

I like the idea of having the various permutations of linked C math libraries
probed by numexpr during the initial iteration and then cached somehow.
That will probably require run-time detection of the available C math libraries
(keep in mind that a numexpr binary should be able to run on different machines
with different libraries and computing capabilities), but in exchange, it will
allow for the fastest execution paths independently of the machine that
runs the code.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.5

2016-02-06 Thread Francesc Alted
=
 Announcing Numexpr 2.5
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

In this version, a lock has been added so that numexpr can be called
from multithreaded apps.  Mind that this does not prevent numexpr
from using multiple cores internally.  Also, new min() and max()
functions have been added.  Thanks to the contributors!

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

2016-01-14 Thread Francesc Alted
 you want.  And as a corollary, (fast) compressors can save you
not only storage, but processing time too.
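
For what it is worth, a minimal sketch (assuming h5py is installed) of the
per-key HDF5 layout that Nathaniel suggests below, with a fast compressor
enabled to illustrate the point above:

import h5py
import numpy as np

with h5py.File("data.h5", "w") as f:
    # One group per key, two datasets per key; LZF is a fast compressor
    # that ships with h5py.
    for key in range(3):
        g = f.create_group(str(key))
        n = np.random.randint(1, 10)
        g.create_dataset("ints", data=np.random.randint(0, 100, n),
                         compression="lzf")
        g.create_dataset("floats", data=np.random.rand(n),
                         compression="lzf")

with h5py.File("data.h5", "r") as f:
    ints = f["1/ints"][:]      # a seek + read of just this array
    floats = f["1/floats"][:]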

Francesc


2016-01-14 11:19 GMT+01:00 Nathaniel Smith :

> I'd try storing the data in hdf5 (probably via h5py, which is a more
> basic interface without all the bells-and-whistles that pytables
> adds), though any method you use is going to be limited by the need to
> do a seek before each read. Storing the data on SSD will probably help
> a lot if you can afford it for your data size.
>
> On Thu, Jan 14, 2016 at 1:15 AM, Ryan R. Rosario 
> wrote:
> > Hi,
> >
> > I have a very large dictionary that must be shared across processes and
> does not fit in RAM. I need access to this object to be fast. The key is an
> integer ID and the value is a list containing two elements, both of them
> numpy arrays (one has ints, the other has floats). The key is sequential,
> starts at 0, and there are no gaps, so the “outer” layer of this data
> structure could really just be a list with the key actually being the
> index. The lengths of each pair of arrays may differ across keys.
> >
> > For a visual:
> >
> > {
> > key=0:
> > [
> > numpy.array([1,8,15,…, 16000]),
> > numpy.array([0.1,0.1,0.1,…,0.1])
> > ],
> > key=1:
> > [
> > numpy.array([5,6]),
> > numpy.array([0.5,0.5])
> > ],
> > …
> > }
> >
> > I’ve tried:
> > -   manager proxy objects, but the object was so big that low-level
> code threw an exception due to format and monkey-patching wasn’t successful.
> > -   Redis, which was far too slow due to setting up connections and
> data conversion etc.
> > -   Numpy rec arrays + memory mapping, but there is a restriction
> that the numpy arrays in each “column” must be of fixed and same size.
> > -   I looked at PyTables, which may be a solution, but seems to have
> a very steep learning curve.
> > -   I haven’t tried SQLite3, but I am worried about the time it
> takes to query the DB for a sequential ID, and then translate byte arrays.
> >
> > Any ideas? I greatly appreciate any guidance you can provide.
> >
> > Thanks,
> > Ryan
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-17 Thread Francesc Alted
2015-12-17 12:00 GMT+01:00 Daπid :

> On 16 December 2015 at 18:59, Francesc Alted  wrote:
>
>> Probably MATLAB is shipping with Intel MKL enabled, which probably is the
>> fastest LAPACK implementation out there.  NumPy supports linking with MKL,
>> and actually Anaconda does that by default, so switching to Anaconda would
>> be a good option for you.
>
>
> A free alternative is OpenBLAS. I am getting 20 s in an i7 Haswell with 8
> cores.
>

Pretty good.  I did not know that OpenBLAS was so close in performance to
MKL.
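
By the way, a quick way to check which BLAS/LAPACK implementation a given
NumPy build is linked against (the output format varies between NumPy
versions):

import numpy as np
# Prints the BLAS/LAPACK libraries (MKL, OpenBLAS, ATLAS...) detected at
# build time for this NumPy installation.
np.__config__.show()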

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-16 Thread Francesc Alted
Sorry, I have to correct myself, as per:
http://docs.continuum.io/mkl-optimizations/index it seems that Anaconda is
not linking with MKL by default (I thought that was the case before?).
After installing MKL (conda install mkl), I am getting:

In [1]: import numpy as np
Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 30 days

In [2]: testA = np.random.randn(15000, 15000)

In [3]: testb = np.random.randn(15000)

In [4]: %time testx = np.linalg.solve(testA, testb)
CPU times: user 1min, sys: 468 ms, total: 1min 1s
Wall time: 15.3 s


So, it looks like you will need to buy an MKL license separately (which
makes sense for a commercial product).

Sorry for the confusion.
Francesc


2015-12-16 18:59 GMT+01:00 Francesc Alted :

> Hi,
>
> Probably MATLAB is shipping with Intel MKL enabled, which probably is the
> fastest LAPACK implementation out there.  NumPy supports linking with MKL,
> and actually Anaconda does that by default, so switching to Anaconda would
> be a good option for you.
>
> Here you have what I am getting with Anaconda's NumPy and a machine with 8
> cores:
>
> In [1]: import numpy as np
>
> In [2]: testA = np.random.randn(15000, 15000)
>
> In [3]: testb = np.random.randn(15000)
>
> In [4]: %time testx = np.linalg.solve(testA, testb)
> CPU times: user 5min 36s, sys: 4.94 s, total: 5min 41s
> Wall time: 46.1 s
>
> This is not 20 sec, but it is not 3 min either (but of course that depends
> on your machine).
>
> Francesc
>
> 2015-12-16 18:34 GMT+01:00 Edward Richards :
>
>> I recently did a conceptual experiment to estimate the computational time
>> required to solve an exact expression in contrast to an approximate
>> solution (Helmholtz vs. Helmholtz-Kirchhoff integrals). The exact solution
>> requires a matrix inversion, and in my case the matrix would contain ~15000
>> rows.
>>
>> On my machine MATLAB seems to perform this matrix inversion with random
>> matrices about 9x faster (20 sec vs 3 mins). I thought the performance
>> would be roughly the same because I presume both rely on the same LAPACK
>> solvers.
>>
>> I will not actually need to solve this problem (even at 20 sec it is
>> prohibitive for broadband simulation), but if I needed to I would
>> reluctantly choose MATLAB . I am simply wondering why there is this
>> performance gap, and if there is a better way to solve this problem in
>> numpy?
>>
>> Thank you,
>>
>> Ned
>>
>> #Python version
>>
>> import numpy as np
>>
>> testA = np.random.randn(15000, 15000)
>>
>> testb = np.random.randn(15000)
>>
>> %time testx = np.linalg.solve(testA, testb)
>>
>> %MATLAB version
>>
>> testA = randn(15000);
>>
>> testb = randn(15000, 1);
>> tic(); testx = testA \ testb; toc();
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
> Francesc Alted
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] performance solving system of equations in numpy and MATLAB

2015-12-16 Thread Francesc Alted
Hi,

Probably MATLAB is shipping with Intel MKL enabled, which probably is the
fastest LAPACK implementation out there.  NumPy supports linking with MKL,
and actually Anaconda does that by default, so switching to Anaconda would
be a good option for you.

Here you have what I am getting with Anaconda's NumPy and a machine with 8
cores:

In [1]: import numpy as np

In [2]: testA = np.random.randn(15000, 15000)

In [3]: testb = np.random.randn(15000)

In [4]: %time testx = np.linalg.solve(testA, testb)
CPU times: user 5min 36s, sys: 4.94 s, total: 5min 41s
Wall time: 46.1 s

This is not 20 sec, but it is not 3 min either (but of course that depends
on your machine).

Francesc

2015-12-16 18:34 GMT+01:00 Edward Richards :

> I recently did a conceptual experiment to estimate the computational time
> required to solve an exact expression in contrast to an approximate
> solution (Helmholtz vs. Helmholtz-Kirchhoff integrals). The exact solution
> requires a matrix inversion, and in my case the matrix would contain ~15000
> rows.
>
> On my machine MATLAB seems to perform this matrix inversion with random
> matrices about 9x faster (20 sec vs 3 mins). I thought the performance
> would be roughly the same because I presume both rely on the same LAPACK
> solvers.
>
> I will not actually need to solve this problem (even at 20 sec it is
> prohibitive for broadband simulation), but if I needed to I would
> reluctantly choose MATLAB . I am simply wondering why there is this
> performance gap, and if there is a better way to solve this problem in
> numpy?
>
> Thank you,
>
> Ned
>
> #Python version
>
> import numpy as np
>
> testA = np.random.randn(15000, 15000)
>
> testb = np.random.randn(15000)
>
> %time testx = np.linalg.solve(testA, testb)
>
> %MATLAB version
>
> testA = randn(15000);
>
> testb = randn(15000, 1);
> tic(); testx = testA \ testb; toc();
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.12.0 released

2015-11-16 Thread Francesc Alted
===
Announcing bcolz 0.12.0
===

What's new
==

This release copes with some compatibility issues with NumPy 1.10.
Also, several improvements have happened in the installation procedure,
allowing for a smoother process.  Last but not least, the tutorials
have been migrated to the IPython notebook format (a huge thank you to
Francesc Elies for this!).  This will hopefully allow users to
better exercise the different features of bcolz.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor, are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz
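
A minimal usage sketch (assuming bcolz and NumPy are installed; numexpr is
optional but recommended):

import numpy as np
import bcolz

a = np.linspace(0, 1, 1000 * 1000)
b = np.arange(1000 * 1000)

# carray: a chunked, Blosc-compressed container (in-memory here; pass
# rootdir="mydir" to make it disk-based)
ca = bcolz.carray(a)
print(ca.nbytes, ca.cbytes)   # uncompressed vs compressed sizes

# ctable: a columnar table; where() queries use the numexpr-powered
# engine when numexpr is installed
ct = bcolz.ctable(columns=[a, b], names=["x", "i"])
n_hits = sum(1 for row in ct.where("(x > 0.5) & (i < 10)"))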


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.6 released

2015-11-02 Thread Francesc Alted
Hi,

This is a quick release fixing some reported problems in the 2.4.5 version
that I announced a few hours ago.  Hope I have fixed the main issues now.
Now, the official announcement:

=
 Announcing Numexpr 2.4.6
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a quick maintenance version that offers better handling of
MSVC symbols (#168, Francesc Alted), as well as fixing some
UserWarnings in Solaris (#189, Graham Jones).

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.5 released

2015-11-02 Thread Francesc Alted
=
 Announcing Numexpr 2.4.5
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a maintenance release where an important bug in multithreading
code has been fixed (#185 Benedikt Reinartz, Francesc Alted).  Also,
many harmless warnings (overflow/underflow, divide by zero and others)
in the test suite have been silenced  (#183, Francesc Alted).

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.11.3 released!

2015-10-05 Thread Francesc Alted
===
Announcing bcolz 0.11.3
===

What's new
==

Implemented new feature (#255): bcolz.zeros() can create new ctables
too, either empty or filled with zeros. (#256 @FrancescElies
@FrancescAlted).
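
A quick sketch of the new behaviour (the dtypes here are just illustrative):

import bcolz

# With a compound dtype, zeros() now returns a ctable rather than a carray
# (pass 0 as the length to get an empty one).
ct = bcolz.zeros(5, dtype="i4,f8")
print(type(ct), len(ct))
empty_ct = bcolz.zeros(0, dtype=[("i", "i4"), ("x", "f8")])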

Also, in the previous, non-announced versions (0.11.1 and 0.11.2), new
dependencies were added and other fixes were made too.

For a more detailed change log, see:

https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor, are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Governance model request

2015-09-23 Thread Francesc Alted
famous and powerful as Travis may be,
> he's still our colleague, a member of our community, and *a human being*,
> so let's remember that as well...
>
>
> 2. Conflicts of interest are a fact of life, in fact, I would argue that
> every healthy and sufficiently interconnected community eventually *should*
> have conflicts of interest. They are a sign that there is activity across
> multiple centers of interest, and individuals with connections in multiple
> areas of the community.  And we *want* folks who are engaged enough
> precisely to have such interests!
>
> For conflict of interest management, we don't need to reinvent the wheel,
> this is actually something where our beloved institutions, blessed be their
> bureaucratic souls, have tons of training materials that happen to be not
> completely useless.  Most universities and the national labs have
> information on COIs that provides guidelines, and Numpy could include in
> its governance model more explicit language about COIs if desired.
>
> So, the issue is not to view COIs as something evil or undesirable, but
> rather as the very real consequence of operating in an interconnected set
> of institutions.  And once you take that stance, you deal with that
> rationally and realistically.
>
> For example, you accept that companies aren't the only ones with potential
> COIs: *all* entities have them. As Ryan May aptly pointed out, the notion
> that academic institutions are somehow immune to hidden agendas or other
> interests is naive at best... And I say that as someone who has happily
> stayed in academia, resisting multiple overtures from industry over the
> years, but not out of some quaint notion that academia is a pristine haven
> of principled purity. Quite the opposite: in building large and complex
> projects, I've seen painfully close how the university/government research
> world has its own flavor of the same power, financial and political
> ugliness that we attribute to the commercial side.
>
>
> 3. Commercial actors.  Following up on the last paragraph, we should
> accept that *all* institutions have agendas, not just companies.  We live
> in a world with companies, and I think it's naive to take a knee-jerk
> anti-commercial stance: our community has had a productive and successful
> history of interaction with industry in the past, and hopefully that will
> continue in the future.
>
> What is true, however, is that community projects should maintain the
> "seat of power" in the community, and *not* in any single company.  In
> fact, this is important even to ensure that many companies feel comfortable
> engaging the projects, precisely so they know that the technology is driven
> in an open and neutral way even if some of their competitors participate.
>
> That's why a governance model that is anchored in neutral ground is so
> important.  We've worked hard to make Numfocus the legal entity that can
> play that role (that's why it's a 501(c)3), and that's why we've framed our
> governance model for Jupyter in a way that makes all the institutions
> (including Berkeley and Cal Poly) simply 'partners' that contribute by
> virtue of supporting employees.  But the owners of the decisions are the
> *individuals* who do the work and form the community, not the
> companies/institutions.
>
>
> If we accept these premises, then hopefully we can have a rational
> conversation about how to build a community, where at any point in time,
> any of us should be judged on the merit of our actions, not the
> hypotheticals of our intentions or our affiliations (commercial,
> government, academic, etc).
>
>
> Sorry for the long wall of text, I rarely post on this list anymore.  But
> I was saddened to see the turn of this thread, and I hope I can contribute
> some perspective (and not make things worse :)
>
>
> Cheers,
>
> --
> Fernando Perez (@fperez_org; http://fperez.org)
> fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
> fernando.perez-at-berkeley: contact me here for any direct mail
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.8 released

2015-09-18 Thread Francesc Alted
=
Announcing python-blosc 1.2.8
=

What is new?


This is a maintenance release.  Internal C-Blosc has been upgraded to
1.7.0 (although new bitshuffle support has not been made public, as it
seems not ready for production yet).

Also, there is support for bytes-like objects that support the buffer
interface as input to ``compress`` and ``decompress``. On Python 2.x
this includes unicode, on Python 3.x it doesn't.  Thanks to Valentin
Haenel.
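
A minimal round-trip sketch (assuming python-blosc and NumPy are installed):

import numpy as np
import blosc

a = np.arange(1000 * 1000, dtype=np.float64)
# compress() takes a bytes object (and, as of this release, other objects
# supporting the buffer interface); typesize feeds the shuffle filter.
packed = blosc.compress(a.tobytes(), typesize=a.dtype.itemsize)
restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)
assert np.array_equal(a, restored)
print(len(packed), a.nbytes)   # compressed vs uncompressed sizes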

Finally, a memory leak in ``decompress`` has been hunted and fixed.  And
new tests have been added to catch possible leaks in the future.  Thanks
to Santi Villalba.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/blob/master/RELEASE_NOTES.rst

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contains data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you must omit the 'python-' prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is a Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: Numexpr 2.4.4 is out

2015-09-18 Thread Francesc Alted
=
 Announcing Numexpr 2.4.4
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a maintenance release which contains several bug fixes, like
better testing on the Python 3 platform and fixing a harmless data race.
Among the enhancements, AppVeyor support is here and OMP_NUM_THREADS is
honored as a fallback in case NUMEXPR_NUM_THREADS is not set.
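
A small sketch of how the thread settings can be controlled (assuming
numexpr >= 2.4.4):

import os
# NUMEXPR_NUM_THREADS takes precedence; OMP_NUM_THREADS is honored as a
# fallback.  Environment variables must be set before numexpr is imported.
os.environ.setdefault("NUMEXPR_NUM_THREADS", "4")

import numpy as np
import numexpr as ne

a = np.arange(1000 * 1000, dtype=np.float64)
prev = ne.set_num_threads(2)   # can also be changed at run time
print(ne.evaluate("sum(a**2)"))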

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst


Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.

Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.11.0 released

2015-09-09 Thread Francesc Alted
===
Announcing bcolz 0.11.0
===

What's new
==

Although this is mostly a maintenance release that fixes some bugs, the
setup.py is now entirely based on setuptools and has been greatly
modernized to use a new versioning system.  This alone deserves a bump in
the minor version.  Thanks to Gabi Davar (@mindw) for such a nice
improvement.

Also, many improvements to the Continuous Integration part (and hence
not directly visible to users) and others have been made by Francesc
Elies (@FrancescElies).  Thanks for his quiet but effective work.

And last but not least, I would like to announce that Valentin Haenel
(@esc) just stepped down as release manager.  Thanks Valentin for all
the hard work that you put in making bcolz a better piece of software!


What it is
==

*bcolz* provides columnar and compressed data containers that can live
either on-disk or in-memory.  Column storage allows for efficiently
querying tables with a large number of columns.  It also allows for
cheap addition and removal of columns.  In addition, bcolz objects are
compressed by default for reducing memory/disk I/O needs. The
compression process is carried out internally by Blosc, an
extremely fast meta-compressor that is optimized for binary data. Lastly,
high-performance iterators (like ``iter()``, ``where()``) for querying
the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, since the carray/ctable
containers can be disk-based, it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor, are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the
Blaze project (http://blaze.pydata.org/), Quantopian
(https://www.quantopian.com/) and Scikit-Allel
(https://github.com/cggh/scikit-allel) which you can read more about by
pointing your browser at the links below.

* Visualfabriq:

  * *bquery*, A query and aggregation framework for Bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:

  * Notebooks showing Blaze + Pandas + BColz interaction:
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  *
http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:

  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel

  * Provides an alternative backend to work with compressed arrays
  * https://scikit-allel.readthedocs.org/en/latest/bcolz.html

Installing
==

bcolz is in the PyPI repository, so installing it is easy::

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository:
https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comments on governance proposal (was: Notes from the numpy dev meeting at scipy 2015)

2015-08-28 Thread Francesc Alted
e of just about any Python package
> that has to add two numbers together shouldn't be too hard, especially
> seeing success stories like Jupyter's, who I believe has several paid
> developers working full time.  That requires formalizing governance,
> because apparently sponsors are a little wary of giving money to "people on
> the internet". ;-)  Fernando Pérez was extremely emphatic about the size of
> the opportunity NumPy was letting slip by not formalizing *any* governance
> model.  And it is a necessary first step so that e.g. we have the money to,
> say a year from now, get the right people together for a couple of days to
> figure out a better governance model.  I'd argue that money would be better
> spent financing a talented developer to advance e.g. Nathaniel's new dtype
> system to end all dtype systems, but that's a different story.
>
> Largely because of the above, even if Nathaniel's document involved
> tossing a coin to resolve disputes, I'd rather have that now than something
> much better never. Because there really is no alternative to Nathaniel's
> write-up of the status quo, other than the status quo without a write-up:
> it has taken him two months to put this draft together, **after** we agreed
> over several hours of face to face discussion on what the model should be.
> And I'm sure he has hated every minute he has had to put into it.  So if we
> keep going around this in circles, after a few days we will all grow tired
> and go back to fighting over whether indexing should transpose subspaces or
> not, and all that other cool stuff we really enjoy. And a year from now we
> will be in the same place we are now, only a year older and deeper in
> (technical) debt.
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
> de dominación mundial.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] UTC-based datetime64

2015-08-26 Thread Francesc Alted
Hi,

We've found that NumPy uses the local TZ for printing datetime64 timestamps:

In [22]: t = datetime.utcnow()

In [23]: print t
2015-08-26 11:52:10.662745

In [24]: np.array([t], dtype="datetime64[s]")
Out[24]: array(['2015-08-26T13:52:10+0200'], dtype='datetime64[s]')

Googling for a way to print UTC out of the box, the best thing I could find
is:

In [40]: [str(i.item()) for i in np.array([t], dtype="datetime64[s]")]
Out[40]: ['2015-08-26 11:52:10']
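
Another workaround, assuming np.datetime_as_string() and its timezone
argument are available in the NumPy version at hand:

from datetime import datetime
import numpy as np

t = datetime.utcnow()
arr = np.array([t], dtype="datetime64[s]")
# datetime_as_string() lets you choose the timezone used for rendering;
# timezone='UTC' should render the timestamps in UTC (with a 'Z' suffix).
print(np.datetime_as_string(arr, timezone='UTC'))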

Now, is there a better way to specify that I want the datetimes printed
always in UTC?

Thanks,
-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Notes from the numpy dev meeting at scipy 2015

2015-08-26 Thread Francesc Alted
Hi,

Thanks Nathaniel and others for sparking this discussion as I think it is
very timely.

2015-08-25 12:03 GMT+02:00 Nathaniel Smith :

>   Let's focus on evolving numpy as far as we can without major
>   break-the-world changes (no "numpy 2.0", at least in the foreseeable
>   future).
>
>   And, as a target for that evolution, let's change our focus from
>   numpy as "NumPy is the library that gives you the np.ndarray object
>   (plus some attached infrastructure)", to "NumPy provides the
>   standard framework for working with arrays and array-like objects in
>   Python"
>

Sorry to disagree here, but in my opinion NumPy *already* provides the
standard framework for working with arrays and array-like objects in Python
as its huge popularity shows.  If what you mean is that there are too many
efforts trying to provide other, specialized data containers (things like
DataFrame in pandas, DataArray/Dataset in xarray or carray/ctable in bcolz
just to mention a few), then let me say that I am of the opinion that there
can't be a silver bullet for tackling all the problems that the PyData
community is facing.

The libraries using specialized data containers (pandas, xray, bcolz...)
may have more or less machinery on top of them so that conversion to NumPy
does not necessarily happen internally (many times we don't want conversions
for efficiency), but it is the capability of producing NumPy arrays out of
them (or parts of them) that makes these specialized containers incredibly
more useful to users, because they can use NumPy to fill the missing gaps,
or just use NumPy as an intermediate container that acts as input for other
libraries.

On the subject on why I don't think a universal data container is feasible
for PyData, you just have to have a look at how many data structures Python
is providing in the language itself (tuples, lists, dicts, sets...), and
how many are added in the standard library (like those in the collections
sub-package).  Every data container is designed to do a couple of things
(maybe three) well, but for other use cases it is the responsibility of the
user to choose the most appropriate one depending on her needs.  In the same
vein, I also think that it makes little sense to try to come up with a
standard solution that is going to satisfy everyone's need.  IMHO, and
despite all efforts, neither NumPy,  NumPy 2.0, DyND, bcolz or any other is
going to offer the universal data container.

Instead of that, let me summarize what users/developers like me need from
NumPy for continue creating more specialized data containers:

1) Keep NumPy simple. NumPy is truly the cornerstone of PyData right now,
and it will be for the foreseeable future, so please keep it usable and
*minimal*.  Before adding any more features, the increase in complexity
should be carefully weighed.

2) Make NumPy more flexible. Any rewrite that allows arrays or dtypes to be
subclassed and extended more easily will be a huge win.  *But* if in order
to allow flexibility you have to make NumPy much more complex, then point
1) should prevail.

3) Make of NumPy a sustainable project. Historically NumPy depended on
heroic efforts of individuals to make it what it is now: *an industry
standard*.  But individual efforts, while laudable, are not enough, so
please, please, please continue the effort of constituting a governance
team that ensures the future of NumPy (and with it, the whole PyData
community).

Finally, the question on whether NumPy 2.0 or projects like DyND should be
chosen instead for implementing new features is still legitimate, and while
I have my own opinions (favourable to DyND), I still see (such is the price
of technological debt) a distant future where we will find NumPy as we know
it, allowing more innovation to happen in the Python Data space.

Again, thanks to all those braves that are allowing others to build on top
of NumPy's shoulders.

--
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about unaligned access

2015-07-06 Thread Francesc Alted
2015-07-06 18:04 GMT+02:00 Jaime Fernández del Río :

> On Mon, Jul 6, 2015 at 10:18 AM, Francesc Alted  wrote:
>
>> Hi,
>>
>> I have stumbled into this:
>>
>> In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
>> dtype=[('f0', np.int64), ('f1', np.int32)])
>>
>> In [63]: %timeit sa['f0'].sum()
>> 100 loops, best of 3: 4.52 ms per loop
>>
>> In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)),
>> dtype=[('f0', np.int64), ('f1', np.int64)])
>>
>> In [65]: %timeit sa['f0'].sum()
>> 1000 loops, best of 3: 896 µs per loop
>>
>> The first structured array is made of 12-byte records, while the second
>> is made by 16-byte records, but the latter performs 5x faster.  Also, using
>> an structured array that is made of 8-byte records is the fastest
>> (expected):
>>
>> In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
>> np.int64)])
>>
>> In [67]: %timeit sa['f0'].sum()
>> 1000 loops, best of 3: 567 µs per loop
>>
>> Now, my laptop has a Ivy Bridge processor (i5-3380M) that should perform
>> quite well on unaligned data:
>>
>>
>> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
>>
>> So, if 4 years-old Intel architectures do not have a penalty for
>> unaligned access, why I am seeing that in NumPy?  That strikes like a quite
>> strange thing to me.
>>
>
> I believe that the way numpy is setup, it never does unaligned access,
> regardless of the platform, in case it gets run on one that would go up in
> flames if you tried to. So my guess would be that you are seeing chunked
> copies into a buffer, as opposed to bulk copying or no copying at all, and
> that would explain your timing differences. But Julian or Sebastian can
> probably give you a more informed answer.
>

Yes, my guess is that you are right.  I suppose that it is possible to
improve the numpy codebase to accelerate this particular access pattern on
Intel platforms, but given that structured arrays are not that widely used
(pandas is probably leading this use case by far, and as far as I know,
they are not using structured arrays internally in DataFrames), maybe
it is not worth worrying about this too much.
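
For reference, a small sketch to inspect both layouts from the timings above
(it just prints whatever NumPy reports about strides and alignment):

import numpy as np

sa12 = np.fromiter(((i, i) for i in range(10)),
                   dtype=[('f0', np.int64), ('f1', np.int32)])  # 12-byte records
sa16 = np.fromiter(((i, i) for i in range(10)),
                   dtype=[('f0', np.int64), ('f1', np.int64)])  # 16-byte records

for sa in (sa12, sa16):
    f0 = sa['f0']
    print(sa.dtype.itemsize, f0.strides, f0.flags['ALIGNED'])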

Thanks anyway,
Francesc


>
> Jaime
>
>
>>
>> Thanks,
>> Francesc
>>
>> --
>> Francesc Alted
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
> de dominación mundial.
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Question about unaligned access

2015-07-06 Thread Francesc Alted
Oops, forgot to mention my NumPy version:

In [72]: np.__version__
Out[72]: '1.9.2'

Francesc

2015-07-06 17:18 GMT+02:00 Francesc Alted :

> Hi,
>
> I have stumbled into this:
>
> In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
> np.int64), ('f1', np.int32)])
>
> In [63]: %timeit sa['f0'].sum()
> 100 loops, best of 3: 4.52 ms per loop
>
> In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
> np.int64), ('f1', np.int64)])
>
> In [65]: %timeit sa['f0'].sum()
> 1000 loops, best of 3: 896 µs per loop
>
> The first structured array is made of 12-byte records, while the second is
> made by 16-byte records, but the latter performs 5x faster.  Also, using an
> structured array that is made of 8-byte records is the fastest (expected):
>
> In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
> np.int64)])
>
> In [67]: %timeit sa['f0'].sum()
> 1000 loops, best of 3: 567 µs per loop
>
> Now, my laptop has a Ivy Bridge processor (i5-3380M) that should perform
> quite well on unaligned data:
>
>
> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
>
> So, if 4 years-old Intel architectures do not have a penalty for unaligned
> access, why I am seeing that in NumPy?  That strikes like a quite strange
> thing to me.
>
> Thanks,
> Francesc
>
> --
> Francesc Alted
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Question about unaligned access

2015-07-06 Thread Francesc Alted
Hi,

I have stumbled into this:

In [62]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
np.int64), ('f1', np.int32)])

In [63]: %timeit sa['f0'].sum()
100 loops, best of 3: 4.52 ms per loop

In [64]: sa = np.fromiter(((i,i) for i in range(1000*1000)), dtype=[('f0',
np.int64), ('f1', np.int64)])

In [65]: %timeit sa['f0'].sum()
1000 loops, best of 3: 896 µs per loop

The first structured array is made of 12-byte records, while the second is
made of 16-byte records, but the latter performs 5x faster.  Also, using a
structured array that is made of 8-byte records is the fastest (expected):

In [66]: sa = np.fromiter(((i,) for i in range(1000*1000)), dtype=[('f0',
np.int64)])

In [67]: %timeit sa['f0'].sum()
1000 loops, best of 3: 567 µs per loop

Now, my laptop has an Ivy Bridge processor (i5-3380M) that should perform
quite well on unaligned data:

http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

So, if 4-year-old Intel architectures do not have a penalty for unaligned
access, why am I seeing that in NumPy?  That strikes me as a quite strange
thing.

Thanks,
Francesc

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: PyTables 3.2.0 (final) released!

2015-05-06 Thread Francesc Alted
===
 Announcing PyTables 3.2.0
===

We are happy to announce PyTables 3.2.0.

***
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going.  Please
read the next thread as it contains important information about the
future (or the lack of it) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
***


What's new
==

This is a major release of PyTables and it is the result of more than a
year of accumulated patches, but most specially it fixes a couple of
nasty problems with indexed queries not returning the correct results in
some scenarios.  There are many usability and performance improvements
too.

In case you want to know more in detail what has changed in this
version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated
PDF and HTML docs from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0

For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html


What is it?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data with
support for full 64-bit file addressing.  PyTables runs on top of
the HDF5 library and NumPy package for achieving maximum throughput and
convenient use.  PyTables includes OPSI, a new indexing technology that
allows performing data lookups in tables exceeding 10 gigarows
(10**10 rows) in less than a tenth of a second.


Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for a (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.7 released

2015-05-06 Thread Francesc Alted
=
Announcing python-blosc 1.2.7
=

What is new?


Updated to use c-blosc v1.6.1.  Although this supports AVX2, it is
not enabled in python-blosc because we still need to devise a way to
detect AVX2 in the underlying platform.

At any rate, c-blosc 1.6.1 fixed an important bug in the blosclz codec, so
a new release was deemed important.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contains data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is a Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: PyTables 3.2.0 RC2 is out

2015-05-01 Thread Francesc Alted
===
 Announcing PyTables 3.2.0rc2
===

We are happy to announce PyTables 3.2.0rc2.

***
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going.  Please
read the next thread as it contains important information about the
future (or lack of it) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
***


What's new
==

This is a major release of PyTables and it is the result of more than a
year of accumulated patches, but most specially it fixes a couple of
nasty problems with indexed queries not returning the correct results in
some scenarios (mainly pandas users).  There are many usability and
performance improvements too.

In case you want to know more in detail what has changed in this
version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated
PDF and HTML docs from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0rc2

For an online version of the manual, visit:
http://www.pytables.org/usersguide/index.html


What is it?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data with
support for full 64-bit file addressing.  PyTables runs on top of
the HDF5 library and NumPy package for achieving maximum throughput and
convenient use.  PyTables includes OPSI, a new indexing technology that
allows performing data lookups in tables exceeding 10 gigarows
(10**10 rows) in less than a tenth of a second.


Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for a (incomplete) list of contributors.  Most
specially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-28 Thread Francesc Alted
2015-04-28 4:59 GMT+02:00 Neil Girdhar :

> I don't think I'm asking for so much.  Somewhere inside numexpr it builds
> an AST of its own, which it converts into the optimized code.   It would be
> more useful to me if that AST were in the same format as the one returned
> by Python's ast module.  This way, I could glue in the bits of numexpr that
> I like with my code.  For my purpose, this would have been the more ideal
> design.
>

I don't think implementing this for numexpr would be that complex. So for
example, one could add a new numexpr.eval_ast(ast_expr) function.  Pull
requests are welcome.
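
Something like the following toy sketch, perhaps (eval_ast() is not an
existing numexpr function, and ast.unparse() needs Python 3.9+; older
Pythons would need a third-party unparser):

import ast

import numpy as np
import numexpr as ne

def eval_ast(expr_ast, local_dict=None):
    # Turn the ast expression node back into source and delegate to the
    # existing string-based evaluate().
    return ne.evaluate(ast.unparse(expr_ast), local_dict=local_dict)

a = np.arange(10.0)
tree = ast.parse("3*a + 4", mode="eval")
print(eval_ast(tree.body, local_dict={"a": a}))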

At any rate, what is your use case?  I am curious.

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.3 released

2015-04-27 Thread Francesc Alted
 Announcing Numexpr 2.4.3
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It comes with multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This is a maintenance release to cope with an old bug affecting
comparisons with empty strings.  Fixes #121 and PyTables #184.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: PyTables 3.2.0 release candidate 1 is out

2015-04-21 Thread Francesc Alted
===
 Announcing PyTables 3.2.0rc1
===

We are happy to announce PyTables 3.2.0rc1.

***
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going.  Please
read the next thread as it contains important information about the
future of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
***


What's new
==

This is a major release of PyTables and it is the result of more than a
year of accumulated patches, but most specially it fixes a nasty problem
with indexed queries not returning the correct results in some
scenarios.  There are many usability and performance improvements too.

In case you want to know more in detail what has changed in this
version, please refer to: http://pytables.github.io/release_notes.html

You can download a source package with generated PDF and HTML docs, as
well as binaries for Windows, from:
http://sourceforge.net/projects/pytables/files/pytables/3.2.0rc1

For an online version of the manual, visit:
http://pytables.github.io/usersguide/index.html


What is it?
===

PyTables is a library for managing hierarchical datasets, designed to
efficiently cope with extremely large amounts of data, with support for
full 64-bit file addressing.  PyTables runs on top of the HDF5 library
and the NumPy package to achieve maximum throughput and convenient use.
PyTables includes OPSI, a new indexing technology that allows performing
data lookups in tables exceeding 10 gigarows (10**10 rows) in less than
a tenth of a second.


Resources
=

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/


Acknowledgments
===

Thanks to many users who provided feature improvements, patches, bug
reports, support and suggestions.  See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors.  Most
especially, a lot of kudos go to the HDF5 and NumPy makers.
Without them, PyTables simply would not exist.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.




  **Enjoy data!**

  -- The PyTables Developers
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4.1 released

2015-04-14 Thread Francesc Alted
=
 Announcing Numexpr 2.4.1
=

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

This version brings improved support for newer MKL libraries, as well
as other minor improvements.  This version is meant for production.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Introductory mail and GSoc Project "Vector math library integration"

2015-03-11 Thread Francesc Alted
2015-03-08 21:47 GMT+01:00 Dp Docs :

> Hi all,
> I am a CS 3rd Undergrad. Student from an Indian Institute (III T). I
> believe I am good in Programming languages like C/C++, Python as I
> have already done Some Projects using these language as a part of my
> academics. I really like Coding (Competitive as well as development).
> I really want to get involved in Numpy Development Project and want to
> take  "Vector math library integration" as a part of my project. I
> want to here any idea from your side for this project.
> Thanks For your time for reading this email and responding back.
>

As Sturla and Gregor suggested, there are quite a few attempts to solve
this shortcoming in NumPy.  In particular Gregor integrated MKL/VML support
in numexpr quite a long time ago, and when combined with my own
implementation of pooled threads (which behaves better than Intel's
implementation in VML), the thing literally flies:

 https://github.com/pydata/numexpr/wiki/NumexprMKL

numba is another interesting option, and it shows much better compilation
times than the integrated compiler in numexpr.  You can see a quick
comparison of the expected performance of numexpr and numba:

http://nbviewer.ipython.org/gist/anonymous/4117896

In general, numba wins for small arrays, but numexpr can achieve very good
performance for larger ones.  I think there are interesting things to
discover in both projects, such as how they manage memory in order
to avoid temporaries or how they deal with unaligned data efficiently.  I
would advise looking at the existing docs and presentations explaining
things in more detail too.

All in all, I would really love to see such vector math library support
integrated in NumPy because, frankly, I don't have the bandwidth to maintain
numexpr anymore (and I am afraid that nobody else will jump on this ship
;).

Good luck!

Francesc


>
> My IRCnickname: dp
>
> Real Name: Durgesh Pandey.
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorizing computation

2015-02-13 Thread Francesc Alted
2015-02-13 13:25 GMT+01:00 Julian Taylor :

> On 02/13/2015 01:03 PM, Francesc Alted wrote:
> > 2015-02-13 12:51 GMT+01:00 Julian Taylor  > <mailto:jtaylor.deb...@googlemail.com>>:
> >
> > On 02/13/2015 11:51 AM, Francesc Alted wrote:
> > > Hi,
> > >
> > > I would like to vectorize the next computation:
> > >
> > > nx, ny, nz = 720, 180, 3
> > > outheight = np.arange(nz) * 3
> > > oro = np.arange(nx * ny).reshape((nx, ny))
> > >
> > > def compute1(outheight, oro):
> > >     result = np.zeros((nx, ny, nz))
> > >     for ix in range(nx):
> > >         for iz in range(nz):
> > >             result[ix, :, iz] = outheight[iz] + oro[ix, :]
> > >     return result
> > >
> > > I think this should be possible by using an advanced use of
> > broadcasting
> > > in numpy.  Anyone willing to post a solution?
> >
> >
> > result = outheight + oro.reshape(nx, ny, 1)
> >
> >
> > And 4x faster for my case.  Oh my, I am afraid that my mind will never
> > scratch the surface of all the amazing possibilities that broadcasting offers :)
> >
> > Thank you very much for such an elegant solution!
> >
>
>
> if speed is a concern this is faster as it has a better data layout for
> numpy during the computation, but the result may be laid out worse for
> further processing
>
> result = outheight.reshape(nz, 1, 1) + oro
> return np.rollaxis(result, 0, 3)
>
>
Holy cow, this makes for another 4x speed improvement!  I don't think I
need that much in my scenario, so I will stick with the first one (more
readable and with the expected data layout), but thanks a lot!
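
For the record, here is a quick sketch (same toy inputs as above) checking
that the layout-friendly variant gives the same values as the simpler
broadcast:

import numpy as np

nx, ny, nz = 720, 180, 3
outheight = np.arange(nz) * 3
oro = np.arange(nx * ny).reshape((nx, ny))

simple = outheight + oro.reshape(nx, ny, 1)   # result laid out as (nx, ny, nz)
fast = outheight.reshape(nz, 1, 1) + oro      # computed as (nz, nx, ny)
fast = np.rollaxis(fast, 0, 3)                # axes moved back to (nx, ny, nz)
assert np.allclose(simple, fast)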

Francesc
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Vectorizing computation

2015-02-13 Thread Francesc Alted
2015-02-13 12:51 GMT+01:00 Julian Taylor :

> On 02/13/2015 11:51 AM, Francesc Alted wrote:
> > Hi,
> >
> > I would like to vectorize the next computation:
> >
> > nx, ny, nz = 720, 180, 3
> > outheight = np.arange(nz) * 3
> > oro = np.arange(nx * ny).reshape((nx, ny))
> >
> > def compute1(outheight, oro):
> >     result = np.zeros((nx, ny, nz))
> >     for ix in range(nx):
> >         for iz in range(nz):
> >             result[ix, :, iz] = outheight[iz] + oro[ix, :]
> >     return result
> >
> > I think this should be possible by using an advanced use of broadcasting
> > in numpy.  Anyone willing to post a solution?
>
>
> result = outheight + oro.reshape(nx, ny, 1)
>
>
And 4x faster for my case.  Oh my, I am afraid that my mind will never
scratch the surface of all the amazing possibilities that broadcasting offers :)

Thank you very much for such an elegant solution!
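
For anyone else following along, a quick sketch (same toy inputs) checking
that the one-liner reproduces the original double loop:

import numpy as np

nx, ny, nz = 720, 180, 3
outheight = np.arange(nz) * 3
oro = np.arange(nx * ny).reshape((nx, ny))

def compute1(outheight, oro):
    result = np.zeros((nx, ny, nz))
    for ix in range(nx):
        for iz in range(nz):
            result[ix, :, iz] = outheight[iz] + oro[ix, :]
    return result

assert np.allclose(compute1(outheight, oro),
                   outheight + oro.reshape(nx, ny, 1))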

Francesc
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Vectorizing computation

2015-02-13 Thread Francesc Alted
Hi,

I would like to vectorize the following computation:

nx, ny, nz = 720, 180, 3
outheight = np.arange(nz) * 3
oro = np.arange(nx * ny).reshape((nx, ny))

def compute1(outheight, oro):
    result = np.zeros((nx, ny, nz))
    for ix in range(nx):
        for iz in range(nz):
            result[ix, :, iz] = outheight[iz] + oro[ix, :]
    return result

I think this should be possible with some advanced use of broadcasting in
numpy.  Anyone willing to post a solution?

Thanks,
-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.7.1 released

2014-07-30 Thread Francesc Alted
==
Announcing bcolz 0.7.1
==

What's new
==

This is a maintenance release, where bcolz got rid of the nose dependency
for Python 2.6 (only unittest2 should be required).  Also, some small
fixes to the test suite, especially on 32-bit platforms, have been made.
Thanks to Ilan Schnell for pointing out the problems and for suggesting fixes.

``bcolz`` is a renaming of the ``carray`` project.  The new goals for
the project are to create simple, yet flexible compressed containers,
that can live either on-disk or in-memory, and with some
high-performance iterators (like `iter()`, `where()`) for querying them.

Together, bcolz and the Blosc compressor are finally fulfilling the
promise of accelerating memory I/O, at least for some real scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

For more detailed info, see the release notes in:
https://github.com/Blosc/bcolz/wiki/Release-Notes


What it is
==

bcolz provides columnar and compressed data containers.  Column storage
allows for efficiently querying tables with a large number of columns.
It also allows for cheap addition and removal of columns.  In addition,
bcolz objects are compressed by default for reducing memory/disk I/O
needs.  The compression process is carried out internally by Blosc, a
high-performance compressor that is optimized for binary data.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, the carray/ctable
containers can be disk-based, and it is possible to use them for
seamlessly performing out-of-memory computations.
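
A small taste of the API (just a sketch; see the manual below for the
authoritative spelling of these calls):

import numpy as np
import bcolz

ca = bcolz.carray(np.arange(1e7))        # compressed, chunked, in-memory
ct = bcolz.ctable((np.arange(1e6), np.linspace(0, 1, 1000000)),
                  names=['i', 'x'])      # columnar, compressed table
# block-wise query; numexpr is used under the hood when available
hits = [row.i for row in ct.where('(x > 0.5) & (i < 100)')]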

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.


Installing
==

bcolz is in the PyPI repository, so installing it is easy:

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt




   **Enjoy data!**

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: bcolz 0.7.0 released

2014-07-22 Thread Francesc Alted
==
Announcing bcolz 0.7.0
==

What's new
==

This release adds support for Python 3, Pandas and HDF5/PyTables
conversion, support for different compressors via the latest release of
Blosc, and a new `iterblocks()` iterator.

Also, intensive benchmarking has led to an important tuning of the buffer
size parameters, so that compression and evaluation go faster than
ever.  Together, bcolz and the Blosc compressor are finally fulfilling
the promise of accelerating memory I/O, at least for some real
scenarios:

http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

``bcolz`` is a renaming of the ``carray`` project.  The new goals for
the project are to create simple, yet flexible compressed containers,
that can live either on-disk or in-memory, and with some
high-performance iterators (like `iter()`, `where()`) for querying them.

For more detailed info, see the release notes in:
https://github.com/Blosc/bcolz/wiki/Release-Notes


What it is
==

bcolz provides columnar and compressed data containers.  Column storage
allows for efficiently querying tables with a large number of columns.
It also allows for cheap addition and removal of columns.  In addition,
bcolz objects are compressed by default for reducing memory/disk I/O
needs.  The compression process is carried out internally by Blosc, a
high-performance compressor that is optimized for binary data.

bcolz can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too).
numexpr optimizes the memory usage and uses several cores for doing the
computations, so it is blazing fast.  Moreover, the carray/ctable
containers can be disk-based, and it is possible to use them for
seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test
suite and fully supports both 32-bit and 64-bit platforms.  Also, it is
typically tested on both UNIX and Windows operating systems.


Installing
==

bcolz is in the PyPI repository, so installing it is easy:

$ pip install -U bcolz


Resources
=

Visit the main bcolz site repository at:
http://github.com/Blosc/bcolz

Manual:
http://bcolz.blosc.org

Home of Blosc compressor:
http://blosc.org

User's mail list:
bc...@googlegroups.com
http://groups.google.com/group/bcolz

License is the new BSD:
https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt




   **Enjoy data!**

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [CORRECTION] python-blosc 1.2.4 released (Was: ANN: python-blosc 1.2.7 released)

2014-07-07 Thread Francesc Alted
Indeed, the version just released was 1.2.4 and not 1.2.7.  Sorry for
the typo!

Francesc

On 7/7/14, 8:20 PM, Francesc Alted wrote:
> =
> Announcing python-blosc 1.2.4
> =
>
> What is new?
> 
>
> This is a maintenance release, where the included c-blosc sources have been
> updated to 1.4.0.  This adds support for non-Intel architectures, most
> especially those not supporting unaligned access.
>
> For more info, you can have a look at the release notes in:
>
> https://github.com/Blosc/python-blosc/wiki/Release-notes
>
> More docs and examples are available in the documentation site:
>
> http://python-blosc.blosc.org
>
>
> What is it?
> ===
>
> Blosc (http://www.blosc.org) is a high performance compressor
> optimized for binary data.  It has been designed to transmit data to
> the processor cache faster than the traditional, non-compressed,
> direct memory fetch approach via a memcpy() OS call.
>
> Blosc is the first compressor that is meant not only to reduce the size
> of large datasets on-disk or in-memory, but also to accelerate object
> manipulations that are memory-bound
> (http://www.blosc.org/docs/StarvingCPUs.pdf).  See
> http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
> how much speed it can achieve in some datasets.
>
> Blosc works well for compressing numerical arrays that contain data
> with relatively low entropy, like sparse data, time series, grids with
> regularly-spaced values, etc.
>
> python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
> the Blosc compression library.
>
> There is also a handy command line and Python library for Blosc called
> Bloscpack (https://github.com/Blosc/bloscpack) that allows you to
> compress large binary datafiles on-disk.
>
>
> Installing
> ==
>
> python-blosc is in PyPI repository, so installing it is easy:
>
> $ pip install -U blosc  # yes, you should omit the python- prefix
>
>
> Download sources
> 
>
> The sources are managed through github services at:
>
> http://github.com/Blosc/python-blosc
>
>
> Documentation
> =
>
> There is Sphinx-based documentation site at:
>
> http://python-blosc.blosc.org/
>
>
> Mailing list
> 
>
> There is an official mailing list for Blosc at:
>
> bl...@googlegroups.com
> http://groups.google.es/group/blosc
>
>
> Licenses
> 
>
> Both Blosc and its Python wrapper are distributed using the MIT license.
> See:
>
> https://github.com/Blosc/python-blosc/blob/master/LICENSES
>
> for more details.
>
> 
>
>   **Enjoy data!**
>


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.7 released

2014-07-07 Thread Francesc Alted
=
Announcing python-blosc 1.2.4
=

What is new?


This is a maintenance release, where the included c-blosc sources have been
updated to 1.4.0.  This adds support for non-Intel architectures, most
especially those not supporting unaligned access.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===

Blosc (http://www.blosc.org) is a high performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() OS call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy command line and Python library for Blosc called
Bloscpack (https://github.com/Blosc/bloscpack) that allows you to
compress large binary datafiles on-disk.


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



   **Enjoy data!**

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] IDL vs Python parallel computing

2014-05-05 Thread Francesc Alted
On 5/3/14, 11:56 PM, Siegfried Gonzi wrote:
> Hi all
>
> I noticed IDL uses at least 400% (4 processors or cores) out of the box
> for simple things like reading and processing files, calculating the
> mean etc.
>
> I have never seen this happening with numpy except for the linalgebra
> stuff (e.g lapack).

Well, this might be because that is where using several
processes makes the most sense.  Normally, when you are reading files, the
bottleneck is the I/O subsystem (at least if you don't have to convert 
from text to numbers), and for calculating the mean, normally the 
bottleneck is memory throughput.

Having said this, there are several packages that work on top of NumPy 
that can use multiple cores when performing numpy operations, like 
numexpr (https://github.com/pydata/numexpr), or Theano 
(http://deeplearning.net/software/theano/tutorial/multi_cores.html)
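
For instance, something as simple as this (a sketch; the speedup depends
entirely on your memory bandwidth and core count) will already use several
threads:

import numpy as np
import numexpr as ne

a = np.random.rand(10000000)
ne.set_num_threads(4)                  # let numexpr spread blocks over 4 threads
mean = ne.evaluate("sum(a)") / a.size  # reductions are supported too
assert np.allclose(mean, a.mean())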

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-18 Thread Francesc Alted
On 18/04/14 13:39, Francesc Alted wrote:
> So, sqrt in numpy has about the same speed as the one in MKL.
> Again, I wonder why :)

So by peeking into the code I have seen that you implemented sqrt using 
SSE2 intrinsics.  Cool!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] About the npz format

2014-04-18 Thread Francesc Alted
On 18/04/14 13:01, Valentin Haenel wrote:
> Hi again,
>
> * onefire  [2014-04-18]:
>> I think your workaround might help, but a better solution would be to not
>> use Python's zipfile module at all. This would make it possible to, say,
>> let the user choose the checksum algorithm or to turn that off.
>> Or maybe the compression stuff makes this route too complicated to be worth
>> the trouble? (after all, the zip format is not that hard to understand)
> Just to give you an idea of what my aforementioned Bloscpack library can
> do in the case of linspace:
>
> In [1]: import numpy as np
>
> In [2]: import bloscpack as bp
>
> In [3]: import bloscpack.sysutil as bps
>
> In [4]: x = np.linspace(1, 10, 5000)
>
> In [5]: %timeit np.save("x.npy", x) ; bps.sync()
> 1 loops, best of 3: 2.12 s per loop
>
> In [6]: %timeit bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
> 1 loops, best of 3: 627 ms per loop
>
> In [7]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
> 3 loops, best of 3: 1.92 s per loop
>
> In [8]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x.blp') ; bps.sync()
> 3 loops, best of 3: 564 ms per loop
>
> In [9]: ls -lah x.npy x.blp
> -rw-r--r-- 1 root root  49M Apr 18 12:53 x.blp
> -rw-r--r-- 1 root root 382M Apr 18 12:52 x.npy
>
> However, this is a bit of special case, since Blosc does extremely well
> -- both speed and size wise -- on the linspace data, your milage may
> vary.

Exactly, and besides, Blosc can use different codecs inside it.  Just for
completeness, here is a small benchmark of what you can expect from
them (my laptop does not have an SSD, so my figures are a bit slow
compared with Valentin's):

In [50]: %timeit -n 3 -r 3 np.save("x.npy", x) ; bps.sync()
3 loops, best of 3: 5.7 s per loop

In [51]: cargs = bp.args.DEFAULT_BLOSC_ARGS

In [52]: cargs['cname'] = 'blosclz'

In [53]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-blosclz.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.12 s per loop

In [54]: cargs['cname'] = 'lz4'

In [55]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-lz4.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 985 ms per loop

In [56]: cargs['cname'] = 'lz4hc'

In [57]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-lz4hc.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.95 s per loop

In [58]: cargs['cname'] = 'snappy'

In [59]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-snappy.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 1.11 s per loop

In [60]: cargs['cname'] = 'zlib'

In [61]: %timeit -n 3 -r 3 bp.pack_ndarray_file(x, 'x-zlib.blp', 
blosc_args=cargs) ; bps.sync()
3 loops, best of 3: 3.12 s per loop

so all the codecs can make the storage go faster than a pure np.save(),
most especially blosclz, lz4 and snappy.  However, lz4hc and zlib
achieve the best compression ratios:

In [62]: ls -lht x*.*
-rw-r--r-- 1 faltet users 7,0M 18 abr 13:49 x-zlib.blp
-rw-r--r-- 1 faltet users  54M 18 abr 13:48 x-snappy.blp
-rw-r--r-- 1 faltet users 7,0M 18 abr 13:48 x-lz4hc.blp
-rw-r--r-- 1 faltet users  48M 18 abr 13:47 x-lz4.blp
-rw-r--r-- 1 faltet users  49M 18 abr 13:47 x-blosclz.blp
-rw-r--r-- 1 faltet users 382M 18 abr 13:42 x.npy

But again, we are talking about a specially nice compression case.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-18 Thread Francesc Alted
On 17/04/14 21:19, Julian Taylor wrote:
> On 17.04.2014 20:30, Francesc Alted wrote:
>> El 17/04/14 19:28, Julian Taylor ha escrit:
>>> On 17.04.2014 18:06, Francesc Alted wrote:
>>>
>>>> In [4]: x_unaligned = np.zeros(shape,
>>>> dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']
>>> on arrays of this size you won't see alignment issues you are dominated
>>> by memory bandwidth. If at all you will only see it if the data fits
>>> into the cache.
>>> Its also about unaligned to simd vectors not unaligned to basic types.
>>> But it doesn't matter anymore on modern x86 cpus. I guess for array data
>>> cache line splits should also not be a big concern.
>> Yes, that was my point, that in x86 CPUs this is not such a big
>> problem.  But still a factor of 2 is significant, even for CPU-intensive
>> tasks.  For example, computing sin() is affected similarly (sin() is
>> using SIMD, right?):
>>
>> In [6]: shape = (1, 1)
>>
>> In [7]: x_aligned = np.zeros(shape,
>> dtype=[('x',np.float64),('y',np.int64)])['x']
>>
>> In [8]: x_unaligned = np.zeros(shape,
>> dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']
>>
>> In [9]: %timeit res = np.sin(x_aligned)
>> 1 loops, best of 3: 654 ms per loop
>>
>> In [10]: %timeit res = np.sin(x_unaligned)
>> 1 loops, best of 3: 1.08 s per loop
>>
>> and again, numexpr can deal with that pretty well (using 8 physical
>> cores here):
>>
>> In [6]: %timeit res = ne.evaluate('sin(x_aligned)')
>> 10 loops, best of 3: 149 ms per loop
>>
>> In [7]: %timeit res = ne.evaluate('sin(x_unaligned)')
>> 10 loops, best of 3: 151 ms per loop
> in this case the unaligned triggers a strided memcpy calling loop to
> copy the data into a aligned buffer which is terrible for performance,
> even compared to the expensive sin call.
> numexpr handles this well as it allows the compiler to replace the
> memcpy with inline assembly (a mov instruction).
> We could fix that in numpy, though I don't consider it very important,
> you usually always have base type aligned memory.

Well, that *could* be important for evaluating conditions in structured
arrays, as it is pretty easy to get unaligned 'columns'.  But apparently
this does not affect numpy very much:

In [23]: na_aligned = np.fromiter((("", i, i*2) for i in xrange(N)), 
dtype="S16,i4,i8")

In [24]: na_unaligned = np.fromiter((("", i, i*2) for i in xrange(N)), 
dtype="S15,i4,i8")

In [25]: %time sum((r['f1'] for r in na_aligned[na_aligned['f2'] > 10]))
CPU times: user 10.2 s, sys: 93 ms, total: 10.3 s
Wall time: 10.3 s
Out[25]: 499485

In [26]: %time sum((r['f1'] for r in na_unaligned[na_unaligned['f2'] > 10]))
CPU times: user 10.2 s, sys: 82 ms, total: 10.3 s
Wall time: 10.3 s
Out[26]: 499485

probably because the bottleneck is somewhere else.  So yeah, probably
not worth worrying about that.


> (sin is not a SIMD using function unless you use a vector math library
> not supported by numpy directly yet)

Ah, so MKL is making use of SIMD for computing the sin(), but not in 
general.  But you later said that numpy's sqrt *is* making use of SIMD.  
I wonder why.

>
>>
>>> Aligned allocators are not the only allocator which might be useful in
>>> numpy. Modern CPUs also support larger pages than 4K (huge pages up to
>>> 1GB in size) which reduces TLB cache misses. Memory of this type
>>> typically needs to be allocated with special mmap flags, though newer
>>> kernel versions can now also provide this memory to transparent
>>> anonymous pages (normal non-file mmaps).
>> That's interesting.  In which scenarios do you think that could improve
>> performance?
> it might improve all numpy operations dealing with big arrays.
> big arrays trigger many large temporaries meaning glibc uses mmap
> meaning lots of moving of address space between the kernel and userspace.
> but I haven't benchmarked it, so it could also be completely irrelevant.

I was curious about this and apparently the speedup that large pages
typically bring is around 5%:

http://stackoverflow.com/questions/14275170/performance-degradation-with-large-pages

not a big deal, but it is something.

>
> Also memory fragments really fast, so after a few hours of operation you
> often can't allocate any huge pages anymore, so you need to reserve
> space for them which requires special 

Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Francesc Alted
On 17/04/14 19:28, Julian Taylor wrote:
> On 17.04.2014 18:06, Francesc Alted wrote:
>
>> In [4]: x_unaligned = np.zeros(shape,
>> dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']
> on arrays of this size you won't see alignment issues you are dominated
> by memory bandwidth. If at all you will only see it if the data fits
> into the cache.
> Its also about unaligned to simd vectors not unaligned to basic types.
> But it doesn't matter anymore on modern x86 cpus. I guess for array data
> cache line splits should also not be a big concern.

Yes, that was my point, that in x86 CPUs this is not such a big 
problem.  But still a factor of 2 is significant, even for CPU-intensive 
tasks.  For example, computing sin() is affected similarly (sin() is 
using SIMD, right?):

In [6]: shape = (1, 1)

In [7]: x_aligned = np.zeros(shape, 
dtype=[('x',np.float64),('y',np.int64)])['x']

In [8]: x_unaligned = np.zeros(shape, 
dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']

In [9]: %timeit res = np.sin(x_aligned)
1 loops, best of 3: 654 ms per loop

In [10]: %timeit res = np.sin(x_unaligned)
1 loops, best of 3: 1.08 s per loop

and again, numexpr can deal with that pretty well (using 8 physical 
cores here):

In [6]: %timeit res = ne.evaluate('sin(x_aligned)')
10 loops, best of 3: 149 ms per loop

In [7]: %timeit res = ne.evaluate('sin(x_unaligned)')
10 loops, best of 3: 151 ms per loop


> Aligned allocators are not the only allocator which might be useful in
> numpy. Modern CPUs also support larger pages than 4K (huge pages up to
> 1GB in size) which reduces TLB cache misses. Memory of this type
> typically needs to be allocated with special mmap flags, though newer
> kernel versions can now also provide this memory to transparent
> anonymous pages (normal non-file mmaps).

That's interesting.  In which scenarios do you think that could improve 
performance?

>> In [8]: import numexpr as ne
>>
>> In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
>> 10 loops, best of 3: 133 ms per loop
>>
>> In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
>> 10 loops, best of 3: 134 ms per loop
>>
>> i.e. there is not a significant difference between aligned and unaligned
>> access to data.
>>
>> I wonder if the same technique could be applied to NumPy.
>
> you already can do so with relatively simple means:
> http://nbviewer.ipython.org/gist/anonymous/10942132
>
> If you change the blocking function to get a function as input and use
> inplace operations numpy can even beat numexpr (though I used the
> numexpr Ubuntu package which might not be compiled optimally)
> This type of transformation can probably be applied on the AST quite easily.

That's smart.  Yeah, I don't see a reason why numexpr would be 
performing badly on Ubuntu.  But I am not getting your performance for 
blocked_thread on my AMI linux vbox:

http://nbviewer.ipython.org/gist/anonymous/11000524

oh well, threads are always tricky beasts.  By the way, apparently the 
optimal block size for my machine is something like 1 MB, not 128 KB, 
although the difference is not big:

http://nbviewer.ipython.org/gist/anonymous/11002751

(thanks to Stefan Van der Walt for the script).

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] High-quality memory profiling for numpy in python 3.5 / volunteers needed

2014-04-17 Thread Francesc Alted
Uh, 15x slower for unaligned access is quite a lot.  But Intel (and AMD)
architectures are much more tolerant in this respect (and improving).
For example, with a Xeon(R) CPU E5-2670 (2 years old) I get:


In [1]: import numpy as np

In [2]: shape = (1, 1)

In [3]: x_aligned = np.zeros(shape, 
dtype=[('x',np.float64),('y',np.int64)])['x']


In [4]: x_unaligned = np.zeros(shape, 
dtype=[('y1',np.int8),('x',np.float64),('y2',np.int8,(7,))])['x']


In [5]: %timeit res = x_aligned ** 2
1 loops, best of 3: 289 ms per loop

In [6]: %timeit res = x_unaligned ** 2
1 loops, best of 3: 664 ms per loop

so the added cost in this case is just a bit more than 2x.  But you can
also alleviate this overhead if you do a copy that fits in cache prior
to doing the computations.  numexpr does this:


https://github.com/pydata/numexpr/blob/master/numexpr/interp_body.cpp#L203

and the results are pretty good:

In [8]: import numexpr as ne

In [9]: %timeit res = ne.evaluate('x_aligned ** 2')
10 loops, best of 3: 133 ms per loop

In [10]: %timeit res = ne.evaluate('x_unaligned ** 2')
10 loops, best of 3: 134 ms per loop

i.e. there is not a significant difference between aligned and unaligned 
access to data.


I wonder if the same technique could be applied to NumPy.

Francesc


On 17/04/14 16:26, Aron Ahmadia wrote:

Hmnn, I wasn't being clear :)

The default malloc on BlueGene/Q only returns 8 byte alignment, but 
the SIMD units need 32-byte alignment for loads, stores, and 
operations or performance suffers.  On the /P the required alignment 
was 16-bytes, but malloc only gave you 8, and trying to perform 
vectorized loads/stores generated alignment exceptions on unaligned 
memory.


See https://wiki.alcf.anl.gov/parts/index.php/Blue_Gene/Q and 
https://computing.llnl.gov/tutorials/bgp/BGP-usage.Walkup.pdf (slides 
14 for overview, 15 for the effective performance difference between 
the unaligned/aligned code) for some notes on this.


A




On Thu, Apr 17, 2014 at 10:18 AM, Nathaniel Smith <mailto:n...@pobox.com>> wrote:


On 17 Apr 2014 15:09, "Aron Ahmadia" mailto:a...@ahmadia.net>> wrote:
>
> > On the one hand it would be nice to actually know whether
posix_memalign is important, before making api decisions on this
basis.
>
> FWIW: On the lightweight IBM cores that the extremely popular
BlueGene machines were based on, accessing unaligned memory raised
system faults.  The default behavior of these machines was to
terminate the program if more than 1000 such errors occurred on a
given process, and an environment variable allowed you to
terminate the program if *any* unaligned memory access occurred.
 This is because unaligned memory accesses were 15x (or more)
slower than aligned memory access.
>
> The newer /Q chips seem to be a little more forgiving of this,
but I think one can in general expect allocated memory alignment
to be an important performance technique for future high
performance computing architectures.

Right, this much is true on lots of architectures, and so malloc
is careful to always return values with sufficient alignment (e.g.
8 bytes) to make sure that any standard operation can succeed.

The question here is whether it will be important to have *even
more* alignment than malloc gives us by default. A 16 or 32 byte
wide SIMD instruction might prefer that data have 16 or 32 byte
alignment, even if normal memory access for the types being
operated on only requires 4 or 8 byte alignment.

-n


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org <mailto:NumPy-Discussion@scipy.org>
http://mail.scipy.org/mailman/listinfo/numpy-discussion




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4 is out

2014-04-13 Thread Francesc Alted

  Announcing Numexpr 2.4


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in
strings.  Only plain strings (bytes) are supported for now (see ticket
#142).  Thanks to Marcin Krol.

You can have a glimpse on how `contains()` works in this notebook:

http://nbviewer.ipython.org/gist/FrancescAlted/10595974

where it can be seen that this can make substring searches more
than 10x faster than with regular Python.

You can find the source for the notebook here:

https://github.com/FrancescAlted/ngrams
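
A tiny sketch of the intended usage (bytes only, as said; the notebook above
has the authoritative examples):

import numpy as np
import numexpr as ne

haystack = np.array([b'some grains of sand', b'a needle in a haystack'])
needle = b'needle'
mask = ne.evaluate("contains(haystack, needle)")   # -> array([False,  True])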

Also, there is a new version of setup.py that allows better management
of the NumPy dependency during pip installs.  Thanks to Aleks Bunin.

Windows related bugs have been addressed and (hopefully) squashed.
Thanks to Christoph Gohlke.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] PEP 465 has been accepted / volunteers needed

2014-04-10 Thread Francesc Alted
On 4/9/14, 10:46 PM, Chris Barker wrote:
> On Tue, Apr 8, 2014 at 11:14 AM, Nathaniel Smith  <mailto:n...@pobox.com>> wrote:
>
> Thank you! Though I suspect that the most important part of my
> contribution may have just been my high tolerance for writing emails
> ;-).
>
>
> no -- it's your high tolerance for _reading_ emails...
>
> Far too many of us have a high tolerance for writing them!

Ha ha, very true!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4 RC2

2014-04-07 Thread Francesc Alted
===
  Announcing Numexpr 2.4 RC2
===

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in
strings.  Only plain strings (bytes) are supported for now (see ticket
#142).  Thanks to Marcin Krol.

Also, there is a new version of setup.py that allows better management
of the NumPy dependency during pip installs.  Thanks to Aleks Bunin.

Windows related bugs have been addressed and (hopefully) squashed.
Thanks to Christoph Gohlke.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.4 RC1

2014-04-06 Thread Francesc Alted
===
  Announcing Numexpr 2.4 RC1
===

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

A new `contains()` function has been added for detecting substrings in
strings.  Thanks to Marcin Krol.

Also, there is a new version of setup.py that allows better management
of the NumPy dependency during pip installs.  Thanks to Aleks Bunin.

This is the first release candidate before 2.4 final is out,
so please give it a go and report back any problems you may have.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1

2014-02-28 Thread Francesc Alted
On 2/28/14, 3:00 PM, Charles R Harris wrote:
>
>
>
> On Fri, Feb 28, 2014 at 5:52 AM, Julian Taylor 
> mailto:jtaylor.deb...@googlemail.com>> 
> wrote:
>
> performance should not be impacted as long as we stay on the stack, it
> just increases offset of a stack pointer a bit more.
> E.g. nditer and einsum use temporary stack arrays of this type for its
> initialization:
> op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS]; // both 32 currently
> The resulting nditer structure is then in heap space and dependent on
> the real amount of arguments it got.
> So I'm more worried about running out of stack space, though the limit
> is usually 8mb so taking 128kb for a short while should be ok.
>
> On 28.02.2014 13:32, Francesc Alted wrote:
> > Well, what numexpr is using is basically NpyIter_AdvancedNew:
> >
> >
> 
> https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178
> >
> > and nothing else.  If NPY_MAXARGS could be increased just for
> that, and
> > without ABI breaking, then fine.  If not, we should have to wait
> until
> > 1.9 I am afraid.
> >
> > On the other hand, increasing the temporary arrays in nditer
> from 32kb
> > to 128kb is a bit worrying, but probably we should do some
> benchmarks
> > and see how much performance would be compromised (if any).
> >
> > Francesc
> >
> > On 2/28/14, 1:09 PM, Julian Taylor wrote:
> >> hm increasing it for PyArrayMapIterObject would break the
> public ABI.
> >> While nobody should be using this part of the ABI its not
> appropriate
> >> for a bugfix release.
> >> Note that as it currently stands in numpy 1.9.dev we will break
> this ABI
> >> for the indexing improvements.
> >>
> >> Though for nditer and some other functions we could change it
> if thats
> >> enough.
> >> It would bump some temporary arrays of nditer from 32kb to 128kb, I
> >> think that would still be fine, but getting to the point where
> we should
> >> move them onto the heap.
>
>
> These sort of changes can have subtle side effects and need lots of 
> testing in a release cycle. Bugfix release cycles are kept short by 
> restricting changes to those that look simple and safe.

Agreed.  I have just opened a ticket so that this is kept in mind for NumPy 1.9:

https://github.com/numpy/numpy/issues/4398

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1

2014-02-28 Thread Francesc Alted
Well, what numexpr is using is basically NpyIter_AdvancedNew:

https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178

and nothing else.  If NPY_MAXARGS could be increased just for that, and
without breaking the ABI, then fine.  If not, we will have to wait until
1.9, I am afraid.

On the other hand, increasing the temporary arrays in nditer from 32kb 
to 128kb is a bit worrying, but probably we should do some benchmarks 
and see how much performance would be compromised (if any).

Francesc

On 2/28/14, 1:09 PM, Julian Taylor wrote:
> hm increasing it for PyArrayMapIterObject would break the public ABI.
> While nobody should be using this part of the ABI its not appropriate
> for a bugfix release.
> Note that as it currently stands in numpy 1.9.dev we will break this ABI
> for the indexing improvements.
>
> Though for nditer and some other functions we could change it if thats
> enough.
> It would bump some temporary arrays of nditer from 32kb to 128kb, I
> think that would still be fine, but getting to the point where we should
> move them onto the heap.
>
> On 28.02.2014 12:41, Francesc Alted wrote:
>> Hi Julian,
>>
>> Any chance that NPY_MAXARGS could be increased to something more than
>> the current value of 32?  There is a discussion about this in:
>>
>> https://github.com/numpy/numpy/pull/226
>>
>> but I think that, as Charles was suggesting, just increasing NPY_MAXARGS
>> to something more reasonable (say 256) should be enough for a long while.
>>
>> This issue limits quite a bit the number of operands in numexpr
>> expressions, and hence, to other projects that depends on it, like
>> PyTables or pandas.  See for example this bug report:
>>
>> https://github.com/PyTables/PyTables/issues/286
>>
>> Thanks,
>> Francesc
>>
>> On 2/27/14, 9:05 PM, Julian Taylor wrote:
>>> hi,
>>>
>>> We want to start preparing the release candidate for the bugfix release
>>> 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow.
>>>
>>> So if you want a certain issue fixed please scream now or better create
>>> a pull request/patch on the maintenance/1.8.x branch.
>>> Please only consider bugfixes, no enhancements (unless they are really
>>> really simple), new features or invasive changes.
>>>
>>> I just finished my list of issues I want backported to numpy 1.8
>>> (gh-4390, gh-4388). Please check if its already included in these PRs.
>>> I'm probably still going to add gh-4284 after some though tomorrow.
>>>
>>> Cheers,
>>> Julian
>>> ___
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1

2014-02-28 Thread Francesc Alted
Hi Julian,

Any chance that NPY_MAXARGS could be increased to something more than 
the current value of 32?  There is a discussion about this in:

https://github.com/numpy/numpy/pull/226

but I think that, as Charles was suggesting, just increasing NPY_MAXARGS 
to something more reasonable (say 256) should be enough for a long while.

This issue quite severely limits the number of operands in numexpr
expressions, and hence affects other projects that depend on it, like
PyTables or pandas.  See for example this bug report:

https://github.com/PyTables/PyTables/issues/286
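
For illustration, hitting the limit is as easy as this (a sketch; the exact
exception message depends on the numpy/numexpr versions):

import numpy as np
import numexpr as ne

names = ['v%d' % i for i in range(40)]           # 40 distinct operands > 32
arrays = dict((n, np.ones(10)) for n in names)
try:
    ne.evaluate(" + ".join(names), local_dict=arrays)
except Exception as e:
    print(e)    # typically a complaint about too many operands/arguments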

Thanks,
Francesc

On 2/27/14, 9:05 PM, Julian Taylor wrote:
> hi,
>
> We want to start preparing the release candidate for the bugfix release
> 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow.
>
> So if you want a certain issue fixed please scream now or better create
> a pull request/patch on the maintenance/1.8.x branch.
> Please only consider bugfixes, no enhancements (unless they are really
> really simple), new features or invasive changes.
>
> I just finished my list of issues I want backported to numpy 1.8
> (gh-4390, gh-4388). Please check if its already included in these PRs.
> I'm probably still going to add gh-4284 after some though tomorrow.
>
> Cheers,
> Julian
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.3.1 released

2014-02-18 Thread Francesc Alted
==
  Announcing Numexpr 2.3.1
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It has multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...) while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring more heavy dependencies.

What's new
==

* Added support for shift-left (<<) and shift-right (>>) binary operators
   (see the small example below).  See PR #131. Thanks to fish2000!

* Removed the rpath flag for the GCC linker, because it is probably
   not necessary and it makes clang choke.
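
A quick sketch of the new shift operators in action (integer operands only):

import numpy as np
import numexpr as ne

i = np.arange(10, dtype=np.int64)
assert (ne.evaluate("i << 2") == (i << 2)).all()
assert (ne.evaluate("i >> 1") == (i >> 1)).all()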

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] argsort speed

2014-02-17 Thread Francesc Alted
On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
> On Sun, Feb 16, 2014 at 6:12 PM, Daπid  wrote:
>> On 16 February 2014 23:43,  wrote:
>>> What's the fastest argsort for a 1d array with around 28 Million
>>> elements, roughly uniformly distributed, random order?
>>
>> On numpy latest version:
>>
>> for kind in ['quicksort', 'mergesort', 'heapsort']:
>>  print kind
>>  %timeit np.sort(data, kind=kind)
>>  %timeit np.argsort(data, kind=kind)
>>
>>
>> quicksort
>> 1 loops, best of 3: 3.55 s per loop
>> 1 loops, best of 3: 10.3 s per loop
>> mergesort
>> 1 loops, best of 3: 4.84 s per loop
>> 1 loops, best of 3: 9.49 s per loop
>> heapsort
>> 1 loops, best of 3: 12.1 s per loop
>> 1 loops, best of 3: 39.3 s per loop
>>
>>
>> It looks quicksort is quicker sorting, but mergesort is marginally faster
>> sorting args. The diference is slim, but upon repetition, it remains
>> significant.
>>
>> Why is that? Probably part of the reason is what Eelco said, and part is
>> that in sort comparison are done accessing the array elements directly, but
>> in argsort you have to index the array, introducing some overhead.
> Thanks, both.
>
> I also gain a second with mergesort.
>
> matlab would be nicer in my case, it returns both.
> I still need to use the argsort to index into the array to also get
> the sorted array.

Many years ago I needed something similar, so I made some functions for 
sorting and argsorting in one single shot.  Maybe you want to reuse 
them.  Here is an example of the C implementation:

https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619

and here is the Cython wrapper for all of them:

https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
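
If you prefer to stay in pure NumPy, the usual workaround (a sketch; it pays
one extra fancy-indexing pass instead of a second sort) is:

import numpy as np

data = np.random.rand(28 * 10**6)
idx = np.argsort(data, kind='mergesort')
sorted_data = data[idx]            # sorted values and permutation, one sort only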

Francesc

>
> Josef
>
>
>> I seem unable to find the code for ndarray.sort, so I can't check. I have
>> tried to grep it tring all possible combinations of "def ndarray",
>> "self.sort", etc. Where is it?
>>
>>
>> /David.
>>
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: numexpr 2.3 (final) released

2014-01-27 Thread Francesc Alted
Not really.  numexpr is mostly about element-wise operations on dense
matrices.  You will need to look for another package for that.
Francesc

On 1/27/14, 10:18 AM, Dinesh Vadhia wrote:
> Francesc: Does numexpr support scipy sparse matrices?
>   
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: BLZ 0.6.1 has been released

2014-01-25 Thread Francesc Alted
Announcing BLZ 0.6 series
=

What it is
--

BLZ is a chunked container for numerical data.  Chunking allows for
efficient enlarging/shrinking of the data container.  In addition, it can
also be compressed to reduce memory/disk needs.  The compression
process is carried out internally by Blosc, a high-performance
compressor that is optimized for binary data.

The main objects in BLZ are `barray` and `btable`.  `barray` is meant
for storing multidimensional homogeneous datasets efficiently.
`barray` objects provide the foundations for building `btable`
objects, where each column is made of a single `barray`.  Facilities
are provided for iterating, filtering and querying `btables` in an
efficient way.  You can find more info about `barray` and `btable` in
the tutorial:

http://blz.pydata.org/blz-manual/tutorial.html

BLZ can use numexpr internally so as to accelerate many vector and
query operations (although it can use pure NumPy for doing so too)
either from memory or from disk.  In the future, it is planned to use
Numba as the computational kernel and to provide better Blaze
(http://blaze.pydata.org) integration.


What's new
--

BLZ has been branched off from the Blaze project
(http://blaze.pydata.org).  BLZ was meant as a persistent format and
library for I/O in Blaze.  BLZ in Blaze is based on previous carray
0.5 and this is why this new version is labeled 0.6.

BLZ supports completely transparent storage on-disk in addition to
memory.  That means that *everything* that can be done with the
in-memory container can be done using the disk as well.

The advantage of a disk-based container is that the addressable space
is much larger than just your available memory.  Also, as BLZ uses
a chunked and compressed data layout built on the super-fast Blosc
compression library, the data access speed is very good.

The format chosen for the persistence layer is based on the
'bloscpack' library and described in the "Persistent format for BLZ"
chapter of the user manual ('docs/source/persistence-format.rst').
More about Bloscpack here: https://github.com/esc/bloscpack

You may want to know more about BLZ in this blog entry:
http://continuum.io/blog/blz-format

In this version, support for Blosc 1.3 has been added, meaning that a
new `cname` parameter has been added to the `bparams` class, so
that you can select your preferred compressor from 'blosclz', 'lz4',
'lz4hc', 'snappy' and 'zlib'.

Also, many bugs have been fixed, providing a much smoother experience.

CAVEAT: The BLZ/bloscpack format is still evolving, so don't rely on
forward compatibility of the format, at least until 1.0, when the
internal format will be declared frozen.


Resources
-

Visit the main BLZ site repository at:
http://github.com/ContinuumIO/blz

Read the online docs at:
http://blz.pydata.org/blz-manual/index.html

Home of Blosc compressor:
http://www.blosc.org

User's mail list:
blaze-...@continuum.io



Enjoy!

Francesc Alted
Continuum Analytics, Inc.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.2.0 released

2014-01-25 Thread Francesc Alted
The sources are managed through github services at:

http://github.com/ContinuumIO/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/ContinuumIO/python-blosc/blob/master/LICENSES

for more details.

--
Francesc Alted
Continuum Analytics, Inc.


-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.3 (final) released

2014-01-25 Thread Francesc Alted
==
  Announcing Numexpr 2.3
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
MKL (Math Kernel Library), which allows an extremely fast evaluation
of transcendental functions (sin, cos, tan, exp, log...)  while
squeezing the last drop of performance out of your multi-core
processors.  Look here for some benchmarks of numexpr using MKL:

https://github.com/pydata/numexpr/wiki/NumexprMKL

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational engine for projects that
don't want to adopt other solutions requiring heavier dependencies.
Numexpr is already being used in a series of packages (PyTables, pandas,
BLZ...) to help do computations faster.


What's new
==

The repository has been migrated to https://github.com/pydata/numexpr.
All new tickets and PR should be directed there.

Also, a `conj()` function for computing the conjugate of complex arrays 
has been added.
Thanks to David Menéndez.  See PR #125.
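For instance, a small sketch of the new function in use:

import numpy as np
import numexpr as ne

z = np.array([1 + 2j, 3 - 4j])
print(ne.evaluate("conj(z)"))   # -> [1.-2.j  3.+4.j]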

Finally, we fixed a DeprecationWarning derived from using ``oa_ndim ==
0`` and ``op_axes == NULL`` with `NpyIter_AdvancedNew()` and
NumPy 1.8.  Thanks to Mark Wiebe for advice on how to fix this
properly.

Many thanks to Christoph Gohlke and Ilan Schnell for their help during
the testing of this release in all kinds of possible combinations of
platforms and MKL.

In case you want to know more in detail what has changed in this
version, see:

https://github.com/pydata/numexpr/wiki/Release-Notes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at GitHub in:

https://github.com/pydata/numexpr

You can get the packages from PyPI as well (but not for RC releases):

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Catching out-of-memory error before it happens

2014-01-24 Thread Francesc Alted

Yeah, numexpr is pretty cool for avoiding temporaries in an easy way:

https://github.com/pydata/numexpr
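A tiny sketch of what that buys you, using Nathaniel's "a + 2*b" case below:

import numpy as np
import numexpr as ne

a = np.ones(10_000_000)
b = np.ones(10_000_000)

c = a + 2*b                  # plain NumPy: builds the temporary 2*b before the sum (~4N of RAM)
c = ne.evaluate("a + 2*b")   # numexpr: evaluates the expression in blocks, avoiding large temporaries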

Francesc

On 24/01/14 16:30, Nathaniel Smith wrote:


There is no reliable way to predict how much memory an arbitrary numpy 
operation will need, no. However, in most cases the main memory cost 
will be simply the need to store the input and output arrays; for 
large arrays, all other allocations should be negligible.


The most effective way to avoid running out of memory, therefore, is 
to avoid creating temporary arrays, by using only in-place operations.


E.g., if a and b each require N bytes of RAM, then the memory 
requirements are (roughly):


c = a + b: 3N
c = a + 2*b: 4N
a += b: 2N
np.add(a, b, out=a): 2N
b *= 2; a += b: 2N

Note that simply loading a and b requires 2N memory, so the latter 
code samples are near-optimal.


Of course some calculations do require the use of temporary storage 
space...


-n

On 24 Jan 2014 15:19, "Dinesh Vadhia"  wrote:


I want to write a general exception handler to warn if too much
data is being loaded for the ram size in a machine for a
successful numpy array operation to take place.  For example, the
program multiplies two floating point arrays A and B which are
populated with loadtext.  While the data is being loaded, want to
continuously check that the data volume doesn't pass a threshold
that will cause on out-of-memory error during the A*B operation.
The known variables are the amount of memory available, data type
(floats in this case) and the numpy array operation to be
performed. It seems this requires knowledge of the internal memory
requirements of each numpy operation.  For sake of simplicity, can
ignore other memory needs of program.  Is this possible?

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion



--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] -ffast-math

2013-12-03 Thread Francesc Alted
On 12/2/13, 12:14 AM, Dan Goodman wrote:
> Dan Goodman  writes:
> ...
>> I got around 5x slower. Using numexpr 'dumbly' (i.e. just putting the
>> expression in directly) was slower than the function above, but doing a
>> hybrid between the two approaches worked well:
>>
>> def timefunc_numexpr_smart():
>>  _sin_term = sin(2.0*freq*pi*t)
>>  _exp_term = exp(-dt/tau)
>>  _a_term = (_sin_term-_sin_term*_exp_term)
>>  _const_term = -b*_exp_term + b
>>  v[:] = numexpr.evaluate('a*_a_term+v*_exp_term+_const_term')
>>  #numexpr.evaluate('a*_a_term+v*_exp_term+_const_term', out=v)
>>
>> This was about 3.5x slower than weave. If I used the commented out final
>> line then it was only 1.5x slower than weave, but it also gives wrong
>> results. I reported this as a bug in numexpr a long time ago but I guess it
>> hasn't been fixed yet (or maybe I didn't upgrade my version recently).
> I just upgraded numexpr to 2.2 where they did fix this bug, and now the
> 'smart' numexpr version runs exactly as fast as weave (so I guess there were
> some performance enhancements in numexpr as well).

Err no, there have not been performance improvements in numexpr since 
2.0 (that I am aware of).  Maybe you are running on a multi-core machine 
now and you are seeing a better speedup because of this?  Also, your 
expressions are made of transcendental functions, so linking numexpr 
with MKL could accelerate computations a good deal too.
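A quick way to check the multi-core hypothesis (a minimal sketch using
numexpr's standard helpers):

import numexpr as ne

print(ne.detect_number_of_cores())  # how many cores numexpr will use by default
ne.set_num_threads(1)               # pin to a single thread to rule out multi-core effects
# ... re-run the timing here and compare against the multi-threaded run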

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] [ANN] numexpr 2.2 released

2013-08-31 Thread Francesc Alted
==
 Announcing Numexpr 2.2
==

Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
VML library (included in Intel MKL), which allows an extremely fast
evaluation of transcendental functions (sin, cos, tan, exp, log...)
while squeezing the last drop of performance out of your multi-core
processors.

Its only dependency is NumPy (MKL is optional), so it works well as an
easy-to-deploy, easy-to-use, computational kernel for projects that
don't want to adopt other solutions that require more heavy
dependencies.

What's new
==

This release is mainly meant to fix a problem with the license of the
numexpr/win32/pthread.{c,h} files emulating pthreads on Windows. After
permission from the original authors was granted, these files adopt
the MIT license and can be redistributed without problems.  See issue
#109 for details
(https://code.google.com/p/numexpr/issues/detail?id=110).

Another important improvement is the new algorithm to decide the initial
number of threads to be used.  This was necessary because, by default,
numexpr was using a number of threads equal to the detected number of
cores, and this can be just too much for modern systems, where this
number can be very high (and counterproductive for performance in many
cases).  Now, the 'NUMEXPR_NUM_THREADS' environment variable is
honored, and in case this is not present, a maximum of *8*
threads are set up initially.  The new algorithm is fully described in
the Users Guide, in the note of 'General routines' section:
https://code.google.com/p/numexpr/wiki/UsersGuide#General_routines.
Closes #110.
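For instance, a minimal sketch of capping the initial thread count from
Python (the variable has to be set before numexpr is first imported):

import os
os.environ["NUMEXPR_NUM_THREADS"] = "4"  # must be set before the first import
import numexpr as ne                     # numexpr now starts with 4 threads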

In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

You can get the packages from PyPI as well:

http://pypi.python.org/pypi/numexpr

Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy data!

-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] RAM problem during code execution - Numpya arrays

2013-08-23 Thread Francesc Alted
  S_nuevo=sum(X_nuevo)
>
> std_dev_nuevo_exp_u = np.sqrt(S_nuevo/(tipos_nuevo-1.))
>
>
> componente_longitudinal=Longitud*np.abs(np.cos(array_angle))
> comp_y=np.append(comp_y, sum(componente_longitudinal)/N)
>
> componente_transversal=Longitud*np.abs(np.sin(array_angle))
> comp_x=np.append(comp_x, sum(componente_transversal)/N)
>
> std_dev_size_medio_intuitivo=np.append(std_dev_size_medio_intuitivo, 
> std_dev_exp_u)
>
> std_dev_size_medio_nuevo=np.append(std_dev_size_medio_nuevo, 
> std_dev_nuevo_exp_u)
>
> size_medio_intuitivo=np.append(size_medio_intuitivo, 
> S_medio_intuitivo_exp_u)
>
> size_medio_nuevo=np.append(size_medio_nuevo, S_medio_nuevo_exp_u)
>
>
> percolation_probability=sum(PERCOLACION)/numero_experimentos
>
> prob_perc=np.append(prob_perc, percolation_probability)
>
> S_int = np.append (S_int, sum(size_medio_intuitivo)/numero_experimentos)
>
> S_medio=np.append (S_medio, sum(size_medio_nuevo)/numero_experimentos)
>
> desviacion_standard = np.append (desviacion_standard, 
> sum(std_dev_size_medio_intuitivo)/numero_experimentos)
>
> desviacion_standard_nuevo=np.append (desviacion_standard_nuevo, 
> sum(std_dev_size_medio_nuevo)/numero_experimentos)
>
> tiempos=np.append(tiempos, time.clock()-empieza)
>
> componente_y=np.append(componente_y, sum(comp_y)/numero_experimentos)
> componente_x=np.append(componente_x, sum(comp_x)/numero_experimentos)
>
> anisotropia_macroscopica_porcentual=100*(1-(componente_y/componente_x))
>
> I tried with gc, gc.collect() and the 'del' command for deleting arrays
> after their use and nothing works!
>
> What am I doing wrong? Why does the memory become full while running (it starts
> with 10% of RAM used and in 1-2 hours it is totally used)?
>
> Please help me, I'm totally stuck!
> Thanks a lot!
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.1 (final) released

2013-05-24 Thread Francesc Alted
===
Announcing python-blosc 1.1
===

What is it?
===

python-blosc (http://blosc.pydata.org/) is a Python wrapper for the
Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for
binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Whether this is achieved or not
depends of the data compressibility, the number of cores in the system,
and other factors.  See a series of benchmarks conducted for many
different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.
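A minimal sketch of a compress/decompress round-trip on such an array
(using the plain bytes-based calls; the pointer-based variants announced
below avoid the intermediate copy):

import numpy as np
import blosc

a = np.linspace(0, 100, 1_000_000)                        # low-entropy numerical data
packed = blosc.compress(a.tobytes(), typesize=a.itemsize)
restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)
assert np.array_equal(a, restored)
print(len(packed), a.nbytes)                              # compressed vs raw size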

There is also a handy command-line tool for Blosc called Bloscpack
(https://github.com/esc/bloscpack) that allows you to compress large
binary datafiles on-disk.  Although the format for Bloscpack has not
stabilized yet, it allows you to effectively use Blosc from your
favorite shell.


What is new?


- Added new `compress_ptr` and `decompress_ptr` functions that allow
   compressing and decompressing from/to a data pointer, avoiding an
   intermediate copy for maximum speed.  Be careful, as these are low
   level calls, and the user must make sure that the pointer data area is
   safe.

- Since Blosc (the C library) already supports being installed as a
   standalone library (via cmake), it is also possible to link
   python-blosc against a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of
   recent Blosc library supporting this at C level).

- Many checks on types and ranges of values have been added.  Most of
   the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available
   now.

Many thanks to Valentin Hänel for his impressive work for this release.

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://blosc.pydata.org


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/FrancescAlted/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES

for more details.

Enjoy!

--
Francesc Alted
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: python-blosc 1.1 RC1 available for testing

2013-05-17 Thread Francesc Alted

Announcing python-blosc 1.1 RC1


What is it?
===

python-blosc (http://blosc.pydata.org) is a Python wrapper for the
Blosc compression library.

Blosc (http://blosc.org) is a high performance compressor optimized for
binary data.  It has been designed to transmit data to the processor
cache faster than the traditional, non-compressed, direct memory fetch
approach via a memcpy() OS call.  Whether this is achieved or not
depends of the data compressibility, the number of cores in the system,
and other factors.  See a series of benchmarks conducted for many
different systems: http://blosc.org/trac/wiki/SyntheticBenchmarks.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regular-spaced values, etc.

There is also a handy command-line tool for Blosc called Bloscpack
(https://github.com/esc/bloscpack) that allows you to compress large
binary datafiles on-disk.  Although the format for Bloscpack has not
stabilized yet, it allows you to effectively use Blosc from your
favorite shell.


What is new?


- Added new `compress_ptr` and `decompress_ptr` functions that allow
   compressing and decompressing from/to a data pointer.  These are low level
   calls and the user must make sure that the pointer data area is safe.

- Since Blosc (the C library) already supports being installed as a
   standalone library (via cmake), it is also possible to link
   python-blosc against a system Blosc library.

- The Python calls to Blosc are now thread-safe (another consequence of
   recent Blosc library supporting this at C level).

- Many checks on types and ranges of values have been added.  Most of
   the calls will now complain when passed the wrong values.

- Docstrings are much improved. Also, Sphinx-based docs are available
   now.

Many thanks to Valentin Hänel for his impressive work for this release.

For more info, you can see the release notes in:

https://github.com/FrancescAlted/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://blosc.pydata.org


Installing
==

python-blosc is in PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources


The sources are managed through github services at:

http://github.com/FrancescAlted/python-blosc


Documentation
=

There is Sphinx-based documentation site at:

http://blosc.pydata.org/


Mailing list


There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses


Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/FrancescAlted/python-blosc/blob/master/LICENSES

for more details.

--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Profiling (was GSoC : Performance parity between numpy arrays and Python scalars)

2013-05-02 Thread Francesc Alted
On 5/2/13 3:58 PM, Nathaniel Smith wrote:
> callgrind has the *fabulous* kcachegrind front-end, but it only
> measures memory access performance on a simulated machine, which is
> very useful sometimes (if you're trying to optimize cache locality),
> but there's no guarantee that the bottlenecks on its simulated machine
> are the same as the bottlenecks on your real machine.

Agreed, there is no guarantee, but my experience is that kcachegrind 
normally gives you a pretty decent view of cache faults and hence it can 
make pretty good predictions of how these affect your computations.  I 
have used this feature extensively for optimizing parts of the Blosc 
compressor, and I could not be happier (to the point that, if it were 
not for Valgrind, I could not have figured out many interesting memory access 
optimizations).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] ANN: numexpr 2.1 RC1

2013-04-14 Thread Francesc Alted


 Announcing Numexpr 2.1RC1


Numexpr is a fast numerical expression evaluator for NumPy.  With it,
expressions that operate on arrays (like "3*a+4*b") are accelerated
and use less memory than doing the same calculation in Python.

It sports multi-threaded capabilities, as well as support for Intel's
VML library, which allows for squeezing the last drop of performance
out of your multi-core processors.

What's new
==

This version adds compatibility for Python 3.  A bunch of thanks to 
Antonio Valentino for his excellent work on this.  I apologize for taking 
so long to release his contributions.


In case you want to know more in detail what has changed in this
version, see:

http://code.google.com/p/numexpr/wiki/ReleaseNotes

or have a look at RELEASE_NOTES.txt in the tarball.

Where can I find Numexpr?
=

The project is hosted at Google code in:

http://code.google.com/p/numexpr/

This is release candidate 1, so it will not be available on the PyPI 
repository.  I'll post it there when the final version is released.


Share your experience
=

Let us know of any bugs, suggestions, gripes, kudos, etc. you may
have.


Enjoy!

--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Francesc Alted
On 4/4/13 8:56 PM, Chris Barker - NOAA Federal wrote:
> On Thu, Apr 4, 2013 at 10:54 AM, Francesc Alted  wrote:
>
>> That makes a difference.  This can be especially important for creating
>> user-defined time origins:
>>
>> In []: np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1,
>> dtype='timedelta64[ns]')
>> Out[]: numpy.datetime64('2017-07-14T04:40:00.1+0200')
> but that's worthless if you try it higher-resolution:
>
> In [40]: np.array(int(1.5e9), dtype='datetime64[s]')
> Out[40]: array(datetime.datetime(2017, 7, 14, 2, 40), dtype='datetime64[s]')
>
> # Start at 2017
>
> # add a picosecond:
> In [41]: np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1,
> dtype='timedelta64[ps]')
> Out[41]: numpy.datetime64('1970-03-08T22:55:30.029526319105-0800')
>
> # get 1970???

This is clearly a bug.  Could you file a ticket please?

Also, using attoseconds is giving a weird behavior:

In []: np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1, 
dtype='timedelta64[as]')
---
OverflowError Traceback (most recent call last)
 in ()
> 1 np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1, 
dtype='timedelta64[as]')

OverflowError: Integer overflow getting a common metadata divisor for 
NumPy datetime metadata [s] and [as]

I would expect the attosecond to be happily ignored and nothing would be 
added.

>
> And even with nanoseconds, given the leap-second issues, etc, you
> really wouldn't want to do this anyway -- rather, keep your epoch
> close by.
>
> Now that I think about it -- being able to set your epoch could lessen
> the impact of leap-seconds for second-resolution as well.

Probably this is the way to go, yes.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Francesc Alted
On 4/4/13 7:01 PM, Chris Barker - NOAA Federal wrote:
> Francesc Alted wrote:
>> When Ivan and I were discussing that, I remember us deciding that such
>> small units would be useful mainly for the timedelta datatype, which
>> is a relative, not an absolute time.  We did not want to fall short for
>> very precise time measurements, and this is why we decided to go with
>> attoseconds.
> I thought about that -- but if you have timedelta without datetime,
> you really just have an integer -- we haven't bought anything.

Well, it is not just an integer.  It is an integer with a time scale:

In []: np.array(1, dtype='timedelta64[us]') + np.array(1, 
dtype='timedelta64[ns]')
Out[]: numpy.timedelta64(1001,'ns')

That makes a difference.  This can be especially important for creating 
user-defined time origins:

In []: np.array(int(1.5e9), dtype='datetime64[s]') + np.array(1, 
dtype='timedelta64[ns]')
Out[]: numpy.datetime64('2017-07-14T04:40:00.1+0200')

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Francesc Alted
On 4/4/13 1:54 PM, Nathaniel Smith wrote:
> On Thu, Apr 4, 2013 at 12:52 AM, Chris Barker - NOAA Federal
>  wrote:
>> Thanks all for taking an interest. I need to think a bot more about
>> the options before commenting more, but:
>>
>> while we're at it:
>>
>> It seems very odd to me that datetime64 supports different units
>> (right down to  attosecond) but not different epochs. How can it
>> possible be useful to use nanoseconds, etc, but only right around
>> 1970? For that matter, why all the units at all? I can see the need
>> for nanosecond resolution, but not without changing the epoch -- so if
>> the epoch is fixed, why bother with different units? Using days (for
>> instance) rather than seconds doesn't save memory, as we're always
>> using 64 bits. It can't be common to need more than 2.9e12 years (OK,
>> that's not quite as old as the universe, so some cosmologists may need
>> it...)
> Another reason why it might be interesting to support different epochs
> is that many timeseries (e.g., the ones I work with) aren't linked to
> absolute time, but are instead "milliseconds since we turned on the
> recording equipment". You can reasonably represent these as timedeltas
> of course, but it'd be even more elegant to be able to be able to
> represent them as absolute times against an opaque epoch. In
> particular, when you have multiple recording tracks, only those which
> were recorded against the same epoch are actually commensurable --
> trying to do
>recording1_times[10] - recording2_times[10]
> is meaningless and should be an error.

I remember discussing this in some depth 5 years ago on this list, 
when we asked people about the convenience of including a 
user-defined 'epoch'.  We were calling it 'origin'.  But apparently it 
was decided that this was not needed because timestamps+timedelta would 
be enough.  The NEP still reflects this discussion:

https://github.com/numpy/numpy/blob/master/doc/neps/datetime-proposal.rst#why-the-origin-metadata-disappeared

This is just a historical note; not that we can't change that again.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] timezones and datetime64

2013-04-04 Thread Francesc Alted
On 4/4/13 1:52 AM, Chris Barker - NOAA Federal wrote:
> Thanks all for taking an interest. I need to think a bot more about
> the options before commenting more, but:
>
> while we're at it:
>
> It seems very odd to me that datetime64 supports different units
> (right down to  attosecond) but not different epochs. How can it
> possible be useful to use nanoseconds, etc, but only right around
> 1970? For that matter, why all the units at all? I can see the need
> for nanosecond resolution, but not without changing the epoch -- so if
> the epoch is fixed, why bother with different units?


When Ivan and I were discussing that, I remember us deciding that such 
small units would be useful mainly for the timedelta datatype, which 
is a relative, not an absolute time.  We did not want to fall short for 
very precise time measurements, and this is why we decided to go with 
attoseconds.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Francesc Alted
On 3/13/13 3:53 PM, Francesc Alted wrote:
> On 3/13/13 2:45 PM, Andrea Cimatoribus wrote:
>> Hi everybody, I hope this has not been discussed before, I couldn't 
>> find a solution elsewhere.
>> I need to read some binary data, and I am using numpy.fromfile to do 
>> this. Since the files are huge, and would make me run out of memory, 
>> I need to read data skipping some records (I am reading data recorded 
>> at high frequency, so basically I want to read subsampling).
> [clip]
>
You can do a fid.seek(offset) prior to np.fromfile() and then it will 
read from offset.  See the docstrings for `file.seek()` on how to use it.
>

Oops, you were already using file.seek().  Disregard, please.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] fast numpy.fromfile skipping data chunks

2013-03-13 Thread Francesc Alted
On 3/13/13 2:45 PM, Andrea Cimatoribus wrote:
> Hi everybody, I hope this has not been discussed before, I couldn't find a 
> solution elsewhere.
> I need to read some binary data, and I am using numpy.fromfile to do this. 
> Since the files are huge, and would make me run out of memory, I need to read 
> data skipping some records (I am reading data recorded at high frequency, so 
> basically I want to read subsampling).
[clip]

You can do a fid.seek(offset) prior to np.fromfile() and then it will 
read from offset.  See the docstrings for `file.seek()` on how to use it.
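A minimal sketch of that pattern (the file name and record layout are made
up for illustration):

import numpy as np

record_size = 8 * 1000                      # hypothetical: 1000 float64 values per record
with open("data.bin", "rb") as fid:         # hypothetical file name
    fid.seek(10 * record_size)              # skip the first 10 records
    chunk = np.fromfile(fid, dtype=np.float64, count=1000)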

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] aligned / unaligned structured dtype behavior

2013-03-08 Thread Francesc Alted
On 3/7/13 7:26 PM, Frédéric Bastien wrote:
> Hi,
>
> It is normal that unaligned accesses are slower. The hardware has been
> optimized for aligned access. So this is a user choice: space vs speed.
> We can't go around that.

Well, my benchmarks apparently say that numexpr can get better 
performance when tackling computations on unaligned arrays (30% 
faster).  This puzzled me a bit yesterday, but after thinking a bit 
about what was happening, the explanation is clear to me now.

The aligned and unaligned arrays were not contiguous, as they had a gap 
between elements (a consequence of the layout of structure arrays): 8 
bytes for the aligned case and 1 byte for the packed one.  The hardware 
of modern machines fetches a complete cache line (64 bytes typically) 
whenever an element is accessed and that means that, even though we are 
only making use of one field in the computations, both fields are 
brought into cache.  That means that, for the aligned object, 16 MB (16 
bytes * 1 million elements) are transmitted to the cache, while the 
unaligned object only has to transmit 9 MB (9 bytes * 1 million).  Of 
course, transmitting 16 MB is much more work than just 9 MB.

Now, the elements land in cache aligned for the aligned case and 
unaligned for the packed case, and as you say, unaligned access in cache 
is pretty slow for the CPU, and this is the reason why NumPy can take up 
to 4x more time to perform the computation.  So why is numexpr 
performing much better for the packed case?  Well, it turns out that 
numexpr has machinery to detect that an array is unaligned, and does an 
internal copy for every block that is brought to the cache to be 
computed.  This block size is between 1024 elements (8 KB for double 
precision) and 4096 elements when linked with VML support, and that 
means that a copy normally happens at L1 or L2 cache speed, which is 
much faster than memory-to-memory copy. After the copy numexpr can 
perform operations with aligned data at full CPU speed.  The paradox is 
that, by doing more copies, you may end up performing faster computations.  
This is the joy of programming with memory hierarchy in mind.

This is to say that there is more in the equation than just whether an array 
is aligned or not.  You must take into account how (and how much!) data 
travels from storage to the CPU before making assumptions about the 
performance of your programs.
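For reference, the two layouts being compared (a small sketch, taken from
the dtypes used later in this thread; the itemsizes explain the 16 MB vs
9 MB traffic above):

import numpy as np

aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)
packed_dt  = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
print(aligned_dt.itemsize, packed_dt.itemsize)   # 16 vs 9 bytes per element
# So 1 million elements move 16 MB vs 9 MB through the cache, as described above.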

>   We can only minimize the cost of unaligned
> access in some cases, but not all, and those optimizations depend on the
> CPU. But newer CPUs have lowered the cost of unaligned access.
>
> I'm surprised that Theano worked with the unaligned input. I added
> some checks to make this raise an error, as we do not support that!
> Francesc, can you check if Theano gives the right result? It is possible
> that someone (maybe me) just copies the input to an aligned ndarray
> when we receive a non-aligned one. That could explain why it worked,
> but my memory tells me that we raise an error.

It seems to work for me:

In [10]: f = theano.function([a], a**2)

In [11]: f(baligned)
Out[11]: array([ 1.,  1.,  1., ...,  1.,  1.,  1.])

In [12]: f(bpacked)
Out[12]: array([ 1.,  1.,  1., ...,  1.,  1.,  1.])

In [13]: f2 = theano.function([a], a.sum())

In [14]: f2(baligned)
Out[14]: array(100.0)

In [15]: f2(bpacked)
Out[15]: array(100.0)


>
> As you saw in the number, this is a bad example for Theano as the
> function compiled is too fast.  There is more Theano overhead than
> computation time in that example. We have recently reduced the
> overhead, but we can do more to lower it.

Yeah.  I was mainly curious about how different packages handle 
unaligned arrays.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] aligned / unaligned structured dtype behavior

2013-03-07 Thread Francesc Alted
On 3/7/13 6:47 PM, Francesc Alted wrote:
> On 3/6/13 7:42 PM, Kurt Smith wrote:
>> And regarding performance, doing simple timings shows a 30%-ish
>> slowdown for unaligned operations:
>>
>> In [36]: %timeit packed_arr['b']**2
>> 100 loops, best of 3: 2.48 ms per loop
>>
>> In [37]: %timeit aligned_arr['b']**2
>> 1000 loops, best of 3: 1.9 ms per loop
>
> Hmm, that clearly depends on the architecture.  On my machine:
>
> In [1]: import numpy as np
>
> In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)
>
> In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)
>
> In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt)
>
> In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt)
>
> In [6]: baligned = aligned_arr['b']
>
> In [7]: bpacked = packed_arr['b']
>
> In [8]: %timeit baligned**2
> 1000 loops, best of 3: 1.96 ms per loop
>
> In [9]: %timeit bpacked**2
> 100 loops, best of 3: 7.84 ms per loop
>
> That is, the unaligned column is 4x slower (!).  numexpr allows 
> somewhat better results:
>
> In [11]: %timeit numexpr.evaluate('baligned**2')
> 1000 loops, best of 3: 1.13 ms per loop
>
> In [12]: %timeit numexpr.evaluate('bpacked**2')
> 1000 loops, best of 3: 865 us per loop

Just for completeness, here it is what Theano gets:

In [18]: import theano

In [20]: a = theano.tensor.vector()

In [22]: f = theano.function([a], a**2)

In [23]: %timeit f(baligned)
100 loops, best of 3: 7.74 ms per loop

In [24]: %timeit f(bpacked)
100 loops, best of 3: 12.6 ms per loop

So yeah, Theano is also slower for the unaligned case (but less than 2x 
in this case).

>
> Yes, in this case, the unaligned array goes faster (as much as 30%).  
> I think the reason is that numexpr optimizes the unaligned access by 
> doing a copy of the different chunks in internal buffers that fits in 
> L1 cache.  Apparently this is very beneficial in this case (not sure 
> why, though).
>
>>
>> Whereas summing shows just a 10%-ish slowdown:
>>
>> In [38]: %timeit packed_arr['b'].sum()
>> 1000 loops, best of 3: 1.29 ms per loop
>>
>> In [39]: %timeit aligned_arr['b'].sum()
>> 1000 loops, best of 3: 1.14 ms per loop
>
> On my machine:
>
> In [14]: %timeit baligned.sum()
> 1000 loops, best of 3: 1.03 ms per loop
>
> In [15]: %timeit bpacked.sum()
> 100 loops, best of 3: 3.79 ms per loop
>
> Again, the 4x slowdown is here.  Using numexpr:
>
> In [16]: %timeit numexpr.evaluate('sum(baligned)')
> 100 loops, best of 3: 2.16 ms per loop
>
> In [17]: %timeit numexpr.evaluate('sum(bpacked)')
> 100 loops, best of 3: 2.08 ms per loop

And with Theano:

In [26]: f2 = theano.function([a], a.sum())

In [27]: %timeit f2(baligned)
100 loops, best of 3: 2.52 ms per loop

In [28]: %timeit f2(bpacked)
100 loops, best of 3: 7.43 ms per loop

Again, the unaligned case is significantly slower (as much as 3x here!).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] aligned / unaligned structured dtype behavior (was: GSOC 2013)

2013-03-07 Thread Francesc Alted
On 3/6/13 7:42 PM, Kurt Smith wrote:
> And regarding performance, doing simple timings shows a 30%-ish
> slowdown for unaligned operations:
>
> In [36]: %timeit packed_arr['b']**2
> 100 loops, best of 3: 2.48 ms per loop
>
> In [37]: %timeit aligned_arr['b']**2
> 1000 loops, best of 3: 1.9 ms per loop

Hmm, that clearly depends on the architecture.  On my machine:

In [1]: import numpy as np

In [2]: aligned_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)

In [3]: packed_dt = np.dtype([('a', 'i1'), ('b', 'i8')], align=False)

In [4]: aligned_arr = np.ones((10**6,), dtype=aligned_dt)

In [5]: packed_arr = np.ones((10**6,), dtype=packed_dt)

In [6]: baligned = aligned_arr['b']

In [7]: bpacked = packed_arr['b']

In [8]: %timeit baligned**2
1000 loops, best of 3: 1.96 ms per loop

In [9]: %timeit bpacked**2
100 loops, best of 3: 7.84 ms per loop

That is, the unaligned column is 4x slower (!).  numexpr allows somewhat 
better results:

In [11]: %timeit numexpr.evaluate('baligned**2')
1000 loops, best of 3: 1.13 ms per loop

In [12]: %timeit numexpr.evaluate('bpacked**2')
1000 loops, best of 3: 865 us per loop

Yes, in this case, the unaligned array goes faster (as much as 30%).  I 
think the reason is that numexpr optimizes the unaligned access by doing 
a copy of the different chunks in internal buffers that fit in L1 
cache.  Apparently this is very beneficial in this case (not sure why, 
though).

>
> Whereas summing shows just a 10%-ish slowdown:
>
> In [38]: %timeit packed_arr['b'].sum()
> 1000 loops, best of 3: 1.29 ms per loop
>
> In [39]: %timeit aligned_arr['b'].sum()
> 1000 loops, best of 3: 1.14 ms per loop

On my machine:

In [14]: %timeit baligned.sum()
1000 loops, best of 3: 1.03 ms per loop

In [15]: %timeit bpacked.sum()
100 loops, best of 3: 3.79 ms per loop

Again, the 4x slowdown is here.  Using numexpr:

In [16]: %timeit numexpr.evaluate('sum(baligned)')
100 loops, best of 3: 2.16 ms per loop

In [17]: %timeit numexpr.evaluate('sum(bpacked)')
100 loops, best of 3: 2.08 ms per loop

Again, the unaligned case is slightly better.  In this case numexpr is 
a bit slower than NumPy because sum() is not parallelized internally.  
Hmm, given that, I'm wondering if some internal copies to L1 in NumPy 
could help improve unaligned performance.  Worth a try?

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] GSOC 2013

2013-03-06 Thread Francesc Alted
On 3/5/13 7:14 PM, Kurt Smith wrote:
> On Tue, Mar 5, 2013 at 1:45 AM, Eric Firing  wrote:
>> On 2013/03/04 9:01 PM, Nicolas Rougier wrote:
>>>>> This made me think of a serious performance limitation of structured 
>>>>> dtypes: a
>>>>> structured dtype is always "packed", which may lead to terrible byte 
>>>>> alignment
>>>>> for common types.  For instance, `dtype([('a', 'u1'), ('b',
>>>>> 'u8')]).itemsize == 9`,
>>>>> meaning that the 8-byte integer is not aligned as an equivalent C-struct's
>>>>> would be, leading to all sorts of horrors at the cache and register level.
>> Doesn't the "align" kwarg of np.dtype do what you want?
>>
>> In [2]: dt = np.dtype(dict(names=['a', 'b'], formats=['u1', 'u8']),
>> align=True)
>>
>> In [3]: dt.itemsize
>> Out[3]: 16
> Thanks!  That's what I get for not checking before posting.
>
> Consider this my vote to make `aligned=True` the default.

I would not rush too much.  The example above takes 9 bytes to host the 
structure, while `aligned=True` will take 16 bytes.  I'd rather leave 
the default as it is, and in case performance is critical, you can 
always copy the unaligned field to a new (homogeneous) array.
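A small sketch of that last suggestion (copying the field yields a
contiguous, aligned homogeneous array you can then work on at full speed):

import numpy as np

packed_dt = np.dtype([('a', 'u1'), ('b', 'u8')], align=False)  # itemsize 9
arr = np.ones(10**6, dtype=packed_dt)
b = arr['b'].copy()                  # contiguous, aligned copy of the field
assert b.flags.aligned and b.flags.c_contiguous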

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pip install numpy throwing a lot of output.

2013-02-12 Thread Francesc Alted
On 2/12/13 3:18 PM, Daπid wrote:
> On 12 February 2013 14:58, Francesc Alted  wrote:
>> Yes, I think that's expected. Just to make sure, can you send some
>> excerpts of the errors that you are getting?
> Actually the errors are at the beginning of the process, so they are
> out of the reach of my terminal right now. Seems like pip doesn't keep
> a log in case of success.

Well, I think these errors are part of the auto-discovery process for 
the functions supported by the libraries in the host OS (kind of 
`autoconf` for Python), so they can be considered 'normal'.

>
> The ones I can see are mostly warnings of unused variables and
> functions, maybe this is the expected behaviour for a library? This
> errors come from a complete reinstall instead of the original upgrade
> (the cat closed the terminal, worst excuse ever!):
[clip]

These are not errors, but warnings. While it would be desirable to 
avoid any warnings during the compilation process, not many libraries 
achieve this (but patches removing them are accepted).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] pip install numpy throwing a lot of output.

2013-02-12 Thread Francesc Alted
On 2/12/13 1:37 PM, Daπid wrote:
> I have just upgraded numpy with pip on Linux 64 bits with Python 2.7,
> and I got *a lot* of output, so much it doesn't fit in the terminal.
> Most of it are gcc commands, but there are many different errors
> thrown by the compiler. Is this expected?

Yes, I think that's expected. Just to make sure, can you send some 
excerpts of the errors that you are getting?

>
> I am not too worried as the test suite passes, but pip is supposed to
> give only meaningful output (or at least, this is what the creators
> intended).

Well, pip needs to compile the libraries prior to installing them, so 
compile messages are meaningful. Another question would be whether to reduce 
the amount of compile messages by default in NumPy, but I don't think this 
is realistic (or even desirable).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.7.0 release

2013-02-10 Thread Francesc Alted
Exciting stuff. Thanks a lot to you and everybody involved in the release
for an amazing job.

Francesc
On 10/02/2013 2:25, "Ondřej Čertík"  wrote:

> Hi,
>
> I'm pleased to announce the availability of the final release of
> NumPy 1.7.0.
>
> Sources and binary installers can be found at
> https://sourceforge.net/projects/numpy/files/NumPy/1.7.0/
>
> This release is equivalent to the 1.7.0rc2 release, since no more problems
> were found. For release notes see below.
>
> I would like to thank everybody who contributed to this release.
>
> Cheers,
> Ondrej
>
>
> =
> NumPy 1.7.0 Release Notes
> =
>
> This release includes several new features as well as numerous bug fixes
> and
> refactorings. It supports Python 2.4 - 2.7 and 3.1 - 3.3 and is the last
> release that supports Python 2.4 - 2.5.
>
> Highlights
> ==
>
> * ``where=`` parameter to ufuncs (allows the use of boolean arrays to
> choose
>   where a computation should be done)
> * ``vectorize`` improvements (added 'excluded' and 'cache' keyword, general
>   cleanup and bug fixes)
> * ``numpy.random.choice`` (random sample generating function)
>
>
> Compatibility notes
> ===
>
> In a future version of numpy, the functions np.diag, np.diagonal, and the
> diagonal method of ndarrays will return a view onto the original array,
> instead of producing a copy as they do now. This makes a difference if you
> write to the array returned by any of these functions. To facilitate this
> transition, numpy 1.7 produces a FutureWarning if it detects that you may
> be attempting to write to such an array. See the documentation for
> np.diagonal for details.
>
> Similar to np.diagonal above, in a future version of numpy, indexing a
> record array by a list of field names will return a view onto the original
> array, instead of producing a copy as they do now. As with np.diagonal,
> numpy 1.7 produces a FutureWarning if it detects that you may be attempting
> to write to such an array. See the documentation for array indexing for
> details.
>
> In a future version of numpy, the default casting rule for UFunc out=
> parameters will be changed from 'unsafe' to 'same_kind'. (This also applies
> to in-place operations like a += b, which is equivalent to np.add(a, b,
> out=a).) Most usages which violate the 'same_kind' rule are likely bugs, so
> this change may expose previously undetected errors in projects that depend
> on NumPy. In this version of numpy, such usages will continue to succeed,
> but will raise a DeprecationWarning.
>
> Full-array boolean indexing has been optimized to use a different,
> optimized code path.   This code path should produce the same results,
> but any feedback about changes to your code would be appreciated.
>
> Attempting to write to a read-only array (one with ``arr.flags.writeable``
> set to ``False``) used to raise either a RuntimeError, ValueError, or
> TypeError inconsistently, depending on which code path was taken. It now
> consistently raises a ValueError.
>
> The .reduce functions evaluate some reductions in a different order
> than in previous versions of NumPy, generally providing higher performance.
> Because of the nature of floating-point arithmetic, this may subtly change
> some results, just as linking NumPy to a different BLAS implementations
> such as MKL can.
>
> If upgrading from 1.5, then generally in 1.6 and 1.7 there have been
> substantial code added and some code paths altered, particularly in the
> areas of type resolution and buffered iteration over universal functions.
> This might have an impact on your code particularly if you relied on
> accidental behavior in the past.
>
> New features
> 
>
> Reduction UFuncs Generalize axis= Parameter
> ---
>
> Any ufunc.reduce function call, as well as other reductions like sum, prod,
> any, all, max and min support the ability to choose a subset of the axes to
> reduce over. Previously, one could say axis=None to mean all the axes or
> axis=# to pick a single axis.  Now, one can also say axis=(#,#) to pick a
> list of axes for reduction.
>
> Reduction UFuncs New keepdims= Parameter
> 
>
> There is a new keepdims= parameter, which if set to True, doesn't throw
> away the reduction axes but instead sets them to have size one.  When this
> option is set, the reduction result will broadcast correctly to the
> original operand which was reduced.
>
> Datetime support
> 
>
> .. note:: The datetime API is *experimental* in 1.7.0, and may undergo
> changes
>in future versions of NumPy.
>
> There have been a lot of fixes and enhancements to datetime64 compared
> to NumPy 1.6:
>
> * the parser is quite strict about only accepting ISO 8601 dates, with a
> few
>   convenience extensions
> * converts between units correctly
> * datetime arithmetic works correctly
> * business day functionality (allows t

Re: [Numpy-discussion] Byte aligned arrays

2012-12-21 Thread Francesc Alted
On 12/21/12 1:35 PM, Dag Sverre Seljebotn wrote:
> On 12/20/2012 03:23 PM, Francesc Alted wrote:
>> On 12/20/12 9:53 AM, Henry Gomersall wrote:
>>> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
>>>> The only scenario that I see that this would create unaligned arrays
>>>> is
>>>> for machines having AVX.  But provided that the Intel architecture is
>>>> making great strides in fetching unaligned data, I'd be surprised
>>>> that
>>>> the difference in performance would be even noticeable.
>>>>
>>>> Can you tell us which difference in performance are you seeing for an
>>>> AVX-aligned array and other that is not AVX-aligned?  Just curious.
>>> Further to this point, from an Intel article...
>>>
>>> http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors
>>>
>>> "Aligning data to vector length is always recommended. When using Intel
>>> SSE and Intel SSE2 instructions, loaded data should be aligned to 16
>>> bytes. Similarly, to achieve best results use Intel AVX instructions on
>>> 32-byte vectors that are 32-byte aligned. The use of Intel AVX
>>> instructions on unaligned 32-byte vectors means that every second load
>>> will be across a cache-line split, since the cache line is 64 bytes.
>>> This doubles the cache line split rate compared to Intel SSE code that
>>> uses 16-byte vectors. A high cache-line split rate in memory-intensive
>>> code is extremely likely to cause performance degradation. For that
>>> reason, it is highly recommended to align the data to 32 bytes for use
>>> with Intel AVX."
>>>
>>> Though it would be nice to put together a little example of this!
>> Indeed, an example is what I was looking for.  So provided that I have
>> access to an AVX capable machine (having 6 physical cores), and that MKL
>> 10.3 has support for AVX, I have made some comparisons using the
>> Anaconda Python distribution (it ships with most packages linked against
>> MKL 10.3).
>>
>> Here it is a first example using a DGEMM operation.  First using a NumPy
>> that is not turbo-loaded with MKL:
>>
>> In [34]: a = np.linspace(0,1,1e7)
>>
>> In [35]: b = a.reshape(1000, 1)
>>
>> In [36]: c = a.reshape(1, 1000)
>>
>> In [37]: time d = np.dot(b,c)
>> CPU times: user 7.56 s, sys: 0.03 s, total: 7.59 s
>> Wall time: 7.63 s
>>
>> In [38]: time d = np.dot(c,b)
>> CPU times: user 78.52 s, sys: 0.18 s, total: 78.70 s
>> Wall time: 78.89 s
>>
>> This is getting around 2.6 GFlop/s.  Now, with a MKL 10.3 NumPy and
>> AVX-unaligned data:
>>
>> In [7]: p = ctypes.create_string_buffer(int(8e7)); hex(ctypes.addressof(p))
>> Out[7]: '0x7fcdef3b4010'  # 16 bytes alignment
>>
>> In [8]: a = np.ndarray(1e7, "f8", p)
>>
>> In [9]: a[:] = np.linspace(0,1,1e7)
>>
>> In [10]: b = a.reshape(1000, 1)
>>
>> In [11]: c = a.reshape(1, 1000)
>>
>> In [37]: %timeit d = np.dot(b,c)
>> 10 loops, best of 3: 164 ms per loop
>>
>> In [38]: %timeit d = np.dot(c,b)
>> 1 loops, best of 3: 1.65 s per loop
>>
>> That is around 120 GFlop/s (i.e. almost 50x faster than without MKL/AVX).
>>
>> Now, using MKL 10.3 and AVX-aligned data:
>>
>> In [21]: p2 = ctypes.create_string_buffer(int(8e7+16));
>> hex(ctypes.addressof(p))
>> Out[21]: '0x7f8cb9598010'
>>
>> In [22]: a2 = np.ndarray(1e7+2, "f8", p2)[2:]  # skip the first 16 bytes
>> (now is 32-bytes aligned)
>>
>> In [23]: a2[:] = np.linspace(0,1,1e7)
>>
>> In [24]: b2 = a2.reshape(1000, 1)
>>
>> In [25]: c2 = a2.reshape(1, 1000)
>>
>> In [35]: %timeit d2 = np.dot(b2,c2)
>> 10 loops, best of 3: 163 ms per loop
>>
>> In [36]: %timeit d2 = np.dot(c2,b2)
>> 1 loops, best of 3: 1.67 s per loop
>>
>> So, again, around 120 GFlop/s, and the difference wrt to unaligned AVX
>> data is negligible.
>>
>> One may argue that DGEMM is CPU-bounded and that memory access plays
>> little role here, and that is certainly true.  So, let's go with a more
>> memory-bounded problem, like computing a transcendental function with
>> numexpr.  First with a with NumPy and numexpr with no MKL support:
>>
>> In [8]: a = np.linspace(0,1,1e8)
>>
>> In [9]: %time b = np.sin(a)
>> CPU times: user 1.20 s, sys: 0.22 s, total: 1.42 s
>> Wall time: 1.4

Re: [Numpy-discussion] Byte aligned arrays

2012-12-21 Thread Francesc Alted
On 12/21/12 11:58 AM, Henry Gomersall wrote:
> On Fri, 2012-12-21 at 11:34 +0100, Francesc Alted wrote:
>>> Also this convolution code:
>>> https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c
>>>
>>> Shows a small but repeatable speed-up (a few %) when using some
>> aligned
>>> loads (as many as I can work out to use!).
>> Okay, so a 15% is significant, yes.  I'm still wondering why I did
>> not
>> get any speedup at all using MKL, but probably the reason is that it
>> manages the unaligned corners of the datasets first, and then uses an
>> aligned access for the rest of the data (but just guessing here).
> With SSE in that convolution code example above (in which all alignments
> need be considered for each output element), I note a significant
> speedup by creating 4 copies of the float input array using memcopy,
> each shifted by 1 float (so the 5th element is aligned again). Despite
> all the extra copies its still quicker than using an unaligned load.
> However, when one tries the same trick with 8 copies for AVX it's
> actually slower than the SSE case.
>
> The fastest AVX (and any) implementation I have so far is with
> 16-aligned arrays (made with 4 copies as with SSE), with alternate
> aligned and unaligned loads (which is always at worst 16-byte aligned).
>
> Fascinating stuff!

Yes, to say the least.  And it supports the fact that, when fine-tuning 
memory access performance, there is no replacement for experimentation 
(often in quite weird ways :)

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Byte aligned arrays

2012-12-21 Thread Francesc Alted
On 12/20/12 7:35 PM, Henry Gomersall wrote:
> On Thu, 2012-12-20 at 15:23 +0100, Francesc Alted wrote:
>> On 12/20/12 9:53 AM, Henry Gomersall wrote:
>>> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
>>>> The only scenario that I see that this would create unaligned
>> arrays
>>>> is
>>>> for machines having AVX.  But provided that the Intel architecture
>> is
>>>> making great strides in fetching unaligned data, I'd be surprised
>>>> that
>>>> the difference in performance would be even noticeable.
>>>>
>>>> Can you tell us which difference in performance are you seeing for
>> an
>>>> AVX-aligned array and other that is not AVX-aligned?  Just curious.
>>> Further to this point, from an Intel article...
>>>
>>>
>> http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors
>>> "Aligning data to vector length is always recommended. When using
>> Intel
>>> SSE and Intel SSE2 instructions, loaded data should be aligned to 16
>>> bytes. Similarly, to achieve best results use Intel AVX instructions
>> on
>>> 32-byte vectors that are 32-byte aligned. The use of Intel AVX
>>> instructions on unaligned 32-byte vectors means that every second
>> load
>>> will be across a cache-line split, since the cache line is 64 bytes.
>>> This doubles the cache line split rate compared to Intel SSE code
>> that
>>> uses 16-byte vectors. A high cache-line split rate in
>> memory-intensive
>>> code is extremely likely to cause performance degradation. For that
>>> reason, it is highly recommended to align the data to 32 bytes for
>> use
>>> with Intel AVX."
>>>
>>> Though it would be nice to put together a little example of this!
>> Indeed, an example is what I was looking for.  So provided that I
>> have
>> access to an AVX capable machine (having 6 physical cores), and that
>> MKL
>> 10.3 has support for AVX, I have made some comparisons using the
>> Anaconda Python distribution (it ships with most packages linked
>> against
>> MKL 10.3).
> 
>
>> All in all, it is not clear that AVX alignment would have an
>> advantage,
>> even for memory-bounded problems.  But of course, if Intel people are
>> saying that AVX alignment is important is because they have use cases
>> for asserting this.  It is just that I'm having a difficult time to
>> find
>> these cases.
> Thanks for those examples, they were very interesting. I managed to
> temporarily get my hands on a machine with AVX and I have shown some
> speed-up with aligned arrays.
>
> FFT (using my wrappers) gives about a 15% speedup.
>
> Also this convolution code:
> https://github.com/hgomersall/SSE-convolution/blob/master/convolve.c
>
> Shows a small but repeatable speed-up (a few %) when using some aligned
> loads (as many as I can work out to use!).

Okay, so 15% is significant, yes.  I'm still wondering why I did not get 
any speedup at all using MKL; probably the reason is that it handles the 
unaligned edges of the datasets first, and then uses aligned access for 
the rest of the data (but I'm just guessing here).

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Byte aligned arrays

2012-12-20 Thread Francesc Alted
On 12/20/12 9:53 AM, Henry Gomersall wrote:
> On Wed, 2012-12-19 at 19:03 +0100, Francesc Alted wrote:
>> The only scenario that I see that this would create unaligned arrays
>> is
>> for machines having AVX.  But provided that the Intel architecture is
>> making great strides in fetching unaligned data, I'd be surprised
>> that
>> the difference in performance would be even noticeable.
>>
>> Can you tell us which difference in performance are you seeing for an
>> AVX-aligned array and other that is not AVX-aligned?  Just curious.
> Further to this point, from an Intel article...
>
> http://software.intel.com/en-us/articles/practical-intel-avx-optimization-on-2nd-generation-intel-core-processors
>
> "Aligning data to vector length is always recommended. When using Intel
> SSE and Intel SSE2 instructions, loaded data should be aligned to 16
> bytes. Similarly, to achieve best results use Intel AVX instructions on
> 32-byte vectors that are 32-byte aligned. The use of Intel AVX
> instructions on unaligned 32-byte vectors means that every second load
> will be across a cache-line split, since the cache line is 64 bytes.
> This doubles the cache line split rate compared to Intel SSE code that
> uses 16-byte vectors. A high cache-line split rate in memory-intensive
> code is extremely likely to cause performance degradation. For that
> reason, it is highly recommended to align the data to 32 bytes for use
> with Intel AVX."
>
> Though it would be nice to put together a little example of this!

Indeed, an example is what I was looking for.  So provided that I have 
access to an AVX capable machine (having 6 physical cores), and that MKL 
10.3 has support for AVX, I have made some comparisons using the 
Anaconda Python distribution (it ships with most packages linked against 
MKL 10.3).
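
In case you want to double-check which BLAS a given NumPy build is linked 
against, this is one quick way to do it (just a hint; the exact section 
names in the output vary between NumPy/MKL versions):

In []: import numpy as np

In []: np.__config__.show()   # an MKL-linked build lists mkl/blas_mkl sections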

Here is a first example using a DGEMM operation.  First, using a NumPy 
that is not turbo-loaded with MKL:

In [34]: a = np.linspace(0,1,1e7)

In [35]: b = a.reshape(1000, 10000)

In [36]: c = a.reshape(10000, 1000)

In [37]: time d = np.dot(b,c)
CPU times: user 7.56 s, sys: 0.03 s, total: 7.59 s
Wall time: 7.63 s

In [38]: time d = np.dot(c,b)
CPU times: user 78.52 s, sys: 0.18 s, total: 78.70 s
Wall time: 78.89 s

This is getting around 2.6 GFlop/s (about 2e10 flops in ~7.6 s for the 
first product).  Now, with an MKL 10.3 NumPy and AVX-unaligned data:

In [7]: p = ctypes.create_string_buffer(int(8e7)); hex(ctypes.addressof(p))
Out[7]: '0x7fcdef3b4010'  # 16 bytes alignment

In [8]: a = np.ndarray(1e7, "f8", p)

In [9]: a[:] = np.linspace(0,1,1e7)

In [10]: b = a.reshape(1000, 10000)

In [11]: c = a.reshape(10000, 1000)

In [37]: %timeit d = np.dot(b,c)
10 loops, best of 3: 164 ms per loop

In [38]: %timeit d = np.dot(c,b)
1 loops, best of 3: 1.65 s per loop

That is around 120 GFlop/s (i.e. almost 50x faster than without MKL/AVX).

Now, using MKL 10.3 and AVX-aligned data:

In [21]: p2 = ctypes.create_string_buffer(int(8e7+16)); hex(ctypes.addressof(p2))
Out[21]: '0x7f8cb9598010'

In [22]: a2 = np.ndarray(1e7+2, "f8", p2)[2:]  # skip the first 16 bytes (now 32-byte aligned)

In [23]: a2[:] = np.linspace(0,1,1e7)

In [24]: b2 = a2.reshape(1000, 10000)

In [25]: c2 = a2.reshape(10000, 1000)

In [35]: %timeit d2 = np.dot(b2,c2)
10 loops, best of 3: 163 ms per loop

In [36]: %timeit d2 = np.dot(c2,b2)
1 loops, best of 3: 1.67 s per loop

So, again, around 120 GFlop/s, and the difference with respect to 
unaligned AVX data is negligible.
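
By the way, the over-allocate-and-offset trick used above can be packaged 
into a small helper.  This is just a sketch in pure NumPy (the name 
aligned_empty and its signature are made up for illustration; it is not 
part of NumPy):

import numpy as np

def aligned_empty(n, dtype="f8", alignment=32):
    # Over-allocate a byte buffer so an aligned offset always exists,
    # then drop the misaligned head and reinterpret the rest as `dtype`.
    itemsize = np.dtype(dtype).itemsize
    buf = np.empty(n * itemsize + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    a = buf[offset:offset + n * itemsize].view(dtype)
    assert a.ctypes.data % alignment == 0
    return a

# e.g. a 32-byte aligned equivalent of the arrays above:
a3 = aligned_empty(int(1e7), "f8", alignment=32)
a3[:] = np.linspace(0, 1, int(1e7))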

One may argue that DGEMM is CPU-bounded and that memory access plays 
little role here, and that is certainly true.  So, let's go with a more 
memory-bounded problem, like computing a transcendental function with 
numexpr.  First, with NumPy and a numexpr with no MKL support:

In [8]: a = np.linspace(0,1,1e8)

In [9]: %time b = np.sin(a)
CPU times: user 1.20 s, sys: 0.22 s, total: 1.42 s
Wall time: 1.42 s

In [10]: import numexpr as ne

In [12]: %time b = ne.evaluate("sin(a)")
CPU times: user 1.42 s, sys: 0.27 s, total: 1.69 s
Wall time: 0.37 s

This is around 4x faster than the regular 'sin' in libc, and about the 
same speed as a memcpy():

In [13]: %time c = a.copy()
CPU times: user 0.19 s, sys: 0.20 s, total: 0.39 s
Wall time: 0.39 s

Now, with an MKL-aware numexpr and non-AVX alignment:

In [8]: p = ctypes.create_string_buffer(int(8e8)); hex(ctypes.addressof(p))
Out[8]: '0x7fce435da010'  # 16 bytes alignment

In [9]: a = np.ndarray(1e8, "f8", p)

In [10]: a[:] = np.linspace(0,1,1e8)

In [11]: %time b = ne.evaluate("sin(a)")
CPU times: user 0.44 s, sys: 0.27 s, total: 0.71 s
Wall time: 0.15 s

That is, more than 2x faster than a memcpy() on this system, meaning 
that the problem is truly memory-bounded.  So now, with an AVX-aligned 
buffer:

In [14]: a2 = a[2:]  # skip the first 16 bytes

In [15]: %time b = ne.evaluate("sin(a2)")

Re: [Numpy-discussion] Byte aligned arrays

2012-12-19 Thread Francesc Alted
On 12/19/12 5:47 PM, Henry Gomersall wrote:
> On Wed, 2012-12-19 at 15:57 +, Nathaniel Smith wrote:
>> Not sure which interface is more useful to users. On the one hand,
>> using funny dtypes makes regular non-SIMD access more cumbersome, and
>> it forces your array size to be a multiple of the SIMD word size,
>> which might be inconvenient if your code is smart enough to handle
>> arbitrary-sized arrays with partial SIMD acceleration (i.e., using
>> SIMD for most of the array, and then a slow path to handle any partial
>> word at the end). OTOH, if your code *is* that smart, you should
>> probably just make it smart enough to handle a partial word at the
>> beginning as well and then you won't need any special alignment in the
>> first place, and representing each SIMD word as a single numpy scalar
>> is an intuitively appealing model of how SIMD works. OTOOH, just
>> adding a single argument np.array() is a much simpler to explain than
>> some elaborate scheme involving the creation of special custom dtypes.
> If it helps, my use-case is in wrapping the FFTW library. This _is_
> smart enough to deal with unaligned arrays, but it just results in a
> performance penalty. In the case of an FFT, there are clearly going to
> be issues with the powers of two indices in the array not lying on a
> suitable n-byte boundary (which would be the case with a misaligned
> array), but I imagine it's not unique.
>
> The other point is that it's easy to create a suitable power of two
> array that should always bypass any special case unaligned code (e.g.
> with floats, any multiple of 4 array length will fill every 16-byte
> word).
>
> Finally, I think there is significant value in auto-aligning the array
> based on an appropriate inspection of the cpu capabilities (or
> alternatively, a function that reports back the appropriate SIMD
> alignment). Again, this makes it easier to wrap libraries that may
> function with any alignment, but benefit from optimum alignment.

Hmm, NumPy seems to return data blocks that are aligned to 16 bytes on 
the systems I have tried (Linux and Mac OSX):

In []: np.empty(1).data
Out[]: <read-write buffer for 0x..., size 8, offset 0 at 0x...0>

In []: np.empty(1).data
Out[]: <read-write buffer for 0x..., size 8, offset 0 at 0x...0>

In []: np.empty(1).data
Out[]: <read-write buffer for 0x..., size 8, offset 0 at 0x...0>

In []: np.empty(1).data
Out[]: <read-write buffer for 0x..., size 8, offset 0 at 0x...0>

[Check that the last digit in the addresses above is always 0]
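
The same check can be done programmatically through the data pointer (a 
quick sketch; whether you actually get 16-byte alignment this way depends 
on the platform allocator):

In []: import numpy as np

In []: [np.empty(1).ctypes.data % 16 for _ in range(4)]
# expected to be [0, 0, 0, 0] on the Linux and Mac OSX boxes above,
# i.e. the data pointers are 16-byte aligned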

The only scenario I see where this would create unaligned arrays is for 
machines having AVX.  But given that the Intel architecture is making 
great strides in fetching unaligned data, I'd be surprised if the 
difference in performance were even noticeable.

Can you tell us what difference in performance you are seeing between an 
AVX-aligned array and one that is not AVX-aligned?  Just curious.

-- 
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

