Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-06-01 Thread Travis Oliphant
Hi all,

At Continuum we are trying to coordinate with Intel about releasing our
patches from Accelerate upstream as well, rather than having them redo
things we have already done but have not yet been able to open source.


Accelerate also uses GPU-accelerated FFTs, and it would be nice if there
were a supported NumPy way of plugging in these optimized approaches.
This is not a trivial thing to do, though, and there are a lot of design
choices.

We have been giving away Accelerate to academics since it was released, but
have asked companies to pay for it as a means of generating money to
support open source.  Several things that used to be in Accelerate only
are now already open source (e.g. cuda.jit, guvectorize, target='cuda'
and target='parallel' in numba.vectorize).  I expect this trend will
continue.  The FFT enhancements are another item on the list of things to
make open source.

I, for one, welcome Intel's contributions and am enthusiastic about their
joining the Python development community.  In many cases it would be
better if they simply paid a company that has already built and tested
this capability to release it, rather than developing it yet again
themselves.  Any encouragement we can give Intel in this direction would
help.

Many companies are now supporting open source.  Even those that sell some
software are still contributing overall, ensuring that the total amount of
useful open-source software available keeps increasing.

Best,

-Travis





On Wed, Jun 1, 2016 at 7:42 PM, Nathaniel Smith  wrote:

> On Jun 1, 2016 4:47 PM, "David Cournapeau"  wrote:
> >
> >
> >
> > On Tue, May 31, 2016 at 10:36 PM, Sturla Molden 
> wrote:
> >>
> >> Joseph Martinot-Lagarde  wrote:
> >>
> >> > The problem with FFTW is that its license is more restrictive (GPL),
> and
> >> > because of this may not be suitable everywhere numpy.fft is.
> >>
> >> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> >> some really nifty FFTs. And the license issue is hardly any worse than
> >> linking with them for BLAS and LAPACK, which we do anyway. We could
> extend
> >> numpy.fft to use MKL or Accelerate when they are available.
> >
> >
> > That's what we used to do in scipy, but it was a PITA to maintain.
> Contrary to BLAS/LAPACK, FFT does not have a standard API, hence exposing a
> consistent API in Python, including data layout, involved quite a bit of
> work.
> >
> > It is better to expose those through 3rd party APIs.
>
> FWIW, Intel's new Python distribution has numpy patched to use MKL for
> fft, and they're interested in pushing the relevant changes upstream.
>
> I have no idea how maintainable their patches are, since I haven't seen
> them -- this is just from talking to people here at PyCon.
>
> -n
>


-- 

*Travis Oliphant, PhD*
*Co-founder and CEO*


@teoliphant
512-222-5440
http://www.continuum.io


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-06-01 Thread Nathaniel Smith
On Jun 1, 2016 4:47 PM, "David Cournapeau"  wrote:
>
>
>
> On Tue, May 31, 2016 at 10:36 PM, Sturla Molden 
wrote:
>>
>> Joseph Martinot-Lagarde  wrote:
>>
>> > The problem with FFTW is that its license is more restrictive (GPL),
and
>> > because of this may not be suitable everywhere numpy.fft is.
>>
>> A lot of us use NumPy linked with MKL or Accelerate, both of which have
>> some really nifty FFTs. And the license issue is hardly any worse than
>> linking with them for BLAS and LAPACK, which we do anyway. We could
extend
>> numpy.fft to use MKL or Accelerate when they are available.
>
>
> That's what we used to do in scipy, but it was a PITA to maintain.
Contrary to BLAS/LAPACK, FFT does not have a standard API, hence exposing a
consistent API in Python, including data layout, involved quite a bit of
work.
>
> It is better to expose those through 3rd party APIs.

FWIW, Intel's new Python distribution has numpy patched to use MKL for
fft, and they're interested in pushing the relevant changes upstream.

I have no idea how maintainable their patches are, since I haven't seen
them -- this is just from talking to people here at PyCon.

-n


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-06-01 Thread David Cournapeau
On Tue, May 31, 2016 at 10:36 PM, Sturla Molden 
wrote:

> Joseph Martinot-Lagarde  wrote:
>
> > The problem with FFTW is that its license is more restrictive (GPL), and
> > because of this may not be suitable everywhere numpy.fft is.
>
> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> some really nifty FFTs. And the license issue is hardly any worse than
> linking with them for BLAS and LAPACK, which we do anyway. We could extend
> numpy.fft to use MKL or Accelerate when they are available.
>

That's what we used to do in scipy, but it was a PITA to maintain. Contrary
to BLAS/LAPACK, FFT does not have a standard API, hence exposing a
consistent API in Python, including data layout, involved quite a bit of
work.

It is better to expose those through 3rd party APIs.

David


> Sturla
>


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-06-01 Thread Lion Krischer
Seems so.

When installed with conda, numpy/fft/__init__.py contains a thin optional
wrapper around mklfft, e.g. this here:

https://docs.continuum.io/accelerate/mkl_fft

It is part of the Accelerate package from Continuum and thus not free.
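
For illustration, such an optional wrapper might look roughly like the
sketch below. The mklfft module path and function names are assumptions
made only for illustration, not the actual Continuum patch; only the
using_mklfft flag is something mentioned in this thread.

    try:
        # hypothetical MKL-backed FFT module shipped with Accelerate
        from mklfft.fft import fft, ifft, rfft, irfft
        using_mklfft = True
    except ImportError:
        # fall back to numpy's bundled fftpack-based implementation
        from numpy.fft.fftpack import fft, ifft, rfft, irfft
        using_mklfft = False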


Cheers!

Lion



On 01/06/16 09:44, Gregor Thalhammer wrote:
> 
>> Am 31.05.2016 um 23:36 schrieb Sturla Molden :
>>
>> Joseph Martinot-Lagarde  wrote:
>>
>>> The problem with FFTW is that its license is more restrictive (GPL), and
>>> because of this may not be suitable everywhere numpy.fft is.
>>
>> A lot of us use NumPy linked with MKL or Accelerate, both of which have
>> some really nifty FFTs. And the license issue is hardly any worse than
>> linking with them for BLAS and LAPACK, which we do anyway. We could extend
>> numpy.fft to use MKL or Accelerate when they are available.
> 
> It seems the anaconda numpy binaries do already use MKL for fft:
> 
> In [2]: np.fft.using_mklfft
> Out[2]: True
> 
> Is this based on a proprietary patch of numpy?
> 
> Gregor 
> 
>>
>> Sturla
>>


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-06-01 Thread Gregor Thalhammer

> Am 31.05.2016 um 23:36 schrieb Sturla Molden :
> 
> Joseph Martinot-Lagarde  wrote:
> 
>> The problem with FFTW is that its license is more restrictive (GPL), and
>> because of this may not be suitable everywhere numpy.fft is.
> 
> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> some really nifty FFTs. And the license issue is hardly any worse than
> linking with them for BLAS and LAPACK, which we do anyway. We could extend
> numpy.fft to use MKL or Accelerate when they are available.

It seems the anaconda numpy binaries do already use MKL for fft:

In [2]: np.fft.using_mklfft
Out[2]: True

Is this based on a proprietary patch of numpy?

Gregor 

> 
> Sturla
> 


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-31 Thread Marten van Kerkwijk
>
> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> some really nifty FFTs. And the license issue is hardly any worse than
> linking with them for BLAS and LAPACK, which we do anyway. We could extend
> numpy.fft to use MKL or Accelerate when they are available.
>

That would be wonderful! Especially if one can also remove the automatic
cast to double (as I'm analysing 2-bit VLBI data, getting to 64 bit is
overkill...).

-- Marten


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-31 Thread Sturla Molden
Joseph Martinot-Lagarde  wrote:
 
> The problem with FFTW is that its license is more restrictive (GPL), and
> because of this may not be suitable everywhere numpy.fft is.

A lot of us use NumPy linked with MKL or Accelerate, both of which have
some really nifty FFTs. And the license issue is hardly any worse than
linking with them for BLAS and LAPACK, which we do anyway. We could extend
numpy.fft to use MKL or Accelerate when they are available.

Sturla



Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-31 Thread Sturla Molden
Lion Krischer  wrote:

> I added a slightly more comprehensive benchmark to the PR. Please have a
> look. It tests the total time for 100 FFTs with and without cache. It is
> over 30 percent faster with cache, which is totally worth it in my
> opinion as repeated FFTs of the same size are a very common use case.

The results of all the transcendental function calls (the FFT twiddle
factors) are stored in the cache. Without a cache, we get excessive calls
to sin(x) and cos(x) whenever FFTs of the same size are repeated. This can
indeed matter a lot.
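
As a minimal sketch of the idea (names are illustrative; this is not the
actual fftpack code), caching the precomputed twiddle factors per
transform length avoids recomputing all those sin/cos values on every
call:

    import numpy as np

    _twiddle_cache = {}  # hypothetical cache keyed by transform length

    def twiddle_factors(n):
        # exp(-2j*pi*k/n) for k = 0..n-1; computed once per length, reused
        if n not in _twiddle_cache:
            _twiddle_cache[n] = np.exp(-2j * np.pi * np.arange(n) / n)
        return _twiddle_cache[n]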

Sturla



Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-30 Thread Lion Krischer


On 30/05/16 10:07, Joseph Martinot-Lagarde wrote:
> Marten van Kerkwijk  gmail.com> writes:
> 
>> I did a few simple timing tests (see comment in PR), which suggests it is
> hardly worth having the cache. Indeed, if one really worries about speed,
> one should probably use pyFFTW (scipy.fft is a bit faster too, but at least
> for me the way real FFT values are stored is just too inconvenient). So, my
> suggestion would be to do away with the cache altogether. 


I added a slightly more comprehensive benchmark to the PR. Please have a
look. It tests the total time for 100 FFTs with and without cache. It is
over 30 percent faster with cache, which is totally worth it in my
opinion as repeated FFTs of the same size are a very common use case.
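
For anyone who wants a rough feel for the effect, a quick sketch along
these lines (not the benchmark from the PR; the transform length is
arbitrary) compares the first FFT of a given size with the cached
follow-up calls:

    import timeit
    import numpy as np

    x = np.random.rand(10007)  # arbitrary transform length

    # the first call of a given size builds the setup data; later calls reuse it
    first = timeit.timeit(lambda: np.fft.fft(x), number=1)
    rest = timeit.timeit(lambda: np.fft.fft(x), number=100) / 100
    print("first call: %.4f s, cached calls: %.4f s" % (first, rest))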

Also many people will not have enough knowledge to use FFTW or some
other FFT implementation.


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-30 Thread Lion Krischer

> You can backport the pure Python version of lru_cache for Python 2 (or
> vendor the backport done here:
> https://pypi.python.org/pypi/backports.functools_lru_cache/).
> The advantage is that lru_cache is C-accelerated in Python 3.5 and
> upwards...

That's a pretty big backport. The speed also does not matter for this
particular use case: the time for the actual FFT will dominate by far. The
lru_cache decorator can furthermore only limit the cache size by item
count and not by size in memory, as the proposed solution does. I think
the downsides outweigh the advantages of being able to use functionality
from the stdlib.



Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-30 Thread Joseph Martinot-Lagarde
Marten van Kerkwijk  gmail.com> writes:

> I did a few simple timing tests (see comment in PR), which suggests it is
hardly worth having the cache. Indeed, if one really worries about speed,
one should probably use pyFFTW (scipy.fft is a bit faster too, but at least
for me the way real FFT values are stored is just too inconvenient). So, my
suggestion would be to do away with the cache altogether. 

The problem with FFTW is that its license is more restrictive (GPL), and
because of this may not be suitable everywhere numpy.fft is.



Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-30 Thread Antoine Pitrou
On Sat, 28 May 2016 20:19:27 +0200
Sebastian Berg  wrote:
> 
> The complexity addition is a bit annoying, I must admit. On Python 3,
> functools.lru_cache could be another option, but only there.

You can backport the pure Python version of lru_cache for Python 2 (or
vendor the backport done here:
https://pypi.python.org/pypi/backports.functools_lru_cache/).
The advantage is that lru_cache is C-accelerated in Python 3.5 and
upwards...
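
For illustration (this is not the numpy.fft code; the function name and
the maxsize value are made up), the decorator bounds the cached per-length
setup data by entry count:

    try:
        from functools import lru_cache  # Python 3.2+
    except ImportError:
        from backports.functools_lru_cache import lru_cache  # Python 2 backport

    import numpy as np

    @lru_cache(maxsize=16)  # keep at most 16 distinct transform lengths
    def _fft_setup(n):
        # stand-in for the per-length setup data numpy.fft caches internally
        return np.exp(-2j * np.pi * np.arange(n) / n)

Note that this bounds the number of entries rather than their total memory,
which is the limitation Lion points out above.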

Regards

Antoine.




Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-29 Thread Marten van Kerkwijk
Hi,

I did a few simple timing tests (see comment in PR), which suggests it is
hardly worth having the cache. Indeed, if one really worries about speed,
one should probably use pyFFTW (scipy.fft is a bit faster too, but at least
for me the way real FFT values are stored is just too inconvenient). So, my
suggestion would be to do away with the cache altogether.

If we do keep it, I think the approach in the PR is nice, but I would
advocate setting both a size and number limit (e.g., by default no more
than 8 entries or so, which should cover most repetitive use cases).

All the best,

Marten

p.s. I do like having a quick fft routine in numpy. My main gripe is that
it always casts to float64/complex128 rather than sticking with the input
dtype. Hope to get around to making a PR for that...


Re: [Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-28 Thread Sebastian Berg
On Fr, 2016-05-27 at 22:51 +0200, Lion Krischer wrote:
> Hi all,
> 
> I was told to take this to the mailing list. Relevant pull request:
> https://github.com/numpy/numpy/pull/7686
> 
> 
> NumPy's FFT implementation caches some form of execution plan for
> each
> encountered input data length. This is currently implemented as a
> simple
> dictionary which can grow without bounds. Calculating lots of
> different
> FFTs thus cause a memory leak from the users' perspective. We
> encountered a real world situation where this is an issue.
> 
> The PR linked above proposes to replace the simple dictionary with an
> LRU (least recently used) cache. It will remove the least recently
> used
> pieces of data if it grows beyond a specified size (currently an
> arbitrary limit of 100 MB per cache). Thus almost all users will
> still
> benefit from the caches but their total memory size is now limited.
> 
> 
> Things to consider:
> 
> * This adds quite some additional complexity but I could not find a
> simple way to achieve the same result.
> * What is a good limit on cache size? I used 100 MB because it works
> for
> my use cases.
> 

I am +1 in principle, since I don't like that the cache might just grow
forever and I see that as a bug right now personally. One that rarely
hits maybe, but a bug.

I guess if you have the time, the perfect thing would be to measure how
big a difference the cache actually makes.
The cache mostly holds a working array I think, so does it even help for
large arrays, or is the time spent on the allocation negligible anyway
then?
We also have a small-array cache in numpy anyway nowadays (not sure how
small "small" is here). Maybe this already achieves everything that the
fftcache was designed for, and we could even just disable it by default?

The complexity addition is a bit annoying, I must admit. On Python 3,
functools.lru_cache could be another option, but only there.

- Sebastian


> 
> Cheers!
> 
> Lion


[Numpy-discussion] Changing FFT cache to a bounded LRU cache

2016-05-27 Thread Lion Krischer
Hi all,

I was told to take this to the mailing list. Relevant pull request:
https://github.com/numpy/numpy/pull/7686


NumPy's FFT implementation caches some form of execution plan for each
encountered input data length. This is currently implemented as a simple
dictionary which can grow without bounds. Calculating lots of different
FFTs thus causes a memory leak from the users' perspective. We
encountered a real-world situation where this is an issue.

The PR linked above proposes to replace the simple dictionary with an
LRU (least recently used) cache. It will remove the least recently used
pieces of data if it grows beyond a specified size (currently an
arbitrary limit of 100 MB per cache). Thus almost all users will still
benefit from the caches but their total memory size is now limited.
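
As a minimal sketch of the idea (not the code from the PR; names, the
eviction details and the 100 MB default are illustrative), such a
memory-bounded LRU cache for numpy arrays could be built on an
OrderedDict:

    from collections import OrderedDict

    class BoundedLRUCache(object):

        def __init__(self, max_bytes=100 * 1024 ** 2):  # illustrative 100 MB
            self._max_bytes = max_bytes
            self._data = OrderedDict()

        def put(self, key, array):
            self._data.pop(key, None)
            self._data[key] = array  # newest entry goes to the end
            # evict least recently used entries until below the size limit
            while sum(a.nbytes for a in self._data.values()) > self._max_bytes:
                self._data.popitem(last=False)

        def get(self, key):
            array = self._data.pop(key)  # raises KeyError if not cached
            self._data[key] = array      # mark as most recently used
            return array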


Things to consider:

* This adds quite some additional complexity but I could not find a
simple way to achieve the same result.
* What is a good limit on cache size? I used 100 MB because it works for
my use cases.


Cheers!

Lion