Re: [Numpy-discussion] Numpy 1.10

2015-03-12 Thread Ralf Gommers
On Fri, Mar 13, 2015 at 7:29 AM, Jaime Fernández del Río <
jaime.f...@gmail.com> wrote:

> On Thu, Mar 12, 2015 at 10:16 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>> On Sun, Mar 8, 2015 at 3:43 PM, Ralf Gommers 
>> wrote:
>>
>>>
>>>
>>> On Sat, Mar 7, 2015 at 12:40 AM, Charles R Harris <
>>> charlesr.har...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Time to start thinking about numpy 1.10.
>>>>
>>>
>>> Sounds good. Do we have a volunteer for release manager already?
>>>
>>
>> I guess it is my turn, unless someone else wants the experience.
>>
>
> What does a release manager do? I will eventually want to be able to tell
> my grandchildren that I once managed a numpy release, but am not sure if I
> can successfully handle it on my own right now. I will probably need to up
> my git foo, which is nothing to write home about...
>
> Maybe for this one I can sign up as release minion, so you have someone
> to offload menial tasks onto?
>

I have no doubt that you can do this job well right now - you are vastly
more experienced than I was when I picked up that role. It's not rocket
science.

I have to run now, but here's a start to give you an idea of what it
entails (some details and version numbers may be slightly outdated):
https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt

Cheers,
Ralf


> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature and help him with
> his plans for world domination.
>


Re: [Numpy-discussion] Numpy 1.10

2015-03-12 Thread Jaime Fernández del Río
On Thu, Mar 12, 2015 at 10:16 PM, Charles R Harris <
charlesr.har...@gmail.com> wrote:

>
>
> On Sun, Mar 8, 2015 at 3:43 PM, Ralf Gommers 
> wrote:
>
>>
>>
>> On Sat, Mar 7, 2015 at 12:40 AM, Charles R Harris <
>> charlesr.har...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Time to start thinking about numpy 1.10.
>>>
>>
>> Sounds good. Do we have a volunteer for release manager already?
>>
>
> I guess it is my turn, unless someone else wants the experience.
>

What does a release manager do? I will eventually want to be able to tell
my grandchildren that I once managed a numpy release, but am not sure if I
can successfully handle it on my own right now. I will probably need to up
my git foo, which is nothing to write home about...

Maybe for this one I can sign up as release minion, so you have someone to
offload menial tasks onto?

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Numpy 1.10

2015-03-12 Thread Charles R Harris
On Sun, Mar 8, 2015 at 3:43 PM, Ralf Gommers  wrote:

>
>
> On Sat, Mar 7, 2015 at 12:40 AM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>> Hi All,
>>
>> Time to start thinking about numpy 1.10.
>>
>
> Sounds good. Do we have a volunteer for release manager already?
>

I guess it is my turn, unless someone else wants the experience.



Chuck


Re: [Numpy-discussion] Numpy where

2015-03-12 Thread Benjamin Root
I think the question is whether scalars should be acceptable for the first
argument, not whether they should be for the 2nd and 3rd arguments.

If a scalar can be given for the first argument, then the first three cases
make sense. Although I have no clue why we would allow that.

Ben Root
On Mar 12, 2015 9:25 PM, "Nathaniel Smith"  wrote:

> On Mar 12, 2015 5:02 PM, "Charles R Harris" 
> wrote:
> >
> > Hi All,
> >
> > This is apropos gh-5582, dealing with some corner cases of np.where. The
> > following is the current behavior:
> >
> > >>> import numpy
> > >>> numpy.where(True)  # case 1
> > ... (array([0]),)
> > >>> numpy.where(True, None, None)  # case 2
> > ... array(None, dtype=object)
> > >>> numpy.ma.where(True)  # case 3
> > ... (array([0]),)
> > >>> numpy.ma.where(True, None, None)  # case 4
> > ... (array([0]),)
> >
> > The question is, what exactly should be done in these cases? I'd be
> > inclined to raise an error for cases 1 and 3. Case 2 looks correct to me
> > if we agree that scalar inputs are acceptable. Case 4 looks wrong.
>
> I can't think of any reason scalars wouldn't be acceptable. So everything
> you suggest sounds right to me.
>
> -n


Re: [Numpy-discussion] Numpy where

2015-03-12 Thread Nathaniel Smith
On Mar 12, 2015 5:02 PM, "Charles R Harris" 
wrote:
>
> Hi All,
>
> This is apropos gh-5582, dealing with some corner cases of np.where. The
> following is the current behavior:
>
> >>> import numpy
> >>> numpy.where(True)  # case 1
> ... (array([0]),)
> >>> numpy.where(True, None, None)  # case 2
> ... array(None, dtype=object)
> >>> numpy.ma.where(True)  # case 3
> ... (array([0]),)
> >>> numpy.ma.where(True, None, None)  # case 4
> ... (array([0]),)
>
> The question is, what exactly should be done in these cases? I'd be
> inclined to raise an error for cases 1 and 3. Case 2 looks correct to me
> if we agree that scalar inputs are acceptable. Case 4 looks wrong.

I can't think of any reason scalars wouldn't be acceptable. So everything
you suggest sounds right to me.

-n


[Numpy-discussion] Numpy where

2015-03-12 Thread Charles R Harris
Hi All,

This is apropos gh-5582, dealing with some corner cases of np.where. The
following is the current behavior:

>>> import numpy
>>> numpy.where(True)  # case 1
... (array([0]),)
>>> numpy.where(True, None, None)  # case 2
... array(None, dtype=object)
>>> numpy.ma.where(True)  # case 3
... (array([0]),)
>>> numpy.ma.where(True, None, None)  # case 4
... (array([0]),)

The question is, what exactly should be done in these cases? I'd be
inclined to raise an error for cases 1 and 3. Case 2 looks correct to me
if we agree that scalar inputs are acceptable. Case 4 looks wrong.
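
For concreteness, a rough sketch of the proposed behavior (where_strict is
a hypothetical helper, not numpy API; only the single-argument scalar
cases would raise):

import numpy as np

_missing = object()  # sentinel, so x=None / y=None stay valid values

def where_strict(condition, x=_missing, y=_missing):
    # Hypothetical helper sketching the proposal: the single-argument
    # form rejects 0-d (scalar) conditions, covering cases 1 and 3.
    if x is _missing and y is _missing:
        if np.ndim(condition) == 0:
            raise ValueError("where() with a scalar condition and no x/y "
                             "has no well-defined indices to return")
        return np.nonzero(np.asarray(condition))
    # Three-argument form: scalars broadcast, as in case 2.
    return np.where(condition, x, y)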

Thoughts?

Chuck


Re: [Numpy-discussion] tie breaking for max, min, argmax, argmin

2015-03-12 Thread Julian Taylor
On 03/12/2015 02:42 PM, Robert Kern wrote:
> On Thu, Mar 12, 2015 at 1:31 PM, Johannes Kulick
> <johannes.kul...@ipvs.uni-stuttgart.de> wrote:
>>
>> Hello,
>>
>> I wonder if it would be worthwhile to enhance max, min, argmax and
>> argmin (more?) with a tie-breaking parameter: currently, if multiple
>> entries have the same value, the first one is returned. It would be
>> useful to have a parameter to alter this behavior to an arbitrary
>> tie-breaking rule. I would propose that the tie-breaking function gets
>> a list of all indices of the maxima/minima.
>>
>> Example:
>> >>> a = np.array([ 1, 2, 5, 5, 2, 1])
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 3
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 2
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 2
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 2
>>
>> >>> np.argmax(a, tie_breaking=random.choice)
>> 3
>>
>> Especially for some randomized experiments it is necessary that not
>> always the first maximum but a random optimum is returned. Thus I end
>> up writing these things over and over again.
>>
>> I understand that max and min are crucial functions which shouldn't be
>> slowed down by the proposed changes. Adding new functions instead of
>> altering the existing ones would be a good option.
>>
>> Are there any objections to me implementing this and sending a pull
>> request? Or would such a function be better included in scipy, for
>> example?
>
> On the whole, I think I would prefer new functions for this. I assume
> you only need variants for argmin() and argmax() and not min() and
> max(), since all of the tied values for the latter two would be
> identical, so returning the first one is just as good as any other.
>

Is this such a common use case that it's worth a numpy function to replace
one-liners like this?

np.random.choice(np.where(a == a.max())[0])

It's also not that inefficient if the number of equal elements is not too
large.
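
Spelled out as a runnable snippet (same idea as the one-liner above):

import numpy as np

a = np.array([1, 2, 5, 5, 2, 1])
ties = np.where(a == a.max())[0]  # indices of all maxima: array([2, 3])
idx = np.random.choice(ties)      # uniformly random tie-break: 2 or 3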


Re: [Numpy-discussion] tie breaking for max, min, argmax, argmin

2015-03-12 Thread Robert Kern
On Thu, Mar 12, 2015 at 1:31 PM, Johannes Kulick <
johannes.kul...@ipvs.uni-stuttgart.de> wrote:
>
> Hello,
>
> I wonder if it would be worthwhile to enhance max, min, argmax and
> argmin (more?) with a tie-breaking parameter: currently, if multiple
> entries have the same value, the first one is returned. It would be
> useful to have a parameter to alter this behavior to an arbitrary
> tie-breaking rule. I would propose that the tie-breaking function gets
> a list of all indices of the maxima/minima.
>
> Example:
> >>> a = np.array([ 1, 2, 5, 5, 2, 1])
> >>> np.argmax(a, tie_breaking=random.choice)
> 3
>
> >>> np.argmax(a, tie_breaking=random.choice)
> 2
>
> >>> np.argmax(a, tie_breaking=random.choice)
> 2
>
> >>> np.argmax(a, tie_breaking=random.choice)
> 2
>
> >>> np.argmax(a, tie_breaking=random.choice)
> 3
>
> Especially for some randomized experiments it is necessary that not
> always the first maximum but a random optimum is returned. Thus I end
> up writing these things over and over again.
>
> I understand that max and min are crucial functions which shouldn't be
> slowed down by the proposed changes. Adding new functions instead of
> altering the existing ones would be a good option.
>
> Are there any objections to me implementing this and sending a pull
> request? Or would such a function be better included in scipy, for
> example?

On the whole, I think I would prefer new functions for this. I assume you
only need variants for argmin() and argmax() and not min() and max(), since
all of the tied values for the latter two would be identical, so returning
the first one is just as good as any other.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] tie breaking for max, min, argmax, argmin

2015-03-12 Thread Johannes Kulick
Hello,

I wonder if it would be worthwhile to enhance max, min, argmax and argmin
(more?) with a tie-breaking parameter: currently, if multiple entries have
the same value, the first one is returned. It would be useful to have a
parameter to alter this behavior to an arbitrary tie-breaking rule. I would
propose that the tie-breaking function gets a list of all indices of the
maxima/minima.

Example:
>>> a = np.array([ 1, 2, 5, 5, 2, 1])
>>> np.argmax(a, tie_breaking=random.choice)
3

>>> np.argmax(a, tie_breaking=random.choice)
2

>>> np.argmax(a, tie_breaking=random.choice)
2

>>> np.argmax(a, tie_breaking=random.choice)
2

>>> np.argmax(a, tie_breaking=random.choice)
3

Especially for some randomized experiments it is necessary that not always
the first maximum but a random optimum is returned. Thus I end up writing
these things over and over again.
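
Concretely, the wrapper I keep writing looks roughly like this (the name
argmax_tb is hypothetical, not numpy API):

import numpy as np

def argmax_tb(a, tie_breaking=None):
    # Hypothetical wrapper approximating the proposed interface: gather
    # every index attaining the maximum, then let tie_breaking choose.
    a = np.asarray(a)
    winners = np.flatnonzero(a == a.max())
    if tie_breaking is None:
        return winners[0]  # current np.argmax behavior
    return tie_breaking(winners)

a = np.array([1, 2, 5, 5, 2, 1])
print(argmax_tb(a, tie_breaking=np.random.choice))  # prints 2 or 3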

I understand that max and min are crucial functions which shouldn't be
slowed down by the proposed changes. Adding new functions instead of
altering the existing ones would be a good option.

Are there any objections to me implementing this and sending a pull
request? Or would such a function be better included in scipy, for example?

Best,
Johannes


Re: [Numpy-discussion] Introductory mail and GSoc Project "Vector math library integration"

2015-03-12 Thread Julian Taylor
On 03/12/2015 10:15 AM, Gregor Thalhammer wrote:
> 
> Another note: numpy makes it easy to provide new ufuncs from a C
> function that operates on 1D arrays (see
> http://docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html),
> but this function needs to support arbitrary spacing (strides) between
> the items. Unfortunately, to achieve good performance, vector math
> libraries often expect that the items are laid out contiguously in
> memory. MKL/VML is a notable exception. So for non-contiguous in- or
> output arrays you might need to copy the data to a buffer, which likely
> kills large amounts of the performance gain.

The elementary functions are very slow even compared to memory access;
they take on the order of hundreds to tens of thousands of cycles to
complete (depending on range and required accuracy).
Even in the case of strided access, that gives the hardware prefetchers
plenty of time to load the data before the previous computation is done.

This also removes the requirement for the library to provide a strided
API: we can copy the strided data into a contiguous buffer and pass it to
the library without losing much performance. It may not be optimal (e.g. a
library can fine-tune the prefetching better for the case where the
hardware is not ideal) but it is most likely sufficient.
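
As a rough illustration of the copy-to-buffer idea (np.sin stands in for
the library call; a real implementation would work in chunks):

import numpy as np

a = np.linspace(0, 10, 2_000_000)[::2]  # strided view, not contiguous
buf = np.ascontiguousarray(a)           # one copy into a contiguous buffer
out = np.sin(buf)                       # a contiguous-only vector routine
                                        # could be applied to buf here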

Figuring out how best to do this, getting the best performance while
staying flexible about which implementation is used, is part of the
challenge the student will face in this project.



Re: [Numpy-discussion] Introductory mail and GSoc Project "Vector math library integration"

2015-03-12 Thread Gregor Thalhammer

> On 11.03.2015 at 23:18, Dp Docs wrote:
>
> On Wed, Mar 11, 2015 at 10:34 PM, Gregor Thalhammer
> <gregor.thalham...@gmail.com> wrote:
> >
> > On the scipy mailing list I also answered to Amine, who is also
> > interested in this proposal.
> Can you provide the link to that discussion? I am having trouble
> searching for it.
>
> > A long time ago I wrote a package that provides fast math functions
> > (ufuncs) for numpy, using Intel's MKL/VML library; see
> > https://github.com/geggo/uvml and my comments there. This code could
> > easily be ported to use other vector math libraries.
>
> When MKL is not available for a system, will this integration work with
> the default numpy math functions?
>
> > Would be interesting to evaluate other possibilities. Due to the fact
> > that MKL is non-free, there are concerns about using it with numpy,
> > although e.g. numpy and scipy builds using the MKL LAPACK routines are
> > frequently used (Anaconda or Christoph Gohlke's binaries).
> >
> > You can easily inject the fast math ufuncs into numpy, e.g. with
> > set_numeric_ops() or np.sin = vml.sin.
>
> Can you explain in a bit more detail or provide a link where I can see it?

My approach for https://github.com/geggo/uvml was to provide a separate
Python extension that provides faster numpy ufuncs for math operations
like exp, sin, cos, … To replace the standard numpy ufuncs with the
optimized ones you don't need to change the source code of numpy; instead,
you monkey-patch it at runtime and get faster math everywhere. Numpy even
offers an interface (set_numeric_ops) to modify it at runtime.
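
A minimal sketch of the monkey-patching route, with a do-nothing stand-in
instead of a real VML-backed ufunc (set_numeric_ops would similarly swap
the ufuncs behind operators like *):

import numpy as np

_np_sin = np.sin

def fast_sin(x):
    # Stand-in for a VML-backed ufunc; a real version would dispatch to
    # the vendor library. Here it just forwards to the original np.sin.
    return _np_sin(x)

np.sin = fast_sin                          # every np.sin call now goes
print(np.sin(np.array([0.0, np.pi / 2])))  # through fast_sin: [0. 1.]
np.sin = _np_sin                           # restore the original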

Another note: numpy makes it easy to provide new ufuncs from a C function
that operates on 1D arrays (see
http://docs.scipy.org/doc/numpy-dev/user/c-info.ufunc-tutorial.html), but
this function needs to support arbitrary spacing (strides) between the
items. Unfortunately, to achieve good performance, vector math libraries
often expect that the items are laid out contiguously in memory. MKL/VML
is a notable exception. So for non-contiguous in- or output arrays you
might need to copy the data to a buffer, which likely kills large amounts
of the performance gain. This does not completely rule out some of the
libraries, since performance-critical data is likely to be stored in
contiguous arrays.

Using a library that supports only vector math for contiguous arrays is more 
difficult, but perhaps the numpy nditer provides everything needed. 
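
A sketch of that nditer route: with the 'buffered' and 'external_loop'
flags the iterator hands the loop body contiguous 1-D chunks, so a
contiguous-only vector routine could be applied per chunk (np.sin stands
in for the library call):

import numpy as np

a = np.linspace(0, 10, 1000)[::2]   # non-contiguous input view
out = np.empty(a.shape)

it = np.nditer([a, out],
               flags=['external_loop', 'buffered'],
               op_flags=[['readonly'], ['writeonly']])
with it:
    for x, y in it:
        # x and y are contiguous chunks here; this is where a
        # contiguous-only vector math routine could be called.
        y[...] = np.sin(x)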

Gregor


Re: [Numpy-discussion] Introductory mail and GSoc Project "Vector math library integration"

2015-03-12 Thread Dp Docs
Thanks to all of you for such a nice discussion and the suggestions. I
think most of my doubts have been resolved. If there is anything more, I
will let you know.
Thanks again.

--
Durgesh pandey
IIIT Hyderabad, India


Re: [Numpy-discussion] Would like to patch docstring for numpy.random.normal

2015-03-12 Thread Sebastian Berg
On Tue, 2015-03-10 at 11:22 -0700, Nathaniel Smith wrote:
> On Mar 10, 2015 11:15 AM, "Paul Hobson"  wrote:
> >
> >
> > On Mon, Mar 9, 2015 at 4:33 PM, Charles R Harris
> > wrote:
> >>
> >>
> >>
> >> On Mon, Mar 9, 2015 at 2:34 PM, Paul Hobson
> >> wrote:
> >>>
> >>> I feel your pain. Making it worse, numpy.random.lognormal takes
> >>> "mean" and "sigma" as input. If there's ever a backwards-incompatible
> >>> release, I hope these things will be cleared up.
> >>
> >>
> >> There is a numpy 2.0 milestone ;)
> >>
> >
> > Is it worth submitting PRs against the existing 2.X branch or is
> > that so far away that the can should be kicked down the road?
> 
> Not sure what you mean by "the existing 2.X branch" (does such a thing
> exist somewhere?), but yeah, don't submit PRs like that. Best case
> they'd bit rot before we ever get around to 2.0, worst case 2.0 may
> never happen. (What, you liked python 2 -> 3 so much you want to go
> through that again?)
> 

We could try to maintain a list of things like this, so that others like
blaze don't fall into the same pits. But then, this is likely not quite of
that caliber ;).
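
For reference, the lognormal quirk being discussed: its mean and sigma
parameterize the underlying normal distribution, not the lognormal
samples themselves.

import numpy as np

samples = np.random.lognormal(mean=0.0, sigma=1.0, size=100_000)
print(samples.mean())          # ~exp(0 + 1**2 / 2) ≈ 1.65, not 0.0
print(np.log(samples).mean())  # ~0.0 -- "mean" is the mean of log(samples)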



Re: [Numpy-discussion] Introductory mail and GSoc Project "Vector math library integration"

2015-03-12 Thread Ralf Gommers
On Wed, Mar 11, 2015 at 11:20 PM, Dp Docs  wrote:

>
>
> On Thu, Mar 12, 2015 at 2:01 AM, Daπid wrote:
> >
> > On 11 March 2015 at 16:51, Dp Docs wrote:
> >> On Wed, Mar 11, 2015 at 7:52 PM, Sturla Molden wrote:
> >> >
> >> > There are at least two ways to proceed here. One is to only use
> >> > vector math when strides and alignment allow it.
> >> I didn't get it. Can you explain in detail?
> >
> > One example: you can create a numpy 2D array using only the odd
> > columns of a matrix.
> >
> > odd_matrix = full_matrix[::2, ::2]
> >
> > This is just a view of the original data, so you save the time and
> > the memory of making a copy. The drawback is that you trash memory
> > locality, as the elements are not contiguous in memory. If the memory
> > is guaranteed to be contiguous, a compiler can apply extra
> > optimisations, and this is what vector libraries usually assume. What
> > I think Sturla is suggesting with "when strides and alignment allow
> > it" is to use the fast version if the array is contiguous, and fall
> > back to the present implementation otherwise. Another option would be
> > to make an optimally aligned copy, but that could eat up whatever
> > time we save from using the faster library, among other problems.
> >
> > The difficulty with Numpy's strides is that they allow so many ways
> > of manipulating the data... (alternating elements, transpositions,
> > different precisions...).
> >
> >> I think the actual problem is not "to choose which library to
> >> integrate", it is how to integrate these libraries. As I have seen
> >> the code base and been told, the current implementation uses the C
> >> math library. Can we just keep the current implementation and,
> >> wherever it calls C math functions, replace those calls with the
> >> fast library functions? Then we have to modify the numpy library
> >> (which usually gets imported for math operations) using some if-else
> >> conditions: first try the faster one, and if it is not available,
> >> fall back to the default one.
> >
> > At the moment, we are linking to whichever LAPACK is available at
> > compile time, so no need for a runtime check. I guess it could
> > (should?) be the same.
> I didn't understand this. I was asking: let's say I have chosen one
> faster library; now I need to integrate it in *some way* without
> changing the default functionality, so that when numpy is imported with
> "from numpy import *" it can access the integrated library's functions
> as well as the default library's functions. What should that *some way*
> be? Even at compile time, it needs to decide which function it is going
> to use, right?
>

Indeed, it should probably work similarly to how BLAS/LAPACK functions are
treated now. So you can support multiple libraries in numpy (pick only one
to start with, of course), but at compile time you'd pick the one to use.
That library then always gets called under the hood, i.e. no new public
functions/objects in numpy, only improved performance of existing ones.

> Integration of the MKL libraries has been discussed above, but when MKL
> is not available on the hardware architecture, will the above library
> serve as the default library? If yes, then the integration method
> discussed above may be the required one for this project, right? Can you
> please tell me a bit more or provide some link related to that? Does the
> availability of these faster libraries depend on the hardware
> architecture etc., or on the availability of hardware resources in a
> system? Because if it is the latter, this newly integrated library will
> support operations sometimes but not at other times?
>

Not HW resources I'd think. Looking at http://www.yeppp.info, it supports
all commonly used cpus/instruction sets.
As long as the accuracy of the library is OK this should not be noticeable
to users except for the difference in performance.


> I believe it's the first one, but it is better to clear up any
> confusion. For example, assume that availability of hardware means the
> latter: let's say library A needs hardware A1 for its support and A1 is
> busy; then it will not be able to support the operation. Meanwhile
> library B needs support of hardware type B1, which is not busy, so it
> will support these operations. What I want to say is: if the
> availability of a faster library means the availability of hardware
> resources in a system at the particular time when we want to do an
> operation, that is totally unpredictable; the availability of these
> resources will be random, and even worse, if some extra time passes
> between compiling and running and that hardware resource has been
> allocated to another process in the meantime, then it would be very
> problematic to use these operations. So this leads to thinking that
> availability of a library means the type of hardware architecture,
> whether or not it supports that library. Since there are many kinds of
> hardware architecture and it is not the case that one library