Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On 31 Mar 2014 19:47, Chris Barker chris.bar...@noaa.gov wrote: On Sat, Mar 29, 2014 at 3:08 PM, Nathaniel Smith n...@pobox.com wrote: On 29 Mar 2014 20:57, Chris Barker chris.bar...@noaa.gov wrote: I think this is somewhat open for discussion -- yes, it's odd, but in the spirit of practicality beats purity, it seems OK. We could allow any TZ specifier for that matter -- that's kind of how naive or local timezone (non) handling works -- it's up to the user to make sure that all DTs are in the same timezone. That isn't how naive timezone handling works in datetime.datetime, though. If you try to mix a timezone (even a Zulu timezone) datetime with a naive datetime, you get an exception. Fair enough. The difference is that datetime.datetime doesn't provide any iso string parsing. Sure it does. datetime.strptime, with the %z modifier in particular. The use case I'm imagining is for folks with ISO strings with a Z on the end -- they'll need to deal with pre-parsing the strings to strip off the Z, when it wouldn't change the result. Maybe this is an argument for UTC always rather than naive? Probably it is, but that approach seems a lot harder to extend to proper tz support later, plus being more likely to cause trouble for pandas's proper tz support now. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
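[Archive note: the stdlib behaviour being discussed -- `strptime` with `%z`, and the exception on mixing naive and aware values -- can be demonstrated in a few lines. A sketch, assuming Python 3's `%z` support:]

```python
from datetime import datetime

# datetime.strptime *can* parse an ISO-ish string with an offset, via %z
aware = datetime.strptime("2005-02-25T03:00+0000", "%Y-%m-%dT%H:%M%z")

naive = datetime(2005, 2, 25, 3, 0)  # no tzinfo attached: a "naive" datetime

# Mixing an aware datetime with a naive one raises, even for a zero offset
try:
    aware - naive
except TypeError as exc:
    print("mixing naive and aware:", exc)
```

This is the "naive datetimes refuse timezones" behaviour Nathaniel refers to: the zero offset does not get a pass.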
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On Fri, Mar 28, 2014 at 9:30 PM, Sankarshan Mudkavi smudk...@uwaterloo.ca wrote: Hi Nathaniel, 1- You give as an example of naive datetime handling: np.datetime64('2005-02-25T03:00Z') np.datetime64('2005-02-25T03:00') This IIUC is incorrect. The Z modifier is a timezone offset, and for normal naive datetimes would cause an error. From what I understand from reading: http://thread.gmane.org/gmane.comp.python.numeric.general/53805 It looks like anything other than Z, 00:00 or UTC that has a TZ adjustment would raise an error, and those specific conditions would not (I'm guessing this is because we assume it's UTC (or the same timezone) internally, anything that explicitly tells us it is UTC is acceptable, although that may be just my misreading of it.) If we assume it's UTC, then that's proposal 2, I think :-). My point is just that naive datetime already has a specific meaning in Python, and as I understand that meaning, it says that trying to pass a Z timezone to a naive datetime should be an error. As a separate issue, we might decide that we want to continue to allow Z modifiers (or all offset modifiers) temporarily in numpy, to avoid breaking code without warning. Just if we do, then we shouldn't say that this is because we are implementing naive datetimes and this is how naive datetimes work. Instead we should either say that we're not implementing naive datetimes, or else say that we're implementing naive datetimes but have some temporary compatibility hacks on top of that (and probably issue a DeprecationWarning if anyone passes a timezone). -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On 29 Mar 2014 20:57, Chris Barker chris.bar...@noaa.gov wrote: I think this is somewhat open for discussion -- yes, it's odd, but in the spirit of practicality beats purity, it seems OK. We could allow any TZ specifier for that matter -- that's kind of how naive or local timezone (non) handling works -- it's up to the user to make sure that all DTs are in the same timezone. That isn't how naive timezone handling works in datetime.datetime, though. If you try to mix a timezone (even a Zulu timezone) datetime with a naive datetime, you get an exception. I agree this is open for discussion, but IMO deviating from the stdlib behavior this much would require some more justification. Don't let errors pass silently, etc. -n
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On 28 Mar 2014 05:00, Sankarshan Mudkavi smudk...@uwaterloo.ca wrote: Hi all, Apologies for the delay in following up, here is an expanded version of the proposal, which hopefully clears up most of the details. I have not included specific implementation details for the code, such as which functions to modify etc. since I think those are not traditionally included in NEPs? The format seems fine to me. Really the point is just to have a document that we can use as reference when deciding on behaviour, and this does that :-). Three quick comments: 1- You give as an example of naive datetime handling: np.datetime64('2005-02-25T03:00Z') np.datetime64('2005-02-25T03:00') This IIUC is incorrect. The Z modifier is a timezone offset, and for normal naive datetimes would cause an error. 2- It would be good to explicitly include examples of conversion to and from datetimes alongside the examples of conversions to and from strings. 3- It would be good to (eventually) include some discussion of the impact of the preferred proposal on existing code. E.g., will this break a lot of people's pipelines? (Are people currently *always* adding timezones to their numpy input to avoid the problem, and now will have to switch to the opposite behaviour depending on numpy version?) And we'll want to make sure to get feedback from the pydata@ (pandas) list explicitly, though that can wait until people here have had a chance to respond to the first draft. Thanks for pushing this forward! -n
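[Archive note: to make the distinction concrete, here is a sketch of the parsing rule that stdlib-style naive semantics would imply -- any explicit timezone designator, 'Z' included, is rejected. `parse_naive_iso` is a hypothetical illustrative helper, not numpy API, and the regex is deliberately simplified:]

```python
import re

def parse_naive_iso(s):
    # Hypothetical helper illustrating strict naive-datetime parsing:
    # reject a trailing 'Z', '+hh:mm', '-hh:mm', or '+hhmm' designator.
    # (Simplified: an hour-only offset like '-05' would slip through.)
    if re.search(r"(Z|[+-]\d{2}:\d{2}|[+-]\d{4})$", s):
        raise ValueError("naive datetime cannot carry a timezone: %r" % s)
    return s

parse_naive_iso("2005-02-25T03:00")       # accepted: no designator
try:
    parse_naive_iso("2005-02-25T03:00Z")  # rejected under naive semantics
except ValueError as exc:
    print(exc)
```

Under this rule the NEP's example pair would not compare equal -- the 'Z' form would simply be an error, matching how datetime.datetime treats naive values.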
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
On Fri, Mar 28, 2014 at 4:58 PM, Robert Kern robert.k...@gmail.com wrote: On Fri, Mar 28, 2014 at 2:54 PM, Sturla Molden sturla.mol...@gmail.com wrote: Matthew Brett matthew.br...@gmail.com wrote: I see it should be possible to build a full blas and partial lapack library with eigen [1] [2]. Eigen has a licensing issue as well, unfortunately, MPL2. E.g. it requires recipients to be informed of the MPL requirements (cf. impossible with pip install numpy). That's not the relevant condition. That's easily taken care of by including the MPL2 license text in the binary alongside numpy's BSD license text. This is no different than numpy's BSD license itself, which requires that the license text be included. It's not like people can't distribute any MPL2 project on PyPI just because pip doesn't print out the license before installing. The extra-BSD conditions of the MPL2 are sections 3.1 and 3.2. Those requirements just say that in addition to including the MPL2 license text, we also have to include a notice saying where the source code is available, i.e. the package would have to somewhere include a link to eigen.org. https://www.mozilla.org/MPL/2.0/FAQ.html#distribute-my-binaries I'm not sure why this would be a problem. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
On Fri, Mar 28, 2014 at 8:01 PM, Sturla Molden sturla.mol...@gmail.com wrote: Matthew Brett matthew.br...@gmail.com wrote: So - is Eigen our best option for optimized blas / lapack binaries on 64 bit Windows? Maybe not: http://gcdart.blogspot.de/2013/06/fast-matrix-multiply-and-ml.html With AVX the difference is possibly even larger. But if we rule out closed-source BLAS, and we rule out OpenBLAS because we distrust its accuracy, and we aren't going to recompile ATLAS on every machine, then Eigen is the only library they tested that is even an option for us. It would be nice to see some comparison between our actual options -- Eigen, generically compiled ATLAS, anything else? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
On 28 Mar 2014 20:26, Robert Kern robert.k...@gmail.com wrote: It's only a problem in that the binary will not be BSD, and we do need to communicate that appropriately. It will contain a significant component that is MPL2 licensed. The terms that force us to include the link to the Eigen source that we used forces downstream redistributors of the binary to do the same. Now, of all the copyleft licenses, this is certainly the most friendly, but it is not BSD. AFAICT, the only way redistributors could violate the MPL would be if they unpacked our binary and deleted the license file. But this would also be a violation of the BSD. The only difference in terms of requirements on redistributors between MPL and BSD seems to be exactly *which* text you include in your license file. I don't know if Eigen is a good choice on technical grounds (or even a possible one - has anyone ever actually compiled numpy against it?), but this license thing just doesn't seem like an important issue to me, if the alternative is not providing useful binaries. -n
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
Yes, because they're distributing source. But *our* license file could contain the text of the BSD, the text of the MPL, and the text "Eigen source is available at http://eigen.org." If the only problem with eigen turns out to be that we have to add a line of text to a file then I think we can probably manage this somehow. -n On 28 Mar 2014 20:40, Robert Kern robert.k...@gmail.com wrote: No, the license does not contain a pointer to the Eigen sources, which is required. https://bitbucket.org/eigen/eigen/src/fabd880592ac3343713cc07e7287098afd0f18ca/COPYING.MPL2?at=default On Mar 28, 2014 7:34 PM, Nathaniel Smith n...@pobox.com wrote: On 28 Mar 2014 20:26, Robert Kern robert.k...@gmail.com wrote: It's only a problem in that the binary will not be BSD, and we do need to communicate that appropriately. It will contain a significant component that is MPL2 licensed. The terms that force us to include the link to the Eigen source that we used forces downstream redistributors of the binary to do the same. Now, of all the copyleft licenses, this is certainly the most friendly, but it is not BSD. AFAICT, the only way redistributers could violate the MPL would be if they unpacked our binary and deleted the license file. But this would also be a violation of the BSD. The only difference in terms of requirements on redistributors between MPL and BSD seems to be exactly *which* text you include in your license file. I don't know if Eigen is a good choice on technical grounds (or even a possible one - has anyone ever actually compiled numpy against it?), but this license thing just doesn't seem like an important issue to me, if the alternative is not providing useful binaries. 
-n
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
I thought OpenBLAS is usually used with reference lapack? On 28 Mar 2014 22:16, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Fri, Mar 28, 2014 at 1:28 PM, Sturla Molden sturla.mol...@gmail.com wrote: Nathaniel Smith n...@pobox.com wrote: If the only problem with eigen turns out to be that we have to add a line of text to a file then I think we can probably manage this somehow. We would also have to compile Eigen-BLAS for various architectures and CPU counts. It is not adaptive like MKL or OpenBLAS. Yes, I guess we currently have no idea how bad a default Eigen would be. We also have the soft constraint that any choice we make should also work for building scipy binaries - so adequate lapack coverage. I believe that means lapack_lite is not an option? So I guess the options are:

* eigen - could it be slow?
* openblas - could it be buggy?
* reference blas / lapack [1] [2] [3]

In [2] someone seems to be getting very good performance from the reference implementation. I guess we need to benchmark these guys on some standard systems, and decide how bad the performance / stability has to be before it's better not to provide binaries at all. Cheers, Matthew [1] http://icl.cs.utk.edu/lapack-for-windows/lapack/ [2] http://ylzhao.blogspot.com/2013/10/blas-lapack-precompiled-binaries-for.html [3] http://www.fi.muni.cz/~xsvobod2/misc/lapack/
Re: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
On Wed, Mar 26, 2014 at 7:34 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote: as for using openblas by default in binary builds, no. pthread openblas build is now fork safe which is great but it is still not reliable enough for a default. E.g. the current latest release 0.2.8 still has one crash bug on dgemv[1], and wrong results zherk/zer2[2] and dgemv/cgemv[3]. git head has the former four fixed but still has wrong results for cgemv. The not so old 0.2.8 also fixed a whole bunch more crashes and wrong result issues (crashes on QR, uninitialized data use in dgemm, ...). None of the fixes received unit tests, so I am somewhat pessimistic that it will improve, especially as the maintainer is dissertating (is that the right word?) and most of the code is assembler code only few people can write (it is simply not required anymore, we have code generators and intrinsics for that). Openblas is great if you do not have the patience to build ATLAS and only use a restricted set of functionality and platforms you can easily test. Currently it is in my opinion not suitable for a general purpose library like numpy. Those problems you list are pretty damning, but neither is it reasonable to expect everyone to manually build ATLAS on every machine they use (or their students use, or...) :-(. So what other options do we have for general purpose builds? Give up and use MKL? How's eigen-blas doing these days? (I guess from skimming their docs they use OpenMP?) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
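[Archive note: none of this replaces a real BLAS test suite, but cross-checking a suspect BLAS against a non-BLAS code path is cheap. A sketch: plain `np.einsum` (without `optimize=`) computes with its own loops rather than dispatching to the linked BLAS gemm:]

```python
import numpy as np

# Cross-check the BLAS-backed matrix product against numpy's own
# einsum loops, which do not call into BLAS by default.
rng = np.random.default_rng(1234)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))

blas_result = np.dot(a, b)                # dispatches to the linked BLAS
reference = np.einsum("ij,jk->ik", a, b)  # plain C loops, no BLAS gemm

assert np.allclose(blas_result, reference)
```

A check like this would only catch gross wrong-result bugs of the kind Julian lists, not subtle accuracy or threading issues.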
Re: [Numpy-discussion] Resolving the associativity/precedence debate for @
On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith n...@pobox.com wrote: After 88 emails we don't have a conclusion in the other thread (see [1] for background). But we have to come to some conclusion or another if we want @ to exist :-). So I'll summarize where the discussion stands and let's see if we can find some way to resolve this. Response in this thread so far seems (AFAICT) to have pretty much converged on same-left. If you think that this would be terrible and there is some compelling argument against it, then please speak up! Otherwise, if no-one objects, then I'll go ahead in the next few days and put same-left into the PEP. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Resolving the associativity/precedence debate for @
On Mon, Mar 24, 2014 at 11:58 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Mar 24, 2014 at 5:56 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith n...@pobox.com wrote: After 88 emails we don't have a conclusion in the other thread (see [1] for background). But we have to come to some conclusion or another if we want @ to exist :-). So I'll summarize where the discussion stands and let's see if we can find some way to resolve this. Response in this thread so far seems (AFAICT) to have pretty much converged on same-left. If you think that this would be terrible and there is some compelling argument against it, then please speak up! Otherwise, if no-one objects, then I'll go ahead in the next few days and put same-left into the PEP. I think we should take a close look at broadcasting before deciding on the precedence. Can you elaborate? Like what, concretely, do you think we need to do now? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
[Numpy-discussion] Resolving the associativity/precedence debate for @
Hi all,

After 88 emails we don't have a conclusion in the other thread (see [1] for background). But we have to come to some conclusion or another if we want @ to exist :-). So I'll summarize where the discussion stands and let's see if we can find some way to resolve this.

The fundamental question is whether a chain like (a @ b @ c) should be evaluated left-to-right (left-associativity) or right-to-left (right-associativity).

DATA SOURCE 1: This isn't a democratic vote, but it's useful to get a sense of people's intuitions. Counting messages in the other thread, opinion seems to be pretty evenly split:

== Votes for right-associativity ==
Weak-right: [2] [3] [5]
Tight-right: [4] [6]
Same-right: [11]

== Votes for left-associativity ==
Same-left: [7] [8] [14] [15] [16]
Tight-left: [9]
Weak-left: [12]

There's also the grouping option (described in [10]), but that's received very little support (just [13]).

DATA SOURCE 2: Several people have suggested that performance considerations mean that right-to-left evaluation is more common in practice than left-to-right evaluation. But, if we look at actual usage in Python code, that's not what we find: when people call dot() in chains, then they're about evenly split, and actually use the left-to-right, left-associative order slightly more often than the right-to-left, right-associative order: http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069578.html

DATA SOURCE 3: And if we look at other languages, then we find:

== Votes for right-associativity ==
none

== Votes for left-associativity ==
Same-left: Matlab, Julia, IDL, GAUSS
Tight-left: R

And Mathematica uses the grouping approach.

ARGUMENTS: The final outcome of this is that I need to write a piece of text that says what our (at least rough) consensus is, and lays out the reasons. So long as the vote is so evenly split, I can't really do this. But I can imagine what the different pieces of text might look like. 
THE CASE FOR LEFT-ASSOCIATIVITY: If I were writing this text in favor of left-associativity, I'd point out:

- Special cases aren't special enough to break the rules. Every single operator in Python besides ** is left-associative (and ** has very compelling arguments for right associativity). @ does not have similarly compelling arguments. If we were having this debate about *, then it'd probably be much more lopsided towards left-associativity. So sure, there's something about @ that makes right-associativity *more* appealing than for most other operators. But not *much* more appealing -- left-associativity still comes out at least slightly ahead in all of the above measures. And there are a lot of benefits to avoiding special cases -- it gives fewer rules to memorize, fewer rules to remember, etc. So @ may be a special case, but it's not special enough.

- Other languages with @ operators almost overwhelmingly use the same-left rule, and I've never heard anyone complain about this, so clearly nothing horrible will happen if we go this way. We have no comparable experience for right-associativity.

- Given left-associativity, then there's good agreement about the appropriate precedence. If we choose right-associativity then it's much less clear (which will then make things harder for experts to remember, harder for non-experts to guess, etc.). Note that one of the votes for right-associativity even preferred the same-right rule, which is not even technically possible...

This strikes me as a nice solid case.

THE CASE FOR RIGHT-ASSOCIATIVITY: If I were writing this text in favor of right-associativity, I'd point out:

- Because matrix multiplication has a tight conceptual association with function application/composition, many mathematically sophisticated users have an intuition that a matrix expression like R S x proceeds from right-to-left, with first S transforming x, and then R transforming the result. This isn't universally agreed, but at the least this intuition is more common than for other operations like 2 * 3 * 4 that everyone reads as going from left-to-right.

- There might be some speed argument, if people often write things like Mat @ Mat @ vec? But no-one has found any evidence that people actually do write such things often.

- There's been discussion of how right-associativity might maybe perhaps be nice for non-matmul applications? But I can't use those arguments [17] [18].

- .. I got nothin'.

I am fine with any outcome here. (I'm actually listed under *both* tight-right and same-left in the straw poll above ;-).) I'm totally happy to go back to Guido et al and argue for right-associativity. BUT if you all want me to do that then you need to give me some better arguments to use :-). One way to do this might be to go through the ((a @ b) @ c) and (a @ (b @ c)) examples I found (the scripts are available [19], and I can help modify them to spit out more details), look at the actual code, and demonstrate that the left-to-right ((a @ b) @ c) cases
Re: [Numpy-discussion] Resolving the associativity/precedence debate for @
On Sat, Mar 22, 2014 at 7:59 PM, Robert Kern robert.k...@gmail.com wrote: On Sat, Mar 22, 2014 at 6:13 PM, Nathaniel Smith n...@pobox.com wrote: Hi all, After 88 emails we don't have a conclusion in the other thread (see [1] for background). But we have to come to some conclusion or another if we want @ to exist :-). So I'll summarize where the discussion stands and let's see if we can find some way to resolve this. The numpy community has no consensus strongly preferring one option over another is a perfectly fine conclusion to this thread on numpy-discussion, IMO. Actually deciding what goes into the PEP given that input and merged with other concerns should probably happen on python-ideas. Yep, if we converge on deadlock then that's what we'll do, but I'm not yet convinced that we've converged at all. In the last few hours the vote deltas are right -1, left +3... -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On Thu, Mar 20, 2014 at 11:27 PM, Chris Barker chris.bar...@noaa.gov wrote: * I think there are more or less three options:

1) a) don't have any timezone handling at all -- all datetime64s are UTC. Always
   b) don't have any timezone handling at all -- all datetime64s are naive (the only difference between these two is I/O of strings, and maybe I/O of datetime objects with a time zone)
2) Have a time zone associated with the array -- defaulting to either UTC or None, but don't provide any implementation other than the tagging, with the ability to add in TZ handler if you want (can this be done efficiently?)
3) Full on proper TZ handling.

I think (3) is off the table for now. I think (2) is what the NEP proposes, but I'd need more details, examples to know. I prefer 1(b), but 1(a) is close enough that I'd be happy with that, too. I think the first goal is to define what a plain vanilla datetime64 does, without any extra attributes. This is for two practical reasons: First, our overriding #1 goal is to fix the nasty I/O problems that default datetime64's show, so until that's done any other bells and whistles are a distraction. And second, adding parameters to dtypes right now is technically messy. This rules out (2) and (3). If we additionally want to keep the option of adding a timezone parameter later, and have the result end up looking like stdlib datetime, then I think 1(b) is the obvious choice. My guess is that this is also what's most compatible with pandas, which is currently keeping its own timezone object outside of the dtype. Any downsides? I guess this would mean that we start raising an error on ISO 8601's with offsets attached, which might annoy some people? Writing this made me think of a third option -- tracking, but no real manipulation, of TZ. This would be analogous to what ISO 8601 does -- all it does is note an offset. 
A given DateTime64 array would have a given offset assigned to it, and the appropriate addition and subtraction would happen at I/O. Offset of 00:00 would be UTC, and there would be a None option for naive. Please no! An integer offset is a terrible way to represent timezones, and hardcoding this would just get in the way of a proper solution. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
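[Archive note: the problem with a bare stored offset is easy to demonstrate with the stdlib (a sketch using `zoneinfo`, Python 3.9+, which needs tz data installed): a real timezone's UTC offset is not a constant.]

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; requires system/package tz data

tz = ZoneInfo("America/New_York")
winter = datetime(2014, 1, 15, 12, 0, tzinfo=tz)
summer = datetime(2014, 7, 15, 12, 0, tzinfo=tz)

# One zone, two UTC offsets (standard time vs DST) -- a single stored
# integer offset per array cannot round-trip this.
print(winter.utcoffset(), summer.utcoffset())
assert winter.utcoffset() != summer.utcoffset()
```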
Re: [Numpy-discussion] Dates and times and Datetime64 (again)
On 20 Mar 2014 02:07, Sankarshan Mudkavi smudk...@uwaterloo.ca wrote: I've written a rather rudimentary NEP, (lacking in technical details which I will hopefully add after some further discussion and receiving clarification/help on this thread). Please let me know how to proceed and what you think should be added to the current proposal (attached to this mail). Here is a rendered version of the same: https://github.com/Sankarshan-Mudkavi/numpy/blob/Enhance-datetime64/doc/neps/datetime-improvement-proposal.rst Your NEP suggests making all datetime64s be in UTC, and treating string representations from unknown timezones as UTC. How does this differ from, and why is it superior to, making all datetime64s be naive? -n
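[Archive note: the operational difference between "UTC everywhere" and "naive" shows up exactly at string I/O. A stdlib sketch: UTC-everywhere converts an offset-carrying input on parse, while naive semantics stores the wall-clock value verbatim and never shifts anything.]

```python
from datetime import datetime, timezone

# "Store as UTC": an input with an explicit offset is converted on parse.
aware = datetime.strptime("2005-02-25T03:00-0500", "%Y-%m-%dT%H:%M%z")
stored_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)
assert stored_utc == datetime(2005, 2, 25, 8, 0)  # 03:00-05:00 -> 08:00 UTC

# "Naive": the same wall-clock time with no offset passes through untouched,
# and nothing is ever shifted on input or output.
stored_naive = datetime.strptime("2005-02-25T03:00", "%Y-%m-%dT%H:%M")
assert stored_naive == datetime(2005, 2, 25, 3, 0)
```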
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Thu, Mar 20, 2014 at 9:07 AM, Robert Kern robert.k...@gmail.com wrote: I think the operator-overload-as-DSL use cases actually argue somewhat for right-associativity. There is no lack of left-associative operators for these use cases to choose from since they usually don't have numeric or bitwise operations defined for them. Right-associativity adds some diversity into the ecosystem and opens up some design space. Whether or not this is true, I think we should assign this argument ~zero weight for purposes of the present discussion. That's because:

- We haven't been asked to figure out the best design of @ for Python overall, we've been asked to report back on what design of @ will be best for the numeric community, since that's where we have special expertise that python-dev lacks. Python-dev is entirely capable of then taking our report as input and then having a debate about how much weight to give to these other possible uses.

- And anyway, my impression is that python-dev will give these other possible uses ~zero weight anyway -- if they thought random DSL operators were important for their own sake, they would have added @ long ago :-). Maybe if we say we literally do not care at all what @'s precedence and associativity are, then it will matter as a tie-breaker, but first I don't think it's true that we don't care, and second even if it were then my guess is that the argument for consistency with the other operators would be a stronger tie-breaker.

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Wed, Mar 19, 2014 at 7:45 PM, Nathaniel Smith n...@pobox.com wrote: Okay, I wrote a little script [1] to scan Python source files and look for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot method equivalents. So what we get out is:

- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings: dot(dot(a, b), c)
- a count of how often we see right-associative nestings: dot(a, dot(b, c))

Running it on a bunch of projects, I get:

| project      | dots | left | right | right/left |
|--------------+------+------+-------+------------|
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
|--------------+------+------+-------+------------|
| total        | 2378 |  114 |    94 |       0.82 |

Another way to visualize this, converting each contiguous chain of calls to np.dot into a parenthesized expression, and then counting how often we see each pattern:

1943 (_ @ _)
 100 ((_ @ _) @ _)               # left
  86 (_ @ (_ @ _))               # right
   2 (_ @ ((_ @ _) @ _))
   2 (((_ @ _) @ _) @ _)         # left
   1 ((_ @ (_ @ _)) @ _)
   1 ((_ @ _) @ (_ @ _))
   1 (((_ @ _) @ _) @ (_ @ _))
   1 ((_ @ ((_ @ _) @ _)) @ _)
   1 ((_ @ _) @ (_ @ (_ @ _)))

(This is pooling scipy/nipy/scikit-learn/statsmodels.) I've noted the 3 different patterns that have a consistent associativity. From this I'm leaning towards the conclusions that:

- Expressions with complex parenthesization do happen, but probably not often enough to justify elaborate stuff like my 'chaining' proposal -- only 8.7% of these cases involve more than one @.

- There's very little support here for the intuition that right-associativity is more useful than left-associativity on a day-to-day basis.

-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
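[Archive note: the performance intuition behind the right-associativity camp is easy to reproduce. A sketch, not taken from the thread, using np.dot (the @ operator did not yet exist): for Mat @ Mat @ vec shapes, grouping to the right keeps every step a matrix-vector product.]

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((300, 300))
B = rng.standard_normal((300, 300))
v = rng.standard_normal((300, 1))

# Right-to-left: dot(A, dot(B, v)) is two matrix-vector products,
# roughly 2 * n^2 flops each.
right = np.dot(A, np.dot(B, v))

# Left-to-right: dot(dot(A, B), v) forms the full 300x300 product
# first, roughly n^3 flops, before the final matrix-vector step.
left = np.dot(np.dot(A, B), v)

# Same answer either way (up to rounding); only the cost differs.
assert np.allclose(left, right)
```

The data above suggest this shape pattern is less common in real code than the intuition would predict, which is exactly Nathaniel's point.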
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Thu, Mar 20, 2014 at 1:36 PM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote: On 03/20/2014 02:26 PM, Dag Sverre Seljebotn wrote: Order-of-matrix-multiplication is literally my textbook example of a dynamic programming problem with complexity O(n^2) where n is number of terms (as in, it's how dynamic programming is introduced in my textbook). I don't think adding sparse or diagonal matrices changes this as long as you only deal with chained @ and make some simple assumptions of the cost of a FLOP in sparse @ dense, sparse @ sparse, dense @ dense, and so on. Where you need anything more than very simple dynamic programming algorithms is when you add + into the mix (whether to use the distributive rule or not and so on). I'm positive to the chained @ idea, I think it's the answer to what we really want. Sorry, I totally misunderstood this. The question is of course how you dispatch technically (where the __matmul__ function lives and which one to use), not figuring out what you want done. Or even more specifically, the question is whether getting the chance to use dynamic programming on chains of @'s (and only @'s!) is so valuable that we want to have a special parsing+dispatch rule to allow it. I have to say that after glancing at a few hundred 'dot' calls, I'm not as convinced that this is useful in practice. There are lots of complex expressions out there involving 'dot', and relatively few of them involve long chains of 'dot' calls [1][2]. There are strategies for doing whole-expression optimization that work for more general expressions, not just @ -- e.g. numexpr, numba, theano -- at the cost of a bit more intrusiveness. And as numpy gets better at supporting non-ndarray types, then it'll be easier to seamlessly support low-impact deferred computation APIs like:

a, b, c = defer(a, b, c)
d = np.sin(a) + a @ b @ c
e = d / (a + b + c + d)
return force(e)

Having a special dispatch for @ would only help with one of the computations here. 
-n [1] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069565.html [2] http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069578.html -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
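Dag's point about this being the textbook dynamic-programming problem can be made concrete. Here is a minimal, self-contained sketch (not code from this thread; the function name and structure are my own) of the standard algorithm for choosing the cheapest multiplication order of a chain, where dims[i], dims[i+1] give the shape of matrix i:

```python
def matrix_chain_cost(dims):
    """Minimal scalar-multiplication count needed to evaluate a chain of
    matrices whose shapes are dims[0] x dims[1], dims[1] x dims[2], ..."""
    n = len(dims) - 1  # number of matrices in the chain
    # cost[i][j] = cheapest cost to multiply out matrices i..j inclusive
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # sub-chain length
        for i in range(n - length + 1):
            j = i + length - 1
            # try every split point k between i and j
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]

# Mat(10x100) @ Mat(100x5) @ Mat(5x50): best order is (A @ B) @ C
print(matrix_chain_cost([10, 100, 5, 50]))  # -> 7500
```

The fully left- or right-associated orders for that example cost 7,500 and 75,000 scalar multiplications respectively, which is the kind of gap a chained-@ dispatch could exploit automatically.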
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Tue, Mar 18, 2014 at 9:14 AM, Robert Kern robert.k...@gmail.com wrote: On Tue, Mar 18, 2014 at 12:54 AM, Nathaniel Smith n...@pobox.com wrote: On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith n...@pobox.com wrote: Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c]) So, I've been thinking about this (thanks to @rfateman for pointing it out), and wondering if Mathematica's approach is worth following up more. (It would need to make it past python-dev, of course, but worst case is just that they say no and we're back where we are now, so we might as well think it through.) I predict with near-certainty that this will be rejected, I guess that's what everyone thought about @ too? ;-) but that doesn't prevent it from derailing the discussion. This proposal is unlike anything else in Python. Chained comparisons are *not* similar to this proposal. The chaining only happens at the syntax level, not the semantics. `a < b < c` gets compiled down to `a.__lt__(b) and b.__lt__(c)`, not `do_comparison([a, b, c], [lt, lt])`. Yes, the syntax is the same as chained comparisons, and the dispatch is a generalization of regular operators. It is unusual; OTOH, @ is unusual in that no other operators in Python have the property that evaluating in the wrong order can cost you seconds of time and gigabytes of memory. Perhaps. We have approval for a binary @ operator. Take the win. We have approval, and we have a request: that we figure out how @ should work in detail to be most useful to us. Maybe that's this proposal; maybe not. Ultimately rejected-or-not-rejected comes down to how strong the arguments for something are. And while we can make some guesses about that, it's impossible to know how strong an argument will be until one sits down and works it out. So I still would like to hear what people think, even if it just ends in the conclusion that it's a terrible idea ;-). 
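For the record, Robert's point that chaining is purely syntactic is easy to check. A small demonstration (the Logged class is my own, not from the thread) that a < b < c compiles to pairwise __lt__ calls with short-circuiting, rather than one dispatch on the whole [a, b, c] sequence:

```python
# Record every comparison that Python actually performs.
calls = []

class Logged:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        calls.append((self.value, other.value))
        return self.value < other.value

a, b, c = Logged(1), Logged(2), Logged(3)
assert a < b < c
print(calls)  # -> [(1, 2), (2, 3)]: two separate binary __lt__ calls

calls.clear()
assert not (b < a < c)
print(calls)  # -> [(2, 1)]: the first comparison fails, so it short-circuits
```

So comparison chaining never sees the whole chain at once, which is exactly the difference from the Mathematica-style grouping proposal.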
As for arguments against the grouping semantics, I did think of one other case where @ is not associative, though it's pretty weird:

In [9]: a = np.arange(16, dtype=np.int8).reshape((4, 4))

In [10]: np.dot(a, np.dot(a, a.astype(float)))
Out[10]:
array([[  1680.,   1940.,   2200.,   2460.],
       [  4880.,   5620.,   6360.,   7100.],
       [  8080.,   9300.,  10520.,  11740.],
       [ 11280.,  12980.,  14680.,  16380.]])

In [12]: np.dot(np.dot(a, a), a.astype(float))
Out[12]:
array([[ 1680.,  1940.,  2200.,  2460.],
       [-1264., -1548., -1832., -2116.],
       [ 1936.,  2132.,  2328.,  2524.],
       [-1008., -1100., -1192., -1284.]])

(What's happening is that we have int8 @ int8 @ float, so (int8 @ int8) @ float has overflows in the first computation, but int8 @ (int8 @ float) does all the computations in float, with no overflows.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
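The overflow mechanism behind that non-associativity can be shown without numpy at all. A pure-Python sketch (wrap_int8 is a made-up helper mimicking int8 wraparound, not anything from the thread):

```python
def wrap_int8(x):
    """Wrap an integer into the int8 range [-128, 127], mimicking the
    modular-arithmetic overflow behaviour of fixed-width integers."""
    return (x + 128) % 256 - 128

a, b = 100, 2      # values held in int8
c = 1.0            # a float operand

# (int8 @ int8) @ float: the int8 intermediate overflows before promotion...
left = wrap_int8(a * b) * c
# ...but int8 @ (int8 @ float) promotes to float first, so nothing overflows.
right = a * (b * c)

print(left, right)  # -> -56.0 200.0
```

Same operands, different grouping, different answer, so any grouping semantics for @ has to pick an evaluation strategy with care in mixed-dtype chains.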
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith n...@pobox.com wrote: I think we need to know something about how often the Mat @ Mat @ vec type cases arise in practice. How often do non-scalar * and np.dot show up in the same expression? How often does it look like a * np.dot(b, c), and how often does it look like np.dot(a * b, c)? How often do we see expressions like np.dot(np.dot(a, b), c), and how often do we see expressions like np.dot(a, np.dot(b, c))? This would really help guide the debate. I don't have this data, and I'm not sure the best way to get it. A super-fancy approach would be to write a little script that uses the 'ast' module to count things automatically. A less fancy approach would be to just pick some code you've written, or a well-known package, grep through for calls to 'dot', and make notes on what you see. (An advantage of the less-fancy approach is that as a human you might be able to tell the difference between scalar and non-scalar *, or check whether it actually matters what order the 'dot' calls are done in.) Okay, I wrote a little script [1] to scan Python source files looking for things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot method equivalents. So what we get out is:
- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings: dot(dot(a, b), c)
- a count of how often we see right-associative nestings: dot(a, dot(b, c))

Running it on a bunch of projects, I get:

| project      | dots | left | right | right/left |
|--------------+------+------+-------+------------|
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
|--------------+------+------+-------+------------|
| total        | 2378 |  114 |    94 |       0.82 |

(Any other projects worth trying? This is something that could vary a lot between different projects, so it seems more important to get lots of projects here than to get a few giant projects. 
Or if anyone wants to run the script on their own private code, please do! Running it on my personal pile of random junk finds 3 left-associative and 1 right.) Two flaws with this approach: 1) Probably some proportion of those nested dot calls are places where it doesn't actually matter which evaluation order one uses -- dot() forces you to pick one, so you have to. If people prefer to, say, use the left form in cases where it doesn't matter, then this could bias the left-vs-right results -- hard to say. (Somewhere in this thread it was suggested that the use of the .dot method could create such a bias, because a.dot(b).dot(c) is more natural than a.dot(b.dot(c)), but only something like 6% of the dot calls here use the method form, so this probably doesn't matter.) OTOH, this also means that the total frequency of @ expressions where associativity even matters at all is probably *over*-estimated by the above. 2) This approach misses cases where the cumbersomeness of dot has caused people to introduce temporary variables, like 'foo = np.dot(a, b); bar = np.dot(foo, c)'. So this causes us to *under*-estimate how often associativity matters. I did read through the 'dot' uses in scikit-learn and nipy, though, and only caught a handful of such cases, so I doubt it changes anything much. -n [1] https://gist.github.com/njsmith/9157645#file-grep-dot-dot-py -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
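The actual script is the gist at [1]; for readers who don't want to chase the link, a much-reduced sketch in the same spirit (my own code, not the gist -- it only handles the dot(...) function form, not the a.dot(b) method form) looks like:

```python
import ast

def is_dot(node):
    """True for calls spelled dot(...) or something.dot(...)."""
    if not isinstance(node, ast.Call):
        return False
    f = node.func
    return (isinstance(f, ast.Name) and f.id == "dot") or (
        isinstance(f, ast.Attribute) and f.attr == "dot"
    )

def count_nestings(source):
    """Count left-nested dot(dot(a, b), c) and right-nested
    dot(a, dot(b, c)) calls in a piece of Python source."""
    left = right = 0
    for node in ast.walk(ast.parse(source)):
        if is_dot(node) and len(node.args) >= 2:
            if is_dot(node.args[0]):
                left += 1
            if is_dot(node.args[-1]):
                right += 1
    return left, right

print(count_nestings("d = dot(dot(a, b), c) + dot(a, dot(b, c))"))  # -> (1, 1)
```

Running something like this over a project's .py files and summing the counts gives numbers of the kind shown in the table above.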
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Tue, Mar 18, 2014 at 9:50 AM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: To elaborate a little on such a more general and explicit method of specifying linear operations (perhaps 'expressions with named axes' is a good nomer to cover this topic). [...] This is a good topic to bring up on numpy-discussion, but maybe you should start a new thread? That way it's both more likely to be noticed by interested parties, and also it will make it easier for me to keep track of what's going on in this thread, which is about a specific concrete decision we need to make ;-). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Tue, Mar 18, 2014 at 3:22 PM, Christophe Bal projet...@gmail.com wrote: About weak-left. You need to define a priority of @ the matrix product regarding to * the elementwise product because (A*B)@C != A*(B@C) This doesn't follow. (a / b) * c != a / (b * c), but / and * in Python have the same priority. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
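The counterexample in numbers (a trivial sketch of my own, just to make the point checkable): / and * share one precedence level in Python and simply associate left-to-right, even though regrouping them changes the result. Same-level precedence never promised associativity.

```python
a, b, c = 8.0, 4.0, 2.0

# Regrouping changes the result...
assert (a / b) * c != a / (b * c)

# ...yet / and * sit at the same precedence level and group left-to-right.
assert a / b * c == (a / b) * c

print((a / b) * c, a / (b * c))  # -> 4.0 1.0
```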
Re: [Numpy-discussion] _gufuncs_linalg module
On Tue, Mar 18, 2014 at 5:26 PM, Jay Bourque jay.bour...@continuum.io wrote: I was just about to submit some pull requests for fixes to the _gufuncs_linalg module and discovered that it no longer exists. It looks like it was removed in this commit. Is there any reason why it was removed without any apparent discussion? It looks like it was originally added in this PR after a long discussion. IIRC the functionality was merged into umath_linalg.c.src? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On 18 Mar 2014 17:32, Christophe Bal projet...@gmail.com wrote: This is a different situation because / is indeed an hidden multiplication : a/b = a*inv(b). The same is true for + and - : a-b=a+opp(b). What I'm saying is that these operations * and / are indeed of the very same kind. This is not the same for * and @. // (floordiv) isn't equivalent to a multiplication, but it is also at the same level. << and >> aren't inverses, but they are at the same level. 'in' and 'is' are not even the same type (they have totally different requirements on their right argument) but they are at the same level. Whatever choice we make needs to be something we can justify, and our justification should probably not imply that all of python's other operators are wrong ;-). -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [RFC] should we argue for a matrix power operator, @@?
On Sat, Mar 15, 2014 at 4:32 AM, Nathaniel Smith n...@pobox.com wrote: For this discussion let's assume @ can be taken for granted, and that we can freely choose to either add @@ or not add @@ to the language. The question is: which do we think makes Python a better language (for us and in general)? From the thread so far, it sounds like the consensus answer is meh, whatever. So I'm thinking we should just drop @@ from the PEP, and if it turns out that this is a problem we can always revisit it in the ~3.6/3.7 timeframe. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Sat, Mar 15, 2014 at 7:01 PM, Alexander Belopolsky ndar...@mac.com wrote: On Sat, Mar 15, 2014 at 2:25 PM, Alexander Belopolsky ndar...@mac.com wrote: On Fri, Mar 14, 2014 at 11:41 PM, Nathaniel Smith n...@pobox.com wrote: Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be. I am not ready to form my own opinion, but I hope the following will help shaping the discussion. One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator. The general rule in Python is that in a binary operation A # B, then first we try A.__special__, and if that doesn't exist or it returns NotImplemented, then we try B.__rspecial__. (The exception is that if B.__class__ is a proper subclass of A.__class__, then we do it in the reverse order.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
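The binary rule described here is easy to demonstrate directly. A small sketch (hypothetical classes of my own, using + rather than @ since @ doesn't exist yet) showing both halves: NotImplemented falls through to the right operand's reflected method, and a proper subclass on the right gets tried first:

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"
    def __radd__(self, other):
        return "Base.__radd__"

class Sub(Base):
    # A proper subclass that overrides both methods.
    def __add__(self, other):
        return "Sub.__add__"
    def __radd__(self, other):
        return "Sub.__radd__"

class Refuses:
    # Declines to handle the operation, deferring to the other operand.
    def __add__(self, other):
        return NotImplemented

print(Base() + Base())     # -> Base.__add__
print(Base() + Sub())      # -> Sub.__radd__  (proper subclass goes first)
print(Refuses() + Base())  # -> Base.__radd__ (NotImplemented falls through)
```

The subclass exception exists so that a specialized type mixed with its base class gets the chance to produce the specialized result, regardless of which side of the operator it lands on.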
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 4:09 PM, Alexander Belopolsky ndar...@mac.com wrote: On Mon, Mar 17, 2014 at 11:48 AM, Nathaniel Smith n...@pobox.com wrote: One more question that I think should be answered by the PEP and may influence the associativity decision is what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? For example, A can be an ndarray, B a sympy expression and C a pyoperator. The general rule in Python is that in a binary operation A # B, then first we try A.__special__, and if that doesn't exist or it returns NotImplemented, then we try B.__rspecial__. (The exception is that if B.__class__ is a proper subclass of A.__class__, then we do it in the reverse order.) This is the simple case. My question was: what happens if in an A @ B @ C expression, each operand has its own type that defines __matmul__ and __rmatmul__? The point of associativity is that the complex case A @ B @ C gets turned into either A @ (B @ C) or else (A @ B) @ C, and then you're back in the simple case. Are we going to recommend that other projects adopt numpy's __array_priority__? In mixed-type expressions, do you expect A @ B @ C to have type of A, B, or C? Does __matmul__ first then __rmatmul__ rule makes sense if @ becomes right-associative or should the order be reversed? ** is right-associative and uses the left-then-right rule, so it seems fine to me. In general the left-then-right rule has no particular logic behind it, it's just chosen so as to have *some* rule. In practice all well-behaved classes have to make sure that they implement __special__ methods in such a way that all the different variations work, no matter which class ends up actually handling the operation. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 9:38 PM, Christophe Bal projet...@gmail.com wrote: Here is the translation. ;-) Hello, and what about something like that ?

  a @ b @ c  ->  (a @ b) @ c
  a * b @ c  ->  (a * b) @ c
  a @ b * c  ->  a @ (b * c)

Easy to remember: the *-product has priority regarding to the @-product, and we just do @-product from left to right. In the terminology we've been using in this thread, this is weak-left. An advantage of this is that most parsers do analyze from left to right. So I really think that it is a better choice than the weak-right one. We've mostly ignored this option because of assuming that if we want left-associativity, we should go with same-left instead of weak-left. Same-left is:

  a @ b @ c  ->  (a @ b) @ c
  a * b @ c  ->  (a * b) @ c
  a @ b * c  ->  (a @ b) * c

i.e., even more left-to-right than weak-left :-) Do you think weak-left is better than same-left? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 11:16 PM, Bago mrb...@gmail.com wrote: Speaking of `@@`, would the relative precedence of @ vs * be the same as @@ vs **? This is one of the concerns that made Guido leery of @@ (but only one of them). Since we seem to be dropping @@: http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069502.html we don't have to come up with an answer :-). -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 10:33 PM, Christophe Bal projet...@gmail.com wrote: I think that weak-left is a little strange, just think a little of the operators used by mathematicians that always follow a hierarchy. Not sure what you mean -- I don't think most mathematicians think that scalar and matrix multiplication are above or below each other in precedence, for example. (Well, it's a strange question because scalar multiplication commutes, but even so, people often forget that these are even different operations.) A parser is mostly done using grammars : see http://docs.python.org/3.1/reference/grammar.html. Defining *-product to have stronger priority than the @-product, and this last having stronger priority than +, will make the changes in the grammar easier. I'm now convinced of the usefulness of @ and @@ too but I also think that you must think of other uses than only for numpy. In other words, numpy is a the good argument for this new operators, but this can also open new perspectives for other uses. No, that's not how this game is played :-). The way it works is, we figure out the best possible way to handle the use case that we've demonstrated a need for (matrix multiplication), and then once we've done that someone might or might not find some other uses too. If they do then cool, if not then too bad. This follows the principle that it's better to be great at some things than to be mediocre at everything. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Tue, Mar 18, 2014 at 12:16 AM, Christophe Bal projet...@gmail.com wrote: I think that weak-left is a little strange, just think a little of the operators used by mathematicians that always follow a hierarchy. Not sure what you mean -- I don't think most mathematicians think that scalar and matrix multiplication are above or below each other in precedence, for example. You're right but on the other hand, I've never seen mixed use of matrix and scalar products without parenthesis... Indeed in math, we can use <Au, Bv> for the scalar product of two matrix-vector products. Not scalar product, scalar multiplication -- you're saying (I think) that 3 * Matrix1 * Matrix2 is just like 3 * Matrix1 + Matrix2 in the sense that mathematicians think of the 3 * Matrix1 part as very different from, and higher precedence than, the Matrix1 + Matrix2 part. And similarly that Matrix1 * Matrix2 * 3 is just like Matrix1 + Matrix2 * 3 But in fact I think if you asked most mathematicians which of the *'s in Matrix1 * Matrix2 * 3 is higher precedence, they would think this question very odd! -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Sat, Mar 15, 2014 at 6:28 PM, Nathaniel Smith n...@pobox.com wrote: Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c]) So, I've been thinking about this (thanks to @rfateman for pointing it out), and wondering if Mathematica's approach is worth following up more. (It would need to make it past python-dev, of course, but worst case is just that they say no and we're back where we are now, so we might as well think it through.) Here's how it would work: Currently Python has 3 different kinds of ops: left-associative (most of them), right-associative (**), and chaining. Chaining is used for comparison ops. Example: a < b < c gets parsed to something like

  do_comparison(args=[a, b, c], ops=[lt, lt])

Notice this is very different from either of (a < b) < c or a < (b < c). Which means that comparisons aren't left- OR right-associative, they're this other thing, chaining. So we could propose adding a 4th kind of op, call it grouping, which would be only @. And the idea is that a @ b @ c would be equivalent to operator.matmul((a, b, c)) which eventually (see below) becomes a call to a.__matmul__((a, b, c)) We'd use exactly the same parsing rules as the chaining ops, so you can still control evaluation order with parentheses if you want:

  a @ (b @ c)  ->  matmul((a, matmul((b, c))))
  (a @ b) @ c  ->  matmul((matmul((a, b)), c))

...but if you don't specify, then each contiguous group of @ operators gets collected up and handed to __matmul__ together, and the __matmul__ implementation gets to decide which evaluation strategy to use. It's trivially fast for the computer to figure out the best evaluation order for matrix multiplication, so in practice I think this would mean that you could just stop worrying about parentheses for multiple contiguous calls to matmul. 
Fancier versions of __matmul__ defined on more specialized non-ndarray classes might even take into account their specialized knowledge of how expensive different evaluation orders are for their specific type -- I'm not sure if this actually happens in practice, but it might. (Like maybe the best way to evaluate a @ b @ c depends on the sparsity pattern in the various matrices, or maybe it depends on which matrices are currently on the GPU and which are in memory? Anyone have any real examples of this?) (Of course, this same evaluation-order problem arises for *all* expressions using numpy; being able to optimize whole expressions at a time is exactly what numexpr/numba/theano/etc. are useful for. So one could argue that baking it in to @ is pointless, if anyone gets tired of writing parentheses they should just use one of these libraries. Or maybe evaluation order problems arise so rarely for @ that no-one cares. But OTOH it would be so nice for once to just have a single best solution -- always use @ and be happy, it just works -- instead of all the caveats we normally do -- @ is good in some cases, but in other cases mdot is better, or if you know you can just use @ with the right parentheses) Of course, we still have to say something about what value a @ b @ c actually computes. In the PEP semantics, it isn't always associative -- specifically not if we do Mat @ vec @ Mat. So in this approach, we still need to decide what matmul((Mat, vec, Mat)) should return. But, this is actually a feature! Because obviously what *should* be returned in this case is *not* (Mat @ vec) @ Mat, *or* Mat @ (vec @ Mat). Both of those answers are terrible; it's just, if you have an ordinary left-/right-associative operator, those are your only options. What *should* be returned is an error. And in this scheme we get to see the whole @ expression at once, so we actually can raise an error for such things. 
So, this possibly has nicer performance characteristics, and also possibly nicer semantics. Now, how would this look in terms of the language definition? As far as the parser and AST go, this would use exactly the same rules as the chaining ops, so that's easy. Having parsed, we must evaluate. Likely the most contentious part of this approach is that we now have an n-arg operator, so the standard __X__/__rX__ dichotomy won't work, we need to do something like multiple dispatch. I haven't followed the debate on this issue in detail, but what I'd propose for this limited context is not to do anything like real multiple dispatch, but just directly generalize the familiar __rX__ rule to n arguments. The __rX__ rule is how Python's existing binary operators work: usually to evaluate a # b, you try a.__foo__, and then b.__foo__ EXCEPT if b is a proper subclass of a, you try b first. Generalized to n arguments, this looks like:

  def operator.matmul(args):
      candidates = list(args)
      while candidates:
          candidate = pop_next_candidate(candidates)
          if hasattr(candidate, "__matmul__"):
              result = candidate.__matmul__(args)
              if result is not NotImplemented:
                  return result
      raise TypeError
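To make the rule above concrete, here is a runnable toy version (entirely my own sketch -- nary_matmul and __matmul_chain__ are made-up names, and the proper-subclass-goes-first refinement is elided for brevity):

```python
def nary_matmul(args):
    """Toy n-ary dispatch: offer the whole operand tuple to each operand
    in turn until one of them handles it.  (A real rule would also let a
    proper subclass of an earlier operand's class jump the queue.)"""
    for candidate in args:
        method = getattr(type(candidate), "__matmul_chain__", None)
        if method is not None:
            result = method(candidate, args)
            if result is not NotImplemented:
                return result
    raise TypeError("no operand implemented chained matmul")

class Declines:
    # Bows out, like returning NotImplemented from a binary __matmul__.
    def __matmul_chain__(self, args):
        return NotImplemented

class Chain:
    # Sees all the operands at once, so it could pick an evaluation order.
    def __matmul_chain__(self, args):
        return "handled %d operands" % len(args)

print(nary_matmul((Declines(), Chain(), Chain())))  # -> handled 3 operands
```

The key property is visible here: the handler receives the whole operand tuple, so an implementation could apply the matrix-chain optimization (or raise an error for ill-formed cases like Mat @ vec @ Mat) before doing any work.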
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Mon, Mar 17, 2014 at 8:37 PM, Russell E. Owen ro...@uw.edu wrote: After seeing all the traffic on this thread, I am in favor of same-left because it is easiest to remember: - It introduces no new rules. - It is unambiguous. If we pick option 2 or 3 we have no strong reason to favor one over the other, leaving users to guess. To my mind, being able to easily reason about code you are reading is more important that hoping to increase efficiency for one common case when not using parenthesis. Personally I'm leaning in a similar direction (at least as far as left- versus right-associativity goes; I'm not sure yet what I think about the magic grouping thing I just posted :-)). The more I think about it, the weaker I find the avoiding-parentheses argument. If you're going to take the trouble to think about which ordering is best, you should write that down with parentheses no matter what the associativity is, so that when I have to read your code I'll see the parentheses and know that you thought about it! And certainly the slow part of this is not typing the parentheses, it's figuring out what order is best. (The potential advantage of grouping isn't that you don't have to write as many parentheses, it's that you don't have to *think* about parentheses.) The fact that Matlab et al get along fine with same-left also strikes me as strong evidence that right-associativity's benefits are at least not overwhelmingly compelling... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator
On Sun, Mar 16, 2014 at 2:39 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Note that I am not opposed to extra operators in python, and only mildly opposed to a matrix multiplication operator in numpy; but let me lay out the case against, for your consideration. First of all, the use of matrix semantics relative to arrays semantics is extremely rare; even in linear algebra heavy code, arrays semantics often dominate. As such, the default of array semantics for numpy has been a great choice. I've never looked back at MATLAB semantics. Different people work on different code and have different experiences here -- yours may or may not be typical. Pauli did some quick checks on scikit-learn, nipy, and scipy, and found that in their test suites, uses of np.dot and uses of elementwise-multiplication are ~equally common: https://github.com/numpy/numpy/pull/4351#issuecomment-37717330h Secondly, I feel the urge to conform to a historical mathematical notation is misguided, especially for the problem domain of linear algebra. Perhaps in the world of mathematics your operation is associative or commutes, but on your computer, the order of operations will influence both outcomes and performance. Even for products, we usually care not only about the outcome, but also how that outcome is arrived at. And along the same lines, I don't suppose I need to explain how I feel about A@@-1 and the likes. Sure, it isn't too hard to learn or infer this implies a matrix inverse, but why on earth would I want to pretend the rich complexity of numerical matrix inversion can be mangled into one symbol? I'd much rather write inv or pinv, or whatever particular algorithm happens to be called for given the situation. Considering this isn't the num-lisp discussion group, I suppose I am hardly the only one who feels so. My impression from the other thread is that @@ probably won't end up existing, so you're safe here ;-). On the whole, I feel the @ operator is mostly superfluous. 
I prefer to be explicit about where I place my brackets. I prefer to be explicit about the data layout and axes that go into a (multi)linear product, rather than rely on obtuse row/column conventions which are not transparent across function calls. When I do linear algebra, it is almost always vectorized over additional axes; how does a special operator which is only well defined for a few special cases of 2d and 1d tensors help me with that? Einstein notation is coming up on its 100th birthday and is just as blackboard-friendly as matrix product notation. Yet there's still a huge number of domains where the matrix notation dominates. It's cool if you aren't one of the people who find it useful, but I don't think it's going anywhere soon. Note that I don't think there is much harm in an @ operator; but I don't see myself using it either. Aside from making textbook examples like a gram-schmidt orthogonalization more compact to write, I don't see it having much of an impact in the real world. The analysis in the PEP found ~780 calls to np.dot, just in the two projects I happened to look at. @ will get tons of use in the real world. Maybe all those people who will be using it would be happier if they were using einsum instead, I dunno, but it's an argument you'll have to convince them of, not me :-). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator
On Sun, Mar 16, 2014 at 4:33 PM, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: Different people work on different code and have different experiences here -- yours may or may not be typical. Pauli did some quick checks on scikit-learn, nipy, and scipy, and found that in their test suites, uses of np.dot and uses of elementwise-multiplication are ~equally common: https://github.com/numpy/numpy/pull/4351#issuecomment-37717330h Yeah; these are examples of linalg-heavy packages. Even there, dot does not dominate. Not sure what makes them linalg-heavy -- they're just trying to cover two application areas, machine learning and neuroscience. If that turns out to involve a lot of linear algebra, well, then... 780 calls is not tons of use, and these projects are outliers, I'd argue. But you haven't argued! You've just asserted. I admittedly didn't spend a lot of time figuring out what the most representative projects were, I just picked two high profile ones off the top of my head, but I ran the numbers and they came out the way they did. (I wasn't convinced @ was useful either when I started, I just figured it would be good to settle the infix operator question one way or the other. I was also surprised np.dot turned out to be used that heavily.) If you don't like my data, then show us yours :-). -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy-Discussion Digest, Vol 90, Issue 45
On Sun, Mar 16, 2014 at 4:37 PM, Colin J. Williams cjwilliam...@gmail.com wrote: I would like to see the case made for @. Yes, I know that Guido has accepted the idea, but he has changed his mind before. I'm not sure how to usefully respond to this, since, I already wrote a ~20 page document making the case for @? Maybe if you think the arguments in it aren't good, it would be more helpful to explain which ones and why? The PEP seems neutral to retaining both np.matrix and @. I'm not sure what gives you this impression. The main point of the whole first section of the PEP is to explain why the existence of np.matrix causes problems and why a substantial majority of developers hate it, and how adding @ will let us solve these problems. Whether we actually get rid of np.matrix is a more complicated question (we'll need some sort of compatibility/transition strategy, it will depend on how quickly python versions with @ support are adopted, etc.), but at the very least the goal is that @ eventually replace it in all new code. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith n...@pobox.com wrote: Hi all, Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be. Another data point that might be useful:
Matlab: same-left
R: tight-left
IDL: same-left
GAUSS: same-left (IIUC -- any GAUSS experts please correct me if I misunderstood the fine manual)
Mathematica: instead of having an associativity, a @ b @ c gets converted into mdot([a, b, c])
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
Hi Chris, On Sat, Mar 15, 2014 at 4:15 AM, Chris Laumann chris.laum...@gmail.com wrote: Hi all, Let me preface my two cents by saying that I think the best part of @ being accepted is the potential for deprecating the matrix class — the syntactic beauty of infix for matrix multiply is a nice side effect IMHO :) This may be why my basic attitude is: I don’t think it matters very much but I would vote (weakly) for weak-right. Where there is ambiguity, I suspect most practitioners will just put in parentheses anyway — especially with combinations of * and @, where I don’t think there is a natural intuitive precedence relationship. At least, element-wise multiplication is very rare in math/physics texts as an explicitly defined elementary operation so I’d be surprised if anybody had a strong intuition about the precedence of the ‘*’ operator. And the binding order doesn’t matter if it is scalar multiplication. It doesn't matter and no-one has strong intuitions are generally arguments for same-left, since that allows everyone to reason about @ in the same way they reason about all of Python's operators. I have quite a bit of code with large matrices where the order of matrix-vector multiplies is an important optimization and I would certainly have a few simpler looking expressions for op @ op @ vec, hence the weak preference for right-associativity. That said, I routinely come across situations where the optimal matrix multiplication order is more complicated than can be expressed as left-right or right-left (because some matrices might be diagonal, CSR or CSC), which is why the preference is only weak. I don’t see a down-side in the use-case that it is actually associative (as in matrix-matrix-vector). Would you mind taking a more systematic look through this code, or sharing some examples so the rest of us can look? 
"Certainly have a few simpler looking expressions" is a good start, but when we're talking about changing the grammar of one of the most popular programming languages in the world it seems worth the effort to gather some more careful data :-). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On Sat, Mar 15, 2014 at 6:33 PM, Joe Kington joferking...@gmail.com wrote: On Sat, Mar 15, 2014 at 1:28 PM, Nathaniel Smith n...@pobox.com wrote: On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith n...@pobox.com wrote: Hi all, Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be. Another data point that might be useful: Matlab: same-left R: tight-left I was going to ask this earlier, but I was worried I was missing something major. Why was tight-left not an option? This means that if you don't use parentheses, you get:
a @ b @ c -> (a @ b) @ c
a * b @ c -> a * (b @ c)
a @ b * c -> (a @ b) * c
In my (very inexperienced) opinion, it seems like the most intuitive option. Because tight-left doesn't seem to have much to recommend it over same-left, and all else being equal having fewer levels of precedence is usually considered a good thing. Unless I'm missing something. If we do decide that tight-left is best then we could certainly advocate for it. I wouldn't read too much into R's choice; they don't actually define a separate precedence level for matrix multiplication specifically. They have a single precedence level for all special (user-defined) operators, and matrix multiplication happens to be one of these. (Their versions of // and % are also special, but I don't think anyone would expect // to bind more tightly than / if one were choosing precedences on a case-by-case basis.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] [help needed] associativity and precedence of '@'
On 15 Mar 2014 19:02, Charles R Harris charlesr.har...@gmail.com wrote: Just to throw something new into the mix u@v@w = u@(v@w) -- u@v is a dyadic matrix u@v -- is a scalar It would be nice if u@v@None, or some such, would evaluate as a dyad. Or else we will still need the concept of row and column 1-D matrices. I still think v.T should set a flag so that one can distinguish u@v.T (dyad) from u.T@v (inner product), where 1-D arrays are normally treated as column vectors. This sounds important but I have no idea what any of it means :-) (What's a dyadic matrix?) Can you elaborate? -n
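For readers equally puzzled by the terminology: as far as I can tell, a "dyad" (dyadic matrix) is simply the outer product of two vectors, as opposed to the scalar inner product that 1-D @ 1-D gives. A quick illustration in plain numpy (using today's @ operator, which postdates this email):

```python
import numpy as np

u = np.array([1., 2., 3.])
v = np.array([4., 5., 6.])

inner = u @ v           # 1-D @ 1-D gives a scalar: the inner product u.T@v
dyad = np.outer(u, v)   # the "dyadic matrix" u v^T, shape (3, 3)

assert inner.ndim == 0          # scalar result
assert dyad.shape == (3, 3)
# A dyad acts as a rank-1 linear operator: (u v^T) w = u * (v . w)
assert np.allclose(dyad @ v, u * (v @ v))
```

The distinction Chuck is drawing is exactly the one a row/column flag would disambiguate: with bare 1-D arrays, numpy has to pick one interpretation (inner product) and provide the other via np.outer.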
Re: [Numpy-discussion] [RFC] should we argue for a matrix power operator, @@?
On Sat, Mar 15, 2014 at 1:13 PM, Alan G Isaac alan.is...@gmail.com wrote: On 3/15/2014 12:32 AM, Nathaniel Smith wrote: I know you were worried about losing the .I attribute on matrices if switching to ndarrays for teaching -- given that ndarray will probably not get a .I attribute, how much would the existence of @@ -1 affect you? Not much. Positive integer powers would be useful (for illustrating e.g. graph theory and difference equations), but not enough to delay the PEP. So to be clear, even if numpy.matrix is going away, and even if ndarray isn't getting a .I attribute, then you're just as happy typing/teaching inv(X) as X @@ -1? I think NumPy should take the money and run. Getting `@` is great. Let's get experience with it before deciding whether it's worth asking for `@@`. Questions for `@@`:
- would it just be `matrix_power`, with all the restrictions?
- or would `a(10,2,2)@@-1` return an array of matrix inverses?
- etc
The version in the PEP does do gufunc-style broadcasting for >2d arrays, yes. So will np.linalg.matrix_power as soon as someone bothers to send a patch ;-) In the end, I'd like to see a functional implementation before deciding on `@@`, but I would not like to see `@` delayed at all. Oh, well, not much is going to affect `@`'s timing, unless we're *dreadfully* slow. Py 3.5 isn't even scheduled yet b/c 3.4 isn't out, and IIUC Python's standard release cycle is 18 months. So we've got a year+ before feature freeze, regardless. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
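As a concrete instance of the graph-theory use Alan mentions, here's a hedged sketch using np.linalg.matrix_power, the existing function that `@@` would presumably map to: entry (i, j) of the n-th power of an adjacency matrix counts walks of length n from node i to node j, and a power of -1 is the matrix inverse, i.e. the `@@ -1` notation under discussion.

```python
import numpy as np
from numpy.linalg import matrix_power

# Adjacency matrix of a 3-node path graph: 0 -- 1 -- 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

A2 = matrix_power(A, 2)
# Walks of length 2 from node 0: 0->1->0 and 0->1->2, one of each:
assert A2[0, 0] == 1 and A2[0, 2] == 1

# matrix_power(M, -1) is the matrix inverse -- the "@@ -1" at issue:
M = np.array([[2., 0.], [0., 4.]])
assert np.allclose(matrix_power(M, -1), np.linalg.inv(M))
```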
[Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator
Well, that was fast. Guido says he'll accept the addition of '@' as an infix operator for matrix multiplication, once some details are ironed out: https://mail.python.org/pipermail/python-ideas/2014-March/027109.html http://legacy.python.org/dev/peps/pep-0465/ Specifically, we need to figure out whether we want to make an argument for a matrix power operator (@@), and what precedence/associativity we want '@' to have. I'll post two separate threads to get feedback on those in an organized way -- this is just a heads-up. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
[Numpy-discussion] [help needed] associativity and precedence of '@'
Hi all, Here's the main blocker for adding a matrix multiply operator '@' to Python: we need to decide what we think its precedence and associativity should be. I'll explain what that means so we're on the same page, and what the choices are, and then we can all argue about it. But even better would be if we could get some data to guide our decision, and this would be a lot easier if some of you all can help; I'll suggest some ways you might be able to do that. So! Precedence and left- versus right-associativity. If you already know what these are you can skim down until you see CAPITAL LETTERS. We all know what precedence is. Code like this: a + b * c gets evaluated as: a + (b * c) because * has higher precedence than +. It binds more tightly, as they say. Python's complete precedence table is here: http://docs.python.org/3/reference/expressions.html#operator-precedence Associativity, in the parsing sense, is less well known, though it's just as important. It's about deciding how to evaluate code like this: a * b * c Do we use
a * (b * c)   # * is right-associative
or
(a * b) * c   # * is left-associative
? Here all the operators have the same precedence (because, uh... they're the same operator), so precedence doesn't help. And mostly we can ignore this in day-to-day life, because both versions give the same answer, so who cares. But a programming language has to pick one (consider what happens if one of those objects has a non-default __mul__ implementation). And of course it matters a lot for non-associative operations like a - b - c or a / b / c. So when figuring out order of evaluations, what you do first is check the precedence, and then if you have multiple operators next to each other with the same precedence, you check their associativity. Notice that this means that if you have different operators that share the same precedence level (like + and -, or * and /), then they have to all have the same associativity. 
All else being equal, it's generally considered nice to have fewer precedence levels, because these have to be memorized by users. Right now in Python, every precedence level is left-associative, except for '**'. If you write these formulas without any parentheses, then what the interpreter will actually execute is: (a * b) * c (a - b) - c (a / b) / c but a ** (b ** c) Okay, that's the background. Here's the question. We need to decide on precedence and associativity for '@'. In particular, there are three different options that are interesting:
OPTION 1 FOR @: Precedence: same as * Associativity: left My shorthand name for it: same-left (yes, very creative) This means that if you don't use parentheses, you get:
a @ b @ c -> (a @ b) @ c
a * b @ c -> (a * b) @ c
a @ b * c -> (a @ b) * c
OPTION 2 FOR @: Precedence: more-weakly-binding than * Associativity: right My shorthand name for it: weak-right This means that if you don't use parentheses, you get:
a @ b @ c -> a @ (b @ c)
a * b @ c -> (a * b) @ c
a @ b * c -> a @ (b * c)
OPTION 3 FOR @: Precedence: more-tightly-binding than * Associativity: right My shorthand name for it: tight-right This means that if you don't use parentheses, you get:
a @ b @ c -> a @ (b @ c)
a * b @ c -> a * (b @ c)
a @ b * c -> (a @ b) * c
We need to pick which of these options we think is best, based on whatever reasons we can think of, ideally more than "hmm, weak-right gives me warm fuzzy feelings" ;-). (In principle the other 2 possible options are tight-left and weak-left, but there doesn't seem to be any argument in favor of either, so we'll leave them out of the discussion.) Some things to consider: * and @ are actually not associative (in the math sense) with respect to each other, i.e., (a * b) @ c and a * (b @ c) in general give different results when 'a' is not a scalar. So considering the two expressions 'a * b @ c' and 'a @ b * c', we can see that each of these three options produces different results in some cases. 
Same-left is the easiest to explain and remember, because it's just, @ acts like * and /. So we already have to know the rule in order to understand other non-associative expressions like a / b / c or a - b - c, and it'd be nice if the same rule applied to things like a * b @ c so we only had to memorize *one* rule. (Of course there's ** which uses the opposite rule, but I guess everyone internalized that one in secondary school; that's not true for * versus @.) This is definitely the default we should choose unless we have a good reason to do otherwise. BUT: there might indeed be a good reason to do otherwise, which is the whole reason this has come up. Consider: Mat1 @ Mat2 @ vec Obviously this will execute much more quickly if we do Mat1 @ (Mat2 @ vec) because that results in two cheap matrix-vector multiplies, while (Mat1 @ Mat2) @ vec starts out by doing an expensive matrix-matrix multiply. So: maybe @ should be right associative, so that we
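A hedged illustration of both points at once -- that * and @ are not associative with respect to each other, and that grouping matters for cost in a Mat1 @ Mat2 @ vec chain -- written with explicit parentheses so it runs the same regardless of which precedence rule wins:

```python
import numpy as np

rng = np.random.RandomState(0)
a, b, c = (rng.randn(3, 3) for _ in range(3))

# * and @ are mutually non-associative when 'a' is not a scalar:
assert not np.allclose((a * b) @ c, a * (b @ c))

# For Mat1 @ Mat2 @ vec, both groupings give the same answer...
Mat1, Mat2 = rng.randn(100, 100), rng.randn(100, 100)
vec = rng.randn(100)
left = (Mat1 @ Mat2) @ vec    # ...but this does one O(n^3) matrix-matrix multiply,
right = Mat1 @ (Mat2 @ vec)   # while this does two cheap O(n^2) matrix-vector multiplies
assert np.allclose(left, right)
```

So the associativity choice is purely about cost and readability here, not correctness, which is why the thread keeps weighing same-left's simplicity against weak-right's free performance win.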
Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator
On Sat, Mar 15, 2014 at 3:18 AM, Chris Laumann chris.laum...@gmail.com wrote: That’s great. Does this mean that, in the not-so-distant future, the matrix class will go the way of the dodos? I have had more subtle-to-fix bugs sneak into code b/c something returns a matrix instead of an array than almost any other single source I can think of. Having two almost indistinguishable types for 2d arrays with slightly different semantics for a small subset of operations is terrible. Well, it depends on what your definition of distant is :-). Py 3.5 won't be out for some time (3.*4* is coming out this week). And we'll still need to sit down and figure out if there's any bits of matrix we want to save (e.g., maybe create an ndarray version of the parser used for np.matrix("1 2; 3 4")), come up with a transition plan, have a long mailing list argument about it, etc. But the goal (IMO) is definitely to get rid of np.matrix as soon as reasonable given these considerations, and similarly to find a way to switch scipy.sparse matrices to a more ndarray-like API. So it'll be a few years at least, but I think we'll get there. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
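For the bit about salvaging the string parser: a minimal, hypothetical sketch of what an ndarray version might look like. The function name from_matlab_string is made up for illustration (it only handles the simple space-and-semicolon form; np.matrix's real parser also accepts commas):

```python
import numpy as np

def from_matlab_string(s, dtype=float):
    """Parse a MATLAB-style string like "1 2; 3 4" into a plain ndarray."""
    rows = [row.split() for row in s.split(";")]
    return np.array(rows, dtype=dtype)

arr = from_matlab_string("1 2; 3 4")
assert type(arr) is np.ndarray
assert np.array_equal(arr, np.array([[1., 2.], [3., 4.]]))
# Same values np.matrix's parser produces, but as an ndarray:
assert np.array_equal(arr, np.asarray(np.matrix("1 2; 3 4")))
```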
[Numpy-discussion] [RFC] should we argue for a matrix power operator, @@?
Hi all, Here's the second thread for discussion about Guido's concerns about PEP 465. The issue here is that PEP 465 as currently written proposes two new operators, @ for matrix multiplication and @@ for matrix power (analogous to * and **): http://legacy.python.org/dev/peps/pep-0465/ The main thing we care about of course is @; I pushed for including @@ because I thought it was nicer to have than not, and I thought the analogy between * and ** might make the overall package more appealing to Guido's aesthetic sense. It turns out I was wrong :-). Guido is -0 on @@, but willing to be swayed if we think it's worth the trouble to make a solid case. Note that the question now is *not*, how will @@ affect the reception of @. @ itself is AFAICT a done deal, regardless of what happens with @@. For this discussion let's assume @ can be taken for granted, and that we can freely choose to either add @@ or not add @@ to the language. The question is: which do we think makes Python a better language (for us and in general)? Some thoughts to start us off: Here are the interesting use cases for @@ that I can think of:
- 'vector @@ 2' gives the squared Euclidean length (because it's the same as vector @ vector). Kind of handy.
- 'matrix @@ n' of course gives the matrix power, which is of marginal use but does come in handy sometimes, e.g., when looking at graph connectivity.
- 'matrix @@ -1' provides a very transparent notation for translating textbook formulas (with all their inverses) into code. It's a bit unhelpful in practice, because (a) usually you should use solve(), and (b) 'matrix @@ -1' is actually more characters than 'inv(matrix)'. But sometimes transparent notation may be important. (And in some cases, like using numba or theano or whatever, 'matrix @@ -1 @ foo' could be compiled into a call to solve() anyway.)
(Did I miss any?) In practice it seems to me that the last use case is the one that might matter a lot in practice, but then again, it might not -- I'm not sure. 
For example, does anyone who teaches programming with numpy have a feeling about whether the existence of '@@ -1' would make a big difference to you and your students? (Alan? I know you were worried about losing the .I attribute on matrices if switching to ndarrays for teaching -- given that ndarray will probably not get a .I attribute, how much would the existence of @@ -1 affect you?) On a more technical level, Guido is worried about how @@'s precedence should work (and this is somewhat related to the other thread about @'s precedence and associativity, because he feels that if we end up giving @ and * different precedence, then that makes it much less clear what to do with @@, and reduces the strength of the */**/@/@@ analogy). In particular, if we want to argue for @@ then we'll need to figure out what expressions like a @@ b @@ c and a ** b @@ c and a @@ b ** c should do. A related question is what @@ should do if given an array as its right argument. In the current PEP, only integers are accepted, which rules out a bunch of the more complicated cases like a @@ b @@ c (at least assuming @@ is right-associative, like **, and I can't see why you'd want anything else). OTOH, in the brave new gufunc world, it technically would make sense to define @@ as being a gufunc with signature (m,m),()->(m,m), and the way gufuncs work this *would* allow the power to be an array -- for example, we'd have:
mat = randn(m, m)
pow = range(n)
result = gufunc_matrix_power(mat, pow)
assert result.shape == (n, m, m)
for i in xrange(n):
    assert np.all(result[i, :, :] == mat ** i)
In this case, a @@ b @@ c would at least be a meaningful expression to write. OTOH it would be incredibly bizarre and useless, so probably no-one would ever write it. As far as these technical issues go, my guess is that the correct rule is that @@ should just have the same precedence and the same (right) associativity as **, and in practice no-one will ever write stuff like a @@ b @@ c. 
But if we want to argue for @@ we need to come to some consensus or another here. It's also possible the answer is ugh, these issues are too complicated, we should defer this until later when we have more experience with @ and gufuncs and stuff. After all, I doubt anyone else will swoop in and steal @@ to mean something else! OTOH, if e.g. there's a strong feeling that '@@ -1' will make a big difference in pedagogical contexts, then putting that off for years might be a mistake. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
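The hypothetical gufunc_matrix_power above can be sketched today with a plain loop over np.linalg.matrix_power; the gufunc signature (m,m),()->(m,m) is the aspiration, not a current numpy API, and this loop-based stand-in is my own illustration, not the PEP's:

```python
import numpy as np
from numpy.linalg import matrix_power

def gufunc_matrix_power(mat, powers):
    """Loop-based stand-in for a broadcasting (m,m),()->(m,m) matrix power."""
    powers = np.asarray(powers)
    stacked = np.stack([matrix_power(mat, int(p)) for p in powers.ravel()])
    return stacked.reshape(powers.shape + mat.shape)

m, n = 4, 5
rng = np.random.RandomState(42)
mat = rng.randn(m, m)
result = gufunc_matrix_power(mat, range(n))
assert result.shape == (n, m, m)
for i in range(n):
    assert np.allclose(result[i], matrix_power(mat, i))  # result[0] is the identity
```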
[Numpy-discussion] Matrix multiplication infix operator PEP nearly ready to go
Hi all, The proposal to add an infix operator to Python for matrix multiplication is nearly ready for its debut on python-ideas; so if you want to look it over first, or just want to check out where it's gone, then now's a good time: https://github.com/numpy/numpy/pull/4351 The basic idea here is to try to make the strongest argument we can for the simplest extension that we actually want, and then whether it gets accepted or rejected at least we'll know that's final. Absolutely all comments and feedback welcome. Cheers, -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Matrix multiplication infix operator PEP nearly ready to go
On Thu, Mar 13, 2014 at 1:03 AM, Alan G Isaac alan.is...@gmail.com wrote: On 3/12/2014 6:04 PM, Nathaniel Smith wrote: https://github.com/numpy/numpy/pull/4351 The Semantics section still begins with 0d, then 2d, then 1d, then nd. Given the context of the proposal, the order should be:
2d (the core need expressed in the proposal)
nd (which generalizes via broadcasting the 2d behavior)
1d (special casing)
0d (error)
I've just switched it to 2d -> 1d -> 3d+ -> 0d. You're right that 2d should go first, but IMO 1d should go after it because 2d and 1d are the two cases that really get used heavily in practice. In this context I see one serious problem: is there a NumPy function that produces the proposed nd behavior? If not why not, and can it really be sold as a core need if the need to implement it has never been pressed to the point of an implementation? The logic isn't we have a core need to implement these exact semantics. It's: we have a core need for this operator; given that we are adding an operator we have to figure out exactly what the semantics should be; we did that and documented it and got consensus from a bunch of projects on it. I don't think the actual details of the semantics matter nearly as much as the fact that they exist. Unless this behavior is first implemented, the obvious question remains: why will `@` not just implement `dot`, for which there is a well tested and much used implementation? Because of the reason above, I'm not sure it will come up (I don't think python-dev is nearly as familiar with the corner cases of numpy.dot as we are :-)). But if it does the answer is easy: no-one ever thought through exactly how `dot` should work in these rare edge cases, now we did. But we can't just change `dot` quickly, because of backwards compatibility considerations. `@` is new, there's no compatibility problem, so we might as well get it right from the start. 
If the behavioural differences between `dot` and `@` were more controversial then I'd worry more. But the consequences of the 0d thing are trivial to understand, and in the 3d+ case we're already shipping dozens of functions that have exactly these broadcasting semantics. -n
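The semantics being debated here (2d core, 1d special-casing, nd broadcasting, 0d error) are what np.matmul -- and hence @ -- eventually implemented; a hedged demonstration on a modern numpy:

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 4))
v = np.ones(3)

# 2d @ 2d: ordinary matrix multiplication
assert (A @ B).shape == (2, 4)

# 1d special-casing: the vector is promoted to a matrix, then the
# temporary axis is dropped from the result
assert (A @ v).shape == (2,)

# nd: gufunc-style broadcasting over the leading "stack" dimensions
stack = np.ones((7, 2, 3))
assert (stack @ B).shape == (7, 2, 4)

# 0d: scalars are an error -- unlike np.dot(2, 3), which returns 6
try:
    np.matmul(2, 3)
except (TypeError, ValueError):
    pass  # this is the "0d (error)" branch of the spec
else:
    raise AssertionError("expected matmul to reject scalars")
```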
Re: [Numpy-discussion] c api deprecations with NPY_NO_DEPRECATED_API
On 11 Mar 2014 13:28, Paul Brossier p...@piem.org wrote: If I understand correctly, the current version is the one installed on the user system. So using NPY_API_VERSION would mean this code should work with any version of numpy. I guess this is what I want (I would even expect this to be the default setting). Did I miss something? Using NPY_API_VERSION here means this code will work with any version of numpy, *including ones that aren't released yet and might have arbitrary API changes*. This is almost certainly not what you want. The idea of the deprecation support is that it gives you a grace period to adapt to upcoming changes before they break your code. Suppose PyArray_foo is going to be removed in numpy 1.10. If we just removed it, your first warning would be when we release 1.10 and suddenly you have angry users who find your software no longer works. So the trick is that before we remove it entirely, we release 1.9, in which PyArray_foo is available if your NPY_DEPRECATED_API version is set to 1.8 or earlier, but not if it's set to 1.9. Your released versions thus continue to work, your users are happy, and the first person to encounter the problem is you, when you try to update your NPY_DEPRECATED_API to 1.9. You fix the problem, you make a new release, and then when 1.10 comes along everything works. Moral: set NPY_DEPRECATED_API to match the highest numpy version you've tested. -n
Re: [Numpy-discussion] c api deprecations with NPY_NO_DEPRECATED_API
On 11 Mar 2014 14:25, Paul Brossier p...@piem.org wrote: On 11/03/2014 10:49, Nathaniel Smith wrote: On 11 Mar 2014 13:28, Paul Brossier p...@piem.org wrote: If I understand correctly, the current version is the one installed on the user system. So using NPY_API_VERSION would mean this code should work with any version of numpy. I guess this is what I want (I would even expect this to be the default setting). Did I miss something? Using NPY_API_VERSION here means this code will work with any version of numpy, *including ones that aren't released yet and might have arbitrary API changes*. This is almost certainly not what you want. Thanks for the clarification. The idea of the deprecation support is that it gives you a grace period to adapt to upcoming changes before they break your code. Suppose PyArray_foo is going to be removed in numpy 1.10. If we just removed it, your first warning would be when we release 1.10 and suddenly you have angry users who find your software no longer works. So the trick is that before we remove it entirely, we release 1.9, in which PyArray_foo is available if your NPY_DEPRECATED_API version is set to 1.8 or earlier, but not if it's set to 1.9. Your released versions thus continue to work, your users are happy, and the first person to encounter the problem is you, when you try to update your NPY_DEPRECATED_API to 1.9. You fix the problem, you make a new release, and then when 1.10 comes along everything works. Moral: set NPY_DEPRECATED_API to match the highest numpy version you've tested. I guess you meant NPY_NO_DEPRECATED_API? Yes. I'm just too lazy to check these things on my phone :-). -n
Re: [Numpy-discussion] numpy gsoc ideas (was: numpy gsoc topic idea: configurable algorithm precision and vector math library integration)
On Thu, Mar 6, 2014 at 5:17 AM, Sturla Molden sturla.mol...@gmail.com wrote: Nathaniel Smith n...@pobox.com wrote: 3. Using Cython in the numpy core The numpy core contains tons of complicated C code implementing elaborate operations like indexing, casting, ufunc dispatch, etc. It would be really nice if we could use Cython to write some of these things. So the idea of having NumPy as a pure C library in the core is abandoned? This question doesn't make sense to me so I think I must be missing some context. Nothing is abandoned: This is one email by one person on one mailing list suggesting a project to explore the feasibility of something. And anyway, Cython is just a C code generator, similar in principle to (though vastly more sophisticated than) the ones we already use. It's not like we've ever promised our users we'll keep stable which kind of code generators we use internally. However, there is a practical problem: Cython assumes that each .pyx file generates a single compiled module with its own Cython-defined API. Numpy, however, contains a large number of .c files which are all compiled together into a single module, with its own home-brewed system for defining the public API. And we can't rewrite the whole thing. So for this to be viable, we would need some way to compile a bunch of .c *and .pyx* files together into a single module, and allow the .c and .pyx files to call each other. Cython takes care of that already. http://docs.cython.org/src/userguide/sharing_declarations.html#cimport http://docs.cython.org/src/userguide/external_C_code.html#using-cython-declarations-from-c Linking multiple .c and .pyx files together into a single .so/.dll is much more complicated than just using 'cimport'. Try it if you don't believe me :-). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] numpy gsoc ideas (was: numpy gsoc topic idea: configurable algorithm precision and vector math library integration)
On Thu, Mar 6, 2014 at 9:11 AM, David Cournapeau courn...@gmail.com wrote: On Wed, Mar 5, 2014 at 9:11 PM, Nathaniel Smith n...@pobox.com wrote: So this project would have the following goals, depending on how practical this turns out to be: (1) produce a hacky proof-of-concept system for doing the above, (2) turn the hacky proof-of-concept into something actually viable for use in real life (possibly this would require getting changes upstream into Cython, etc.), (3) use this system to actually port some interesting numpy code into cython. Having to synchronise two projects may be hard for a GSoC, no ? Yeah, if someone is interested in this it would be nice to get someone from Cython involved too. But that's why the primary goal is to produce a proof-of-concept -- even if all that comes out is that we learn that this cannot be done in an acceptable manner, then that's still a successful (albeit disappointing) result. Otherwise, I am a bit worried about cython being used on the current C code as is, because core and python C API are so intertwined (especially multiarray). I don't understand this objection. The whole advantage of Cython is that it makes it much, much easier to write code that involves intertwining complex algorithms and heavy use of the Python C API :-). There's tons of bug-prone spaghetti in numpy for doing boring things like refcounting, exception passing, and argument parsing. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Adding weights to cov and corrcoef
On Wed, Mar 5, 2014 at 4:45 PM, Sebastian Berg sebast...@sipsolutions.net wrote: Hi all, in Pull Request https://github.com/numpy/numpy/pull/3864 Neol Dawe suggested adding new parameters to our `cov` and `corrcoef` functions to implement weights, which already exist for `average` (the PR still needs to be adapted). The idea right now would be to add `weights` and `frequencies` keyword arguments to these functions. In more detail: The situation is a bit more complex for `cov` and `corrcoef` than `average`, because there are different types of weights. The current plan would be to add two new keyword arguments:
* weights: Uncertainty weights which cause `N` to be recalculated accordingly (this is R's `cov.wt` default, I believe).
* frequencies: When given, `N = sum(frequencies)` and the values are weighted by their frequency.
I don't understand this description at all. One of them recalculates N, and the other sets N according to some calculation? Is there a standard reference on how these are supposed to be interpreted? When you talk about per-value uncertainties, I start imagining that we're trying to estimate a population covariance given a set of samples each corrupted by independent measurement noise, and then there's some natural hierarchical Bayesian model one could write down and get an ML estimate of the latent covariance via empirical Bayes or something. But this requires a bunch of assumptions and is that really what we want to do? (Or maybe it collapses down into something simpler if the measurement noise is gaussian or something?) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
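For reference, numpy later grew exactly this distinction: np.cov accepts fweights (non-negative integer repeat counts, the "frequencies" above) and aweights (relative importance weights). A sketch showing the cleanest way to interpret frequency weights, namely that they are equivalent to physically repeating observations:

```python
import numpy as np

# Two variables (rows), three observations (columns):
x = np.array([[1., 2., 3.],
              [2., 1., 3.]])
f = np.array([1, 2, 1])  # observation 1 counted twice

# The same data with the middle observation literally duplicated:
x_rep = np.array([[1., 2., 2., 3.],
                  [2., 1., 1., 3.]])

# Frequency weights behave like repeated columns:
assert np.allclose(np.cov(x, fweights=f), np.cov(x_rep))
```

This matches the proposal's `N = sum(frequencies)`: the effective sample size is the total count, not the number of columns.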
[Numpy-discussion] numpy gsoc ideas (was: numpy gsoc topic idea: configurable algorithm precision and vector math library integration)
On Mon, Mar 3, 2014 at 7:20 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote: hi, as the numpy gsoc topic page is a little short on options I was thinking about adding two topics for interested students. But as I have no experience with gsoc or mentoring and the ideas are not very fleshed out yet I'd like to ask if it might make sense at all: 1. configurable algorithm precision [...] with np.precmode(default=fast): np.abs(complex_array) or fast everything except sum and hypot with np.precmode(default=fast, sum=kahan, hypot=standard): np.sum(d) [...] Not a big fan of this one -- it seems like the biggest bulk of the effort would be in figuring out a non-horrible API for exposing these things and getting consensus around it, which is not a good fit to the SoC structure. I'm pretty nervous about the datetime proposal that's currently on the wiki, for similar reasons -- I'm not sure it's actually doable in the SoC context. 2. vector math library integration This is a great suggestion -- clear scope, clear benefit. Two more ideas: 3. Using Cython in the numpy core The numpy core contains tons of complicated C code implementing elaborate operations like indexing, casting, ufunc dispatch, etc. It would be really nice if we could use Cython to write some of these things. However, there is a practical problem: Cython assumes that each .pyx file generates a single compiled module with its own Cython-defined API. Numpy, however, contains a large number of .c files which are all compiled together into a single module, with its own home-brewed system for defining the public API. And we can't rewrite the whole thing. So for this to be viable, we would need some way to compile a bunch of .c *and .pyx* files together into a single module, and allow the .c and .pyx files to call each other. This might involve changes to Cython, some sort of clever post-processing or glue code to get existing cython-generated source code to play nicely with the rest of numpy, or something else. 
So this project would have the following goals, depending on how practical this turns out to be: (1) produce a hacky proof-of-concept system for doing the above, (2) turn the hacky proof-of-concept into something actually viable for use in real life (possibly this would require getting changes upstream into Cython, etc.), (3) use this system to actually port some interesting numpy code into cython. 4. Pythonic dtypes The current dtype system is klugey. It basically defines its own class system, in parallel to Python's, and unsurprisingly, this new class system is not as good. In particular, it has limitations around the storage of instance-specific data which rule out a large variety of interesting user-defined dtypes, and causes us to need some truly nasty hacks to support the built-in dtypes we do have. And it makes defining a new dtype much more complicated than defining a new Python class. This project would be to implement a new dtype system for numpy, in which np.dtype becomes a near-empty base class, different dtypes (e.g., float64, float32) are simply different subclasses of np.dtype, and dtype objects are simply instances of these classes. Further enhancements would be to make it possible to define new dtypes in pure Python by subclassing np.dtype and implementing special methods for the various dtype operations, and to make it possible for ufunc loops to see the dtype objects. This project would provide the key enabling piece for a wide variety of interesting new features: missing value support, better handling of strings and categorical data, unit handling, automatic differentiation, and probably a bunch more I'm forgetting right now. If we get someone who's up to handling the dtype thing then I can mentor or co-mentor. What do y'all think? (I don't think I have access to update that wiki page -- or maybe I'm just not clever enough to figure out how -- so it would be helpful if someone who can, could?) -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
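The `sum=kahan` precision mode floated in idea 1 above refers to compensated (Kahan) summation. The `np.precmode` API is only a proposal and does not exist; the underlying algorithm, though, is standard and easy to sketch in pure Python:

```python
def kahan_sum(values):
    # Compensated (Kahan) summation: carry the low-order rounding error
    # 'c' forward so it is not lost at each accumulation step.
    s = 0.0
    c = 0.0
    for x in values:
        y = x - c          # apply the running compensation
        t = s + y          # big + small: low-order digits of y may be lost
        c = (t - s) - y    # algebraically zero; recovers what was lost
        s = t
    return s
```

For example, summing [0.1] * 10 naively accumulates rounding error, while the compensated sum is at least as close to 1.0.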
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Sat, Feb 22, 2014 at 7:09 PM, Pauli Virtanen p...@iki.fi wrote: 23.02.2014 00:03, Nathaniel Smith wrote: Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO. How exactly Numpy makes use of the capability for 2-dim arrays is something that should definitely be discussed. But I think this is a problem mainly interesting for Numpy devs, and not for CPython devs. I actually disagree strongly. I think it's very important to make clear that we have a clear, well thought through, and cross-project approach to what @ is supposed to mean, so that this doesn't come across as numpy asking python-dev for a blank check to go and define the de facto semantics of a new operator just for ourselves and without any adult supervision. I just don't think they trust us that much. (Honestly I probably wouldn't either in their place). It's true that the higher-dim cases aren't the most central ones, but it can't hurt to document all the details. I've tentatively rewritten the first section of the PEP to try and accomplish this framing: https://github.com/njsmith/numpy/blob/matmul-pep/doc/neps/return-of-revenge-of-matmul-pep.rst Comments welcome etc. Also BTW in the process I discovered another reason why broadcasting is better than the outer product semantics -- with broadcasting, writing down matrix power for 2d is trivial and natural, but with the outer product semantics, it's practically impossible! -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] OpenBLAS on Mac
On Sat, Feb 22, 2014 at 3:55 PM, Sturla Molden sturla.mol...@gmail.com wrote: On 20/02/14 17:57, Jurgen Van Gael wrote: Hi All, I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy. I've downloaded the OpenBLAS source and compiled it (thanks to Olivier Grisel). How? $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1 You'll definitely want to disable the affinity support too, and probably memory warmup. And possibly increase the maximum thread count, unless you'll only use the library on the computer it was built on. And maybe other things. The OpenBLAS build process has so many ways to accidentally impale yourself, it's an object lesson in why building regulations are a good thing. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] A PEP for adding infix matrix multiply to Python
[Apologies for wide distribution -- please direct followups to either the github PR linked below, or else numpy-discussion@scipy.org] After the numpy-discussion thread about np.matrix a week or so back, I got curious and read the old PEPs that attempted to add better matrix/elementwise operators to Python. http://legacy.python.org/dev/peps/pep-0211/ http://legacy.python.org/dev/peps/pep-0225/ And I was a bit surprised -- if I were BDFL I probably would have rejected these PEPs too. One is actually a proposal to make itertools.product into an infix operator, which no-one would consider seriously on its own merits. And the other adds a whole pile of weirdly spelled new operators with no clear idea about what they should do. But it seems to me that at this point, with the benefit of multiple years more experience, we know much better what we want -- basically, just a nice clean infix op for matrix multiplication. And that just asking for this directly, and explaining clearly why we want it, is something that hasn't been tried. So maybe we should try and see what happens. As a starting point for discussion, I wrote a draft. It can be read and commented on here: https://github.com/numpy/numpy/pull/4351 It's important that if we're going to do this at all, we do it right, and that means being able to honestly say that this document represents our consensus when going to python-dev. So if you think you might object please do so now :-) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] How exactly ought 'dot' to work?
Hi all, Currently numpy's 'dot' acts a bit weird for ndim > 2 or ndim < 1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. So here's one proposal. # CURRENT: dot(0d, any) -> scalar multiplication dot(any, 0d) -> scalar multiplication dot(1d, 1d) -> inner product dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then discard added axis dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added axis dot(2d, 2d) -> matrix multiply dot(2-or-more d, 2-or-more d) -> a complicated outer product thing: Specifically, if the inputs have shapes (r, n, m), (s, m, k), then numpy returns an array with shape (r, s, n, k), created like: for i in range(r): for j in range(s): output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) # PROPOSED: General rule: given dot on shape1, shape2, we try to match these shapes against two templates like (..., n?, m) and (..., m, k?) where ... indicates zero or more dimensions, and ? indicates an optional axis. ? axes are always matched before ... axes, so for an input with ndim=2, the ? axis is always matched. An unmatched ? axis is treated as having size 1. Next, the ... axes are broadcast against each other in the usual way (prepending 1s to make lengths the same, requiring corresponding entries to either match or have the value 1). And then the actual computations are performed using the usual broadcasting rules. Finally, we return an output with shape (..., n?, k?). Here ... indicates the result of broadcasting the input ...'s against each other. And, n? and k? mean: either the value taken from the input shape, if the corresponding entry was matched -- but if no match was made, then we leave this entry out. The idea is that just as a column vector on the right is m x 1, a 1d vector on the right is treated as m x nothing. 
For purposes of actually computing the product, nothing acts like 1, as mentioned above. But it makes a difference in what we return: in each of these cases we copy the input shape into the output, so we can get an output with shape (n, nothing), or (nothing, k), or (nothing, nothing), which work out to be (n,), (k,) and (), respectively. This gives a (somewhat) intuitive principle for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they are, and a general template for extending this behaviour to other operations like gufunc 'solve'. Anyway, the end result of this is that the PROPOSED behaviour differs from the current behaviour in the following ways: - passing 0d arrays to 'dot' becomes an error. (This in particular is an important thing to know, because if core Python adds an operator for 'dot', then we must decide what it should do for Python scalars, which are logically 0d.) - ndim > 2 arrays are now handled by aligning and broadcasting the extra axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) returns (r, m, k), not (r, r, m, k). Comments? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
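The PROPOSED template-matching rule above can be sketched as a pure-Python function that computes only the output shape (the function name is an illustration, not an actual numpy API):

```python
from itertools import zip_longest

def dot_output_shape(shape1, shape2):
    # Sketch of the PROPOSED rule: match shape1 against (..., n?, m) and
    # shape2 against (..., m, k?), broadcast the '...' parts, and return
    # the output shape (..., n?, k?). 0d inputs are an error.
    if not shape1 or not shape2:
        raise ValueError("dot is undefined for 0d operands under the proposal")
    if len(shape1) >= 2:
        lead1, n, m1, has_n = list(shape1[:-2]), shape1[-2], shape1[-1], True
    else:
        lead1, n, m1, has_n = [], None, shape1[0], False
    if len(shape2) >= 2:
        lead2, m2, k, has_k = list(shape2[:-2]), shape2[-2], shape2[-1], True
    else:
        lead2, m2, k, has_k = [], shape2[0], None, False
    if m1 != m2:
        raise ValueError("shapes not aligned: %r vs %r" % (m1, m2))
    # Broadcast the leading '...' axes right-to-left, NumPy style.
    out = []
    for a, b in zip_longest(reversed(lead1), reversed(lead2), fillvalue=1):
        if a != b and 1 not in (a, b):
            raise ValueError("broadcast mismatch")
        out.append(max(a, b))
    out.reverse()
    if has_n:
        out.append(n)
    if has_k:
        out.append(k)
    return tuple(out)
```

For instance, `dot_output_shape((3,), (3,))` gives `()` (the inner product), and `dot_output_shape((5, 2, 3), (3, 4))` gives `(5, 2, 4)` (the 2d operand broadcast against the stack).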
Re: [Numpy-discussion] How exactly ought 'dot' to work?
On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett matthew.br...@gmail.com wrote: The discussion might become confusing in the conflation of: * backward incompatible changes to dot * coherent behavior to propose in a PEP Right, I definitely am asking about how we think the ideal dot operator should work. The migration strategy to get to there from here is a separate question, that will raise a bunch of details that are a distraction from the fundamental question of what we *want*. If A @ B means 'matrix multiply A, B' - then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar, MATLAB / Octave too. Interesting! We do disagree on this then. I feel strongly that given separate matrix product and elementwise product operators @ and *, then 'scalar @ matrix' should be an error, and if you want scalar (elementwise) product then you should write 'scalar * matrix'. Sympy, MATLAB, Octave are not really good guides, because either they have only a single operator available (Sympy with *, np.matrix with *), or they have an alternative operator available but it's annoying to type and rarely used (MATLAB/Octave with .*). For us, the two operators are both first-class, and we've all decided that the scalar/elementwise operator is actually the more important and commonly used one, and matrix multiply is the unusual case (regardless of whether we end up spelling it 'dot' or '@'). So why would we want a special case for scalars in dot/@? And anyway, TOOWTDI. Notation like 'L * X @ Y' really makes it immediately clear what sort of operations we're dealing with, too. I have used 2D dot calls in the past, maybe still do, I'm not sure. Have you used dot(>2D, >2D)? That's the case that this proposal would change -- dot(<=2D, <=2D) is the same under both the outer product and broadcasting proposals. 
I feel pretty strongly that the broadcasting proposal is the right thing, for consistency with other operators -- e.g., all ufuncs and all functions in np.linalg already do broadcasting, so 'dot' is currently really the odd one out. The outer product functionality is potentially useful, but it should be called np.dot.outer, because logically it has the same relationship to np.dot that np.add.outer has to np.add, etc. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
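To make the `np.dot.outer` analogy concrete: `np.add.outer` exists, but `np.dot.outer` does not — it is only proposed above as the home for dot's current >2d behaviour. A pure-Python sketch of that outer-product pairing over two stacks of matrices:

```python
def matmul2d(a, b):
    # Plain 2d matrix product on nested lists.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dot_outer(stack_a, stack_b):
    # Sketch of dot's CURRENT (r, ...), (s, ...) semantics: every matrix
    # in stack_a is multiplied with every matrix in stack_b, producing an
    # (r, s) grid of products -- analogous to np.add.outer, and distinct
    # from the broadcasting behaviour proposed for dot itself.
    return [[matmul2d(a, b) for b in stack_b] for a in stack_a]
```

With an r=1 stack of the identity and an s=2 stack, the result is a 1-by-2 grid containing copies of the second stack's matrices.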
Re: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function
If you send a patch that deprecates dot's current behaviour for ndim > 2, we'll probably merge it. (We'd like it to function like you suggest, for consistency with other gufuncs. But to get there we have to deprecate the current behaviour first.) While I'm wishing for things I'll also mention that it would be really neat if binary gufuncs would have a .outer method like regular ufuncs do, so anyone currently using ndim > 2 dot could just switch to that. But that's a lot more work than just deprecating something :-). -n On 20 Feb 2014 09:27, Eric Moore e...@redtetrahedron.org wrote: On Thursday, February 20, 2014, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: If the standard semantics are not affected, and the most common two-argument scenario does not take more than a single if-statement overhead, I don't see why it couldn't be a replacement for the existing np.dot; but others mileage may vary. On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte stefan.o...@gmail.comwrote: Hey, so I propose the following. I'll implement a new function `mdot`. Incorporating the changes in `dot` are unlikely. Later, one can still include the features in `dot` if desired. `mdot` will have a default parameter `optimize`. If `optimize==True` the reordering of the multiplication is done. Otherwise it simply chains the multiplications. I'll test and benchmark my implementation and create a pull request. Cheers, Stefan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Another consideration here is that we need a better way to work with stacked matrices such as np.linalg handles now. Ie I want to compute the matrix product of two (k, n, n) arrays producing a (k,n,n) result. Near as I can tell there isn't a way to do this right now that doesn't involve an explicit loop. Since dot will return a (k, n, k, n) result. Yes this output contains what I want but it also computes a lot of things that I don't want too. 
It would also be nice to be able to do a matrix product reduction, (k, n, n) -> (n, n) in a single line too. Eric ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
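The explicit loop Eric describes — pairing up the k matrices of each stack and multiplying them, rather than computing dot's (k, n, k, n) outer product — can be sketched in pure Python on nested lists:

```python
def stacked_matmul(stack_a, stack_b):
    # Pairwise matrix product over two stacks of k matrices, giving a
    # (k, n, p)-shaped result -- the broadcasting behaviour wished for
    # above, instead of dot's current (k, n, k, p) outer product.
    out = []
    for a, b in zip(stack_a, stack_b):
        rows, inner, cols = len(a), len(b), len(b[0])
        assert len(a[0]) == inner, "shapes not aligned"
        out.append([[sum(a[i][t] * b[t][j] for t in range(inner))
                     for j in range(cols)] for i in range(rows)])
    return out
```

Multiplying a stack of [identity, 2*identity] against a stack of two matrices just scales the second stack entry-wise per pair.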
Re: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function
On Thu, Feb 20, 2014 at 1:35 PM, Stefan Otte stefan.o...@gmail.com wrote: Hey guys, I quickly hacked together a prototype of the optimization step: https://github.com/sotte/numpy_mdot I think there is still room for improvements so feedback is welcome :) I'll probably have some time to code on the weekend. @Nathaniel, I'm still not sure about integrating it in dot. Don't a lot of people use the optional out parameter of dot? The email you're replying to below about deprecating stuff in 'dot' was in reply to Eric's email about using dot on arrays with shape (k, n, n), so those comments are unrelated to the mdot stuff. I wouldn't mind seeing out= arguments become kw-only in general, but even if we decided to do that it would take a long deprecation period, so yeah, let's give up on 'dot(A, B, C, D)' as syntax for mdot. However, the suggestion of supporting np.dot([A, B, C, D]) still seems like it might be a good idea...? I have mixed feelings about it -- one less item cluttering up the namespace, but it is weird and magical to have two totally different calling conventions for the same function. -n On Thu, Feb 20, 2014 at 4:02 PM, Nathaniel Smith n...@pobox.com wrote: If you send a patch that deprecates dot's current behaviour for ndim > 2, we'll probably merge it. (We'd like it to function like you suggest, for consistency with other gufuncs. But to get there we have to deprecate the current behaviour first.) While I'm wishing for things I'll also mention that it would be really neat if binary gufuncs would have a .outer method like regular ufuncs do, so anyone currently using ndim > 2 dot could just switch to that. But that's a lot more work than just deprecating something :-). 
-n On 20 Feb 2014 09:27, Eric Moore e...@redtetrahedron.org wrote: On Thursday, February 20, 2014, Eelco Hoogendoorn hoogendoorn.ee...@gmail.com wrote: If the standard semantics are not affected, and the most common two-argument scenario does not take more than a single if-statement overhead, I don't see why it couldn't be a replacement for the existing np.dot; but others mileage may vary. On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte stefan.o...@gmail.com wrote: Hey, so I propose the following. I'll implement a new function `mdot`. Incorporating the changes in `dot` are unlikely. Later, one can still include the features in `dot` if desired. `mdot` will have a default parameter `optimize`. If `optimize==True` the reordering of the multiplication is done. Otherwise it simply chains the multiplications. I'll test and benchmark my implementation and create a pull request. Cheers, Stefan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Another consideration here is that we need a better way to work with stacked matrices such as np.linalg handles now. Ie I want to compute the matrix product of two (k, n, n) arrays producing a (k,n,n) result. Near as I can tell there isn't a way to do this right now that doesn't involve an explicit loop. Since dot will return a (k, n, k, n) result. Yes this output contains what I want but it also computes a lot of things that I don't want too. It would also be nice to be able to do a matrix product reduction, (k, n, n) -> (n, n) in a single line too. Eric ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
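The reordering that `mdot`'s proposed `optimize=True` would perform is the classic matrix-chain-ordering problem. A pure-Python sketch of the cost computation (the dynamic program only; an actual mdot would also record the split points and then multiply in that order):

```python
def chain_cost(dims):
    # Minimal number of scalar multiplications needed to evaluate a chain
    # of matrices with shapes dims[0] x dims[1], dims[1] x dims[2], ...
    # Classic O(n^3) matrix-chain-ordering dynamic program.
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j))
    return cost[0][n - 1]
```

For shapes 10x100, 100x5, 5x50, the optimal order `(A B) C` costs 7500 multiplications versus 75000 for `A (B C)` — exactly the kind of win that motivates doing the reordering before chaining the multiplications.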
[Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
Hey all, Just a heads up: thanks to the tireless work of Olivier Grisel, the OpenBLAS development branch is now fork-safe when built with its default threading support. (It is still not thread-safe when built using OMP for threading and gcc, but this is not the default.) Gory details: https://github.com/xianyi/OpenBLAS/issues/294 Check it out - if it works you might want to consider lobbying your favorite distro to backport it. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal to make power return float, and other such things.
Perhaps integer power should raise an error on negative powers? That way people will at least be directed to use arr ** -1.0 instead of silently getting nonsense from arr ** -1. On 18 Feb 2014 06:57, Robert Kern robert.k...@gmail.com wrote: On Tue, Feb 18, 2014 at 11:44 AM, Sturla Molden sturla.mol...@gmail.com wrote: Robert Kern robert.k...@gmail.com wrote: We're talking about numpy.power(), not just ndarray.__pow__(). The equivalence of the two is indeed an implementation detail, but I do think that it is useful to maintain the equivalence. If we didn't, it would be the only exception, to my knowledge. But in this case it makes sense. Every proposed special case we come up with makes sense in some way. That doesn't mean that they are special enough to break the rules. In my opinion, this is not special enough to break the rules. In your opinion, it is. math.pow(2,2) and 2**2 do not do the same thing. That is how Python behaves. Yes, because the functions in the math module are explicitly thin wrappers around floating-point C library functions and don't have any particular relationship to the special methods on int objects. numpy does have a largely 1:1 relationship between its ndarray operator special methods and the ufuncs that implement them. I believe this is a useful relationship for learning the API and predicting what a given expression is going to do. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Proposal to make power return float, and other such things.
On 18 Feb 2014 07:07, Robert Kern robert.k...@gmail.com wrote: On Tue, Feb 18, 2014 at 12:00 PM, Nathaniel Smith n...@pobox.com wrote: Perhaps integer power should raise an error on negative powers? That way people will at least be directed to use arr ** -1.0 instead of silently getting nonsense from arr ** -1. Controllable by np.seterr(invalid=...)? I could get behind that. I'm not sure the np.seterr part would work or be a good idea, given that we have no way to return or propagate NaN... I vote for just an unconditional error. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
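The unconditional-error rule argued for above can be sketched in pure Python (`int_power` is a hypothetical illustration of the proposed behaviour, not numpy's implementation):

```python
def int_power(base, exp):
    # Sketch of the proposed rule: an integer raised to a negative
    # integer power is refused outright, directing the user to a float
    # exponent, rather than silently truncating the reciprocal to
    # 0 or +/-1 in integer arithmetic.
    if exp < 0:
        raise ValueError(
            "integers to negative integer powers are not allowed; "
            "use a float exponent, e.g. base ** float(exp)")
    result = 1
    for _ in range(exp):
        result *= base
    return result
```

The point is simply that there is no way to represent 1/2 in an integer result, so refusing loudly beats returning nonsense.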
Re: [Numpy-discussion] allocated memory cache for numpy
On 18 Feb 2014 10:21, Julian Taylor jtaylor.deb...@googlemail.com wrote: On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith n...@pobox.com wrote: On 17 Feb 2014 15:17, Sturla Molden sturla.mol...@gmail.com wrote: Julian Taylor jtaylor.deb...@googlemail.com wrote: When an array is created it tries to get its memory from the cache and when its deallocated it returns it to the cache. ... Another optimization we should consider that might help a lot in the same situations where this would help: for code called from the cpython eval loop, it's afaict possible to determine which inputs are temporaries by checking their refcnt. In the second call to __add__ in '(a + b) + c', the temporary will have refcnt 1, while the other arrays will all have refcnt > 1. In such cases (subject to various sanity checks on shape, dtype, etc) we could elide temporaries by reusing the input array for the output. The risk is that there may be some code out there that calls these operations directly from C with non-temp arrays that nonetheless have refcnt 1, but we should at least investigate the feasibility. E.g. maybe we can do the optimization for tp_add but not PyArray_Add. this seems to be a really good idea, I experimented a bit and it solves the temporary problem for these types of arithmetic nicely. It's simple to implement, just change to inplace in array_{add,sub,mul,div} handlers for the python slots. Doing so does not fail numpy, scipy and pandas testsuite so it seems safe. Performance wise, besides the simple page zeroing limited benchmarks (a+b+c), it also brings the laplace out of place benchmark to the same speed as the inplace benchmark [0]. This is very nice as the inplace variant is significantly harder to read. Sweet. Does anyone see any issue we might be overlooking in this refcount == 1 optimization for the python api? I'll post a PR with the change shortly. 
It occurs belatedly that Cython code like
a = np.arange(10)
b = np.arange(10)
c = a + b
might end up calling tp_add with refcnt 1 arrays. Ditto for the same with cdef np.ndarray or cdef object added. We should check... -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] New (old) function proposal.
On 18 Feb 2014 11:05, Charles R Harris charlesr.har...@gmail.com wrote: Hi All, There is an old ticket, #1499, that suggest adding a segment_axis function. def segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0): Generate a new array that chops the given array along the given axis into overlapping frames. Parameters -- a : array-like The array to segment length : int The length of each frame overlap : int, optional The number of array elements by which the frames should overlap axis : int, optional The axis to operate on; if None, act on the flattened array end : {'cut', 'wrap', 'pad'}, optional What to do with the last frame, if the array is not evenly divisible into pieces. - 'cut' Simply discard the extra values - 'wrap' Copy values from the beginning of the array - 'pad' Pad with a constant value endvalue : object The value to use for end='pad' Examples segment_axis(arange(10), 4, 2) array([[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]) Is there any interest in having this function available? I'd use it, though haven't looked at the details of this api per se yet. rolling_window or shingle are better names. It should probably be documented and implemented to return a view when possible (using stride tricks). Along with a note that whether this is possible depends heavily on 32- vs. 64-bitness. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
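For reference, the 1-d `end='cut'` case of the proposed function is easy to sketch in pure Python (the stride-tricks view version discussed above would avoid the copies, but the frame contents are the same):

```python
def segment_axis_1d(seq, length, overlap=0):
    # Pure-Python sketch of the 1-d, end='cut' case: chop seq into
    # overlapping frames of the given length, discarding any leftover
    # values that do not fill a complete frame.
    seq = list(seq)
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    return [seq[start:start + length]
            for start in range(0, len(seq) - length + 1, step)]
```

This reproduces the docstring example: frames of length 4 with overlap 2 over range(10) give [0..3], [2..5], [4..7], [6..9].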
Re: [Numpy-discussion] New (old) function proposal.
On 18 Feb 2014 12:04, Charles R Harris charlesr.har...@gmail.com wrote: Where does 'shingle' come from. I can see the analogy but haven't seen that as a technical term. It just seems like a good name :-). -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Rethinking multiple dimensional indexing with sequences?
So to be clear - what's being suggested is that code like this will be deprecated in 1.9, and then in some future release break: slices = [] for i in ...: slices.append(make_slice(...)) subarray = arr[slices] Instead, you will have to do: subarray = arr[tuple(slices)] And the reason is that when we allow multi-dimensional indexes to be passed as lists instead of a tuple, numpy has no reliable way to tell what to do with something like arr[[0, 1]] Maybe it means arr[0, 1] Or maybe it means arr[np.asarray([0, 1])] Who knows? Right now we have some heuristics to guess based on what exact index objects are in there, but really making a guess at all is a pretty broken approach, and will be getting more broken as more non-ndarray array-like types come into common use -- in particular, the way things are right now, arr[pandas_series] will soon be (or is already) triggering this same guessing logic. So, any objections to requiring tuples here? -n On Tue, Feb 18, 2014 at 11:09 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Hey all, currently in numpy this is possible: a = np.zeros((5, 5)) a[[0, slice(None, None)]] #this behaviour has its quirks, since the correct way is: a[(0, slice(None, None))] # or identically a[0, :] The problem with using an arbitrary sequence is, that an arbitrary sequence is also typically an array like so there is a lot of guessing involved: a[[0, slice(None, None)]] == a[(0, slice(None, None))] # but: a[[0, 1]] == a[np.array([0, 1])] Now also NumPy commonly uses lists here to build up indexing tuples (since they are mutable), however would it really be so bad if we had to do `arr[tuple(slice_list)]` in the end to resolve this issue? So the proposal would be to deprecate anything but (base class) tuples, or maybe at least only allow this weird logic for lists and not all sequences. I do not believe we can find a logic to decide what to do which will not be broken in some way... 
PS: The code implementing the advanced index or nd-index logic is here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L196 - Sebastian Another confusing example: In [9]: a = np.arange(10) In [10]: a[[(0, 1), (2, 3)] * 17] # a[np.array([(0, 1), (2, 3)] * 17)] Out[10]: array([[0, 1], <snip> [2, 3]]) In [11]: a[[(0, 1), (2, 3)]] # a[np.array([0, 1]), np.array([2, 3])] --- IndexError Traceback (most recent call last) <ipython-input-11-57b93f64dfa6> in <module>() 1 a[[(0, 1), (2, 3)]] IndexError: too many indices for array ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
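The guessing heuristic described above — "is this list a multi-dimensional index or a fancy integer-array index?" — can be sketched in a few lines of pure Python (this is a deliberately simplified illustration, not numpy's actual logic in mapping.c):

```python
def guess_index_kind(idx):
    # Sketch of the disambiguation the deprecation would remove: a tuple
    # is always a multi-dimensional index; a list counts as one only if
    # it smuggles in slices or Ellipsis, and is otherwise treated as a
    # fancy (integer-array) index. (Simplified, not numpy's real code.)
    if isinstance(idx, tuple):
        return "multidim"
    if isinstance(idx, list) and any(
            isinstance(item, slice) or item is Ellipsis for item in idx):
        return "multidim (guessed, to be deprecated)"
    return "fancy"
```

The proposal is simply to drop the middle branch: sequences that are not tuples would always be interpreted as array-like, and code building up indices in a list would call `arr[tuple(slices)]` at the end.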
Re: [Numpy-discussion] allocated memory cache for numpy
On 17 Feb 2014 15:17, Sturla Molden sturla.mol...@gmail.com wrote: Julian Taylor jtaylor.deb...@googlemail.com wrote: When an array is created it tries to get its memory from the cache and when its deallocated it returns it to the cache. Good idea, however there is already a C function that does this. It uses a heap to keep the cached memory blocks sorted according to size. You know it as malloc — and it is why we call this allocation from the heap. Which by the way is what NumPy already does. ;-) Common malloc implementations are not well optimized for programs that have frequent, short-lived, large-sized allocations. Usually they optimize for short-lived allocations of small sizes. It's totally plausible that we could do a better job in the common case of array operations like 'a + b + c + d' that allocate and free a bunch of same-sized temporary arrays as they go. (Right now, if those arrays are large, that expression will always generate multiple mmap/munmap calls.) The question is to what extent numpy programs are bottlenecked by such allocations. Also, I'd be pretty wary of caching large chunks of unused memory. People already have a lot of trouble understanding their program's memory usage, and getting rid of 'greedy free' will make this even worse. Another optimization we should consider that might help a lot in the same situations where this would help: for code called from the cpython eval loop, it's afaict possible to determine which inputs are temporaries by checking their refcnt. In the second call to __add__ in '(a + b) + c', the temporary will have refcnt 1, while the other arrays will all have refcnt > 1. In such cases (subject to various sanity checks on shape, dtype, etc) we could elide temporaries by reusing the input array for the output. The risk is that there may be some code out there that calls these operations directly from C with non-temp arrays that nonetheless have refcnt 1, but we should at least investigate the feasibility. E.g. 
maybe we can do the optimization for tp_add but not PyArray_Add. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
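The refcount asymmetry described above can be glimpsed from Python, with the caveat that sys.getrefcount adds a temporary reference of its own; the real refcnt == 1 check would have to happen in C, e.g. inside tp_add. A minimal illustration (variable names are just for the example):

```python
import sys
import numpy as np

a = np.ones(3)
b = np.ones(3)

# Any array bound to a variable is reachable from at least that
# variable, so getrefcount (which adds one reference for its own
# argument) reports at least 2. The temporary produced by (a + b)
# inside a larger expression has no such binding; seen from C in
# the second __add__ call, its refcnt would be exactly 1.
assert sys.getrefcount(a) >= 2
```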
Re: [Numpy-discussion] allocated memory cache for numpy
On Mon, Feb 17, 2014 at 3:55 PM, Stefan Seefeld ste...@seefeld.name wrote: On 02/17/2014 03:42 PM, Nathaniel Smith wrote: Another optimization we should consider that might help a lot in the same situations where this would help: for code called from the cpython eval loop, it's afaict possible to determine which inputs are temporaries by checking their refcnt. In the second call to __add__ in '(a + b) + c', the temporary will have refcnt 1, while the other arrays will all have refcnt > 1. In such cases (subject to various sanity checks on shape, dtype, etc) we could elide temporaries by reusing the input array for the output. The risk is that there may be some code out there that calls these operations directly from C with non-temp arrays that nonetheless have refcnt 1, but we should at least investigate the feasibility. E.g. maybe we can do the optimization for tp_add but not PyArray_Add. For element-wise operations such as the above, wouldn't it be even better to use loop fusion, by evaluating the entire compound expression per element, instead of each individual operation? That would require methods such as __add__ to return an operation object, rather than the result value. I believe a technique like that is used in the numexpr package (https://github.com/pydata/numexpr), which I saw announced here recently... Hi Stefan (long time no see!), Sure, that would be an excellent thing, but adding a delayed evaluation engine to numpy is a big piece of new code, and we'd want to make it something you opt in to explicitly. (There are too many weird potential issues with e.g. errors showing up someplace far away from the actual broken code, due to evaluation being delayed until then.) By contrast, the optimization suggested here is a tiny change we could do now, and would still be useful even in the hypothetical future where we do have lazy evaluation, for anyone who doesn't use it. 
-n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
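Stefan's operation-object idea can be sketched in a few lines; this toy (class name and API invented purely for illustration) only defers evaluation rather than truly fusing per-element loops, which is exactly the extra machinery being discussed:

```python
import numpy as np

class Deferred:
    """Toy delayed-evaluation wrapper: arithmetic builds an expression
    tree instead of computing, and evaluate() runs it in one pass.
    A real fused engine (numexpr, numba, ...) would instead compile a
    single per-element loop over the whole tree."""

    def __init__(self, value=None, op=None, args=()):
        self.value, self.op, self.args = value, op, args

    def __add__(self, other):
        return Deferred(op=np.add, args=(self, other))

    def evaluate(self):
        if self.op is None:
            return self.value
        return self.op(*(arg.evaluate() for arg in self.args))

a = Deferred(np.array([1.0, 2.0]))
b = Deferred(np.array([3.0, 4.0]))
c = Deferred(np.array([5.0, 6.0]))

expr = a + b + c            # no arithmetic has happened yet
result = expr.evaluate()    # evaluated only on demand
assert result.tolist() == [9.0, 12.0]
```

The errors-show-up-far-away concern is visible here too: a shape mismatch in `b` would only raise inside `evaluate()`, not at the `a + b` line.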
Re: [Numpy-discussion] deprecate numpy.matrix
On Sun, Feb 9, 2014 at 4:59 PM, alex argri...@ncsu.edu wrote: Hello list, I wrote this mini-nep for numpy but I've been advised it is more appropriate for discussion on the list. The ``numpy.matrix`` API provides a low barrier to using Python for linear algebra, just as the pre-3 Python ``input`` function and ``print`` statement provided low barriers to using Python for automatically evaluating input and for printing output. On the other hand, it really needs to be deprecated. Let's deprecate ``numpy.matrix``. I understand that numpy.matrix will not be deprecated any time soon, but I hope this will register as a vote to help nudge its deprecation closer to the realm of acceptable discussion. To make this more productive, maybe it would be useful to elaborate on what exactly we should do here. I can't imagine we'll actually remove 'matrix' from the numpy namespace at any point in the near future. I do have the sense that when people choose to use it, they eventually come to regret this choice. It's a bit buggy and has confusing behaviours, and due to limitations of numpy's subclassing model, will probably always be buggy and have confusing behaviours. And it's marketed as being for new users, who are exactly the kind of users who aren't sophisticated enough to recognize these dangers. Maybe there should be a big warning to this effect in the np.matrix docstring? Maybe using np.matrix should raise a DeprecationWarning? (DeprecationWarning doesn't have to mean that something will be disappearing -- e.g. the Python stdlib deprecates stuff all the time, but never actually removes it. It's just a warning flag that there are better options available.) Or what? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On Mon, Feb 10, 2014 at 11:16 AM, Alexander Belopolsky ndar...@mac.com wrote: On Sun, Feb 9, 2014 at 4:59 PM, alex argri...@ncsu.edu wrote: On the other hand, it really needs to be deprecated. While numpy.matrix may have its problems, a NEP should list a better rationale than the above to gain acceptance. Personally, I decided not to use numpy.matrix in production code about 10 years ago and never looked back to that decision. I've heard however that some of the worst inheritance warts have been fixed over the years. I also resisted introducing inheritance in the implementation of masked arrays, but I lost that argument. For better or worse, inheritance from ndarray is here to stay and I would rather see numpy.matrix stay as a test-bed for fixing inheritance issues than see it deprecated and have the same issues pop up in ma or elsewhere. In practice, the existence of np.matrix doesn't seem to have any effect on whether inheritance issues get fixed. And in the long run, I think the goal is to move people away from inheriting from np.ndarray. Really the only good reason to inherit from np.ndarray right now is if there's something you want to do that is impossible without using inheritance. But we're working on fixing those issues (e.g., __numpy_ufunc__ in the next release). And AFAICT most of the remaining issues with inheritance simply cannot be fixed, because the requirements are ill-defined and contradictory. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On Mon, Feb 10, 2014 at 12:02 PM, Matthieu Brucher matthieu.bruc...@gmail.com wrote: Yes, but these will be scipy.sparse matrices, nothing to do with numpy (dense) matrices. Unfortunately when scipy.sparse matrices interact with dense ndarrays (e.g., sparse matrix * dense vector), then you always get back np.matrix objects instead of np.ndarray objects. So it's impossible to avoid np.matrix entirely if using scipy.sparse. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Memory leak?
On Fri, Jan 31, 2014 at 3:14 PM, Chris Laumann chris.laum...@gmail.com wrote: Current scipy superpack for osx so probably pretty close to master. What does numpy.__version__ say? -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Memory leak?
On Fri, Jan 31, 2014 at 4:29 PM, Benjamin Root ben.r...@ou.edu wrote: Just to chime in here about the SciPy Superpack... this distribution tracks the master branch of many projects, and then puts out releases, on the assumption that master contains pristine code, I guess. I have gone down strange rabbit holes thinking that a particular bug was fixed already and the user telling me a version number that would confirm that, only to discover that the superpack actually packaged matplotlib about a month prior to releasing a version. I will not comment on how good or bad of an idea it is for the Superpack to do that, but I just wanted to make other developers aware of this to keep them from falling down the same rabbit hole. Wow, that is good to know. Esp. since the web page: http://fonnesbeck.github.io/ScipySuperpack/ simply advertises that it gives you things like numpy 1.9 and scipy 0.14, which don't exist. (With some note about dev versions buried in prose a few sentences later.) Empirically, development versions of numpy have always contained bugs, regressions, and compatibility breaks that were fixed in the released version; and we make absolutely no guarantees about compatibility between dev versions and any release versions. And it sort of has to be that way for us to be able to make progress. But if too many people start using dev versions for daily use, then we and downstream dependencies will have to start adding compatibility hacks and stuff to support those dev versions. Which would be a nightmare for developers and users both. Recommending this build for daily use by non-developers strikes me as dangerous for both users and the wider ecosystem. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Memory leak in numpy?
On Wed, Jan 29, 2014 at 7:39 PM, Joseph McGlinchy jmcglin...@esri.com wrote: Upon further investigation, I do believe it is within the scipy code where there is a leak. I commented out my call to processBinaryImage(), which is all scipy code calls, and my memory usage remains flat with approximately a 1MB variation. Any ideas? I'd suggest continuing along this line, and keep chopping things out until you have a minimal program that still shows the problem -- that'll probably make it much clearer where the problem is actually coming from... -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Catching out-of-memory error before it happens
There is no reliable way to predict how much memory an arbitrary numpy operation will need, no. However, in most cases the main memory cost will be simply the need to store the input and output arrays; for large arrays, all other allocations should be negligible. The most effective way to avoid running out of memory, therefore, is to avoid creating temporary arrays, by using only in-place operations. E.g., if a and b each require N bytes of ram, then the memory requirements are (roughly): c = a + b: 3N; c = a + 2*b: 4N; a += b: 2N; np.add(a, b, out=a): 2N; b *= 2; a += b: 2N. Note that simply loading a and b requires 2N memory, so the latter code samples are near-optimal. Of course some calculations do require the use of temporary storage space... -n On 24 Jan 2014 15:19, Dinesh Vadhia dineshbvad...@hotmail.com wrote: I want to write a general exception handler to warn if too much data is being loaded for the ram size in a machine for a successful numpy array operation to take place. For example, the program multiplies two floating point arrays A and B which are populated with loadtext. While the data is being loaded, I want to continuously check that the data volume doesn't pass a threshold that will cause an out-of-memory error during the A*B operation. The known variables are the amount of memory available, the data type (floats in this case) and the numpy array operation to be performed. It seems this requires knowledge of the internal memory requirements of each numpy operation. For the sake of simplicity, we can ignore other memory needs of the program. Is this possible? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
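The bookkeeping above can be sketched directly (the array size here is just illustrative):

```python
import numpy as np

N = 1_000_000
a = np.ones(N)          # N elements, 8N bytes as float64
b = np.ones(N)

c = a + b               # out-of-place: allocates a third buffer (3N)
a += b                  # in-place: writes into a's existing buffer
np.add(a, b, out=a)     # same, as an explicit ufunc call -- no temporary

# Each array owns one N-element buffer; the in-place forms never
# created a fourth.
assert c.nbytes == a.nbytes == N * 8
```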
Re: [Numpy-discussion] Catching out-of-memory error before it happens
On 24 Jan 2014 15:57, Chris Barker - NOAA Federal chris.bar...@noaa.gov wrote: c = a + b: 3N c = a + 2*b: 4N Does python garbage collect mid-expression? I.e.: C = (a + 2*b) + b — 4 or 5 N? It should be collected as soon as the reference gets dropped, so 4N. (This is the advantage of a greedy refcounting collector.) Also note that when memory gets tight, fragmentation can be a problem. I.e. if two size-n arrays were just freed, you still may not be able to allocate a size-2n array. This seems to be worse on Windows, not sure why. If your arrays are big enough that you're worried that making a stray copy will ENOMEM, then you *shouldn't* have to worry about fragmentation - malloc will give each array its own virtual mapping, which can be backed by discontinuous physical memory. (I guess it's possible Windows has a somehow shoddy VM system and this isn't true, but that seems unlikely these days?) Memory fragmentation is more of a problem if you're allocating lots of small objects of varying sizes. On 32 bit, virtual address fragmentation could also be a problem, but if you're working with giant data sets then you need 64 bits anyway :-). -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Catching out-of-memory error before it happens
Yes. On 24 Jan 2014 17:19, Dinesh Vadhia dineshbvad...@hotmail.com wrote: So, with the example case, the approximate memory cost for an in-place operation would be: A *= B : 2N But, if the original A or B is to remain unchanged then it will be: C = A * B : 3N ? ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Catching out-of-memory error before it happens
On Fri, Jan 24, 2014 at 10:29 PM, Chris Barker chris.bar...@noaa.gov wrote: On Fri, Jan 24, 2014 at 8:25 AM, Nathaniel Smith n...@pobox.com wrote: If your arrays are big enough that you're worried that making a stray copy will ENOMEM, then you *shouldn't* have to worry about fragmentation - malloc will give each array its own virtual mapping, which can be backed by discontinuous physical memory. (I guess it's possible windows has a somehow shoddy VM system and this isn't true, but that seems unlikely these days?) All I know is that when I push the limits with memory on a 32 bit Windows system, it often crashed out even though I've never seen more than about 1GB of memory use by the application -- I would have thought that would be plenty of overhead. I also know that I've reached limits on Windows 32 well before OS-X 32, but that may be because IIUC, Windows 32 only allows 2GB per process, whereas OS-X 32 allows 4GB per process. Memory fragmentation is more of a problem if you're allocating lots of small objects of varying sizes. It could be that's what I've been doing. On 32 bit, virtual address fragmentation could also be a problem, but if you're working with giant data sets then you need 64 bits anyway :-). Well, giant is defined relative to the system capabilities... but yes, if you're pushing the limits of a 32 bit system, the easiest thing to do is go to 64 bits and some more memory! Oh, yeah, common confusion. Allowing 2 GiB of address space per process doesn't mean you can actually practically use 2 GiB of *memory* per process, esp. if you're allocating/deallocating a mix of large and small objects, because address space fragmentation will kill you way before that. The memory is there, but there isn't anywhere to slot it into the process's address space. So you don't need to add more memory, just switch to a 64-bit OS. 
On 64-bit you have oodles of address space, so the memory manager can easily slot in large objects far away from small objects, and it's only fragmentation within each small-object arena that hurts. A good malloc will keep this overhead down pretty low though -- certainly less than the factor of two you're thinking about. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Comparison changes
On 25 Jan 2014 00:05, Sebastian Berg sebast...@sipsolutions.net wrote: Hi all, in https://github.com/numpy/numpy/pull/3514 I proposed some changes to the comparison operators. This includes: 1. Comparison with None will broadcast in the future, so that `arr == None` will actually compare all elements to None. (A FutureWarning for now) 2. I added that == and != will give a FutureWarning when an error was raised. In the future they should not silence these errors anymore. (For example shape mismatches) This can just be a DeprecationWarning, because the only change is to raise new errors. 3. We used to use PyObject_RichCompareBool for equality, which includes an identity check. I propose to not do that identity check since we have elementwise equality (returning an object array for objects would be nice in some ways, but I think that is only an option for a dedicated function). The reason is that for example a = np.array([np.array([1, 2, 3]), 1]); b = np.array([np.array([1, 2, 3]), 1]); a == b will happen to work if it happens to be that `a[0] is b[0]`. This currently has no deprecation, since the logic is in the inner loop and I am not sure if it is easy to add a warning there. Surely any environment where we can call PyObject_RichCompareBool is an environment where we can issue a warning...? -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
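The behaviour change in point 1 eventually landed (at the time of this thread it only raised a FutureWarning); in any recent numpy the comparison broadcasts:

```python
import numpy as np

a = np.array([1, None, 3], dtype=object)

# `a == None` broadcasts: each element is compared to None,
# instead of the old whole-array identity-based answer.
result = a == None  # noqa: E711 -- elementwise comparison is the point
print(result.tolist())  # [False, True, False]
```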
[Numpy-discussion] IRR
Hey all, We have a PR languishing that fixes np.irr to handle negative rate-of-returns: https://github.com/numpy/numpy/pull/4210 I don't even know what IRR stands for, and it seems rather confusing from the discussion there. Anyone who knows something about the issues is invited to speak up... -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] A one-byte string dtype?
On 21 Jan 2014 11:13, Oscar Benjamin oscar.j.benja...@gmail.com wrote: If the Numpy array would manage the buffers itself then that per string memory overhead would be eliminated in exchange for an 8 byte pointer and at least 1 byte to represent the length of the string (assuming you can somehow use Pascal strings when short enough - null bytes cannot be used). This gives an overhead of 9 bytes per string (or 5 on 32 bit). In this case you save memory if the strings are more than 3 characters long and you get at least a 50% saving for strings longer than 9 characters. There are various optimisations possible as well. For ASCII strings of up to length 8, one could also use tagged pointers to eliminate the lookaside buffer entirely. (Alignment rules mean that pointers to allocated buffers always have the low bits zero; so you can make a rule that if the low bit is set to one, then this means the pointer itself should be interpreted as containing the string data; use the spare bit in the other bytes to encode the length.) In some cases it may also make sense to let identical strings share buffers, though this adds some overhead for reference counting and interning. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
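The tagged-pointer encoding can be simulated with plain integers standing in for 64-bit pointer words; the field layout below is just one possible choice (tag bit and length share the low byte, leaving 7 data bytes rather than 8):

```python
def pack_inline(s: bytes) -> int:
    # A real malloc'd pointer has its low bit clear, so setting it to
    # 1 marks the word as holding the string itself: tag + length in
    # the low byte, up to 7 bytes of ASCII data in the upper bytes.
    assert len(s) <= 7
    word = 1 | (len(s) << 1)
    word |= int.from_bytes(s, "little") << 8
    return word

def unpack_inline(word: int) -> bytes:
    assert word & 1                  # low bit set => inline string
    length = (word & 0xFF) >> 1
    return (word >> 8).to_bytes(7, "little")[:length]

assert unpack_inline(pack_inline(b"numpy")) == b"numpy"
assert pack_inline(b"numpy") & 1 == 1   # distinguishable from a pointer
```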
Re: [Numpy-discussion] A one-byte string dtype?
On 21 Jan 2014 17:28, David Goldsmith d.l.goldsm...@gmail.com wrote: Am I the only one who feels that this (very important--I'm being sincere, not sarcastic) thread has matured and specialized enough to warrant it's own home on the Wiki? Sounds plausible, perhaps you could write up such a page? -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] A one-byte string dtype?
On Mon, Jan 20, 2014 at 10:28 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Jan 20, 2014 at 2:27 PM, Oscar Benjamin oscar.j.benja...@gmail.com wrote: On Jan 20, 2014 8:35 PM, Charles R Harris charlesr.har...@gmail.com wrote: I think we may want something like PEP 393. The S datatype may be the wrong place to look, we might want a modification of U instead so as to transparently get the benefit of python strings. The approach taken in PEP 393 (the FSR) makes more sense for str than it does for numpy arrays for two reasons: str is immutable and opaque. Since str is immutable the maximum code point in the string can be determined once when the string is created before anything else can get a pointer to the string buffer. Since it is opaque no one can rightly expect it to expose a particular binary format so it is free to choose without compromising any expected semantics. If someone can call buffer on an array then the FSR is a semantic change. If a numpy 'U' array used the FSR and consisted only of ASCII characters then it would have a one byte per char buffer. What then happens if you put a higher code point in? The buffer needs to be resized and the data copied over. But then what happens to any buffer objects or array views? They would be pointing at the old buffer from before the resize. Subsequent modifications to the resized array would not show up in other views and vice versa. I don't think that this can be done transparently since users of a numpy array need to know about the binary representation. That's why I suggest a dtype that has an encoding. Only in that way can it consistently have both a binary and a text interface. I didn't say we should change the S type, but that we should have something, say 's', that appeared to python as a string. I think if we want transparent string interoperability with python together with a compressed representation, and I think we need both, we are going to have to deal with the difficulties of utf-8. 
That means raising errors if the string doesn't fit in the allotted size, etc. Mind, this is a workaround for the mass of ascii data that is already out there, not a substitute for 'U'. If we're going to be taking that much trouble, I'd suggest going ahead and adding a variable-length string type (where the array itself contains a pointer to a lookaside buffer, maybe with an optimization for stashing short strings directly). The fixed-length requirement is pretty onerous for lots of applications (e.g., pandas always uses dtype=O for strings -- and that might be a good workaround for some people in this thread for now). The use of a lookaside buffer would also make it practical to resize the buffer when the maximum code point changed, for that matter... Though, IMO any new dtype here would need a cleanup of the dtype code first so that it doesn't require yet more massive special cases all over umath.so. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
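The pandas-style object-dtype workaround mentioned above is usable today:

```python
import numpy as np

# dtype=object stores a pointer per element, so each string is an
# ordinary Python str and lengths can vary freely; the price is the
# per-element Python object overhead that a native variable-length
# string dtype with a lookaside buffer would avoid.
a = np.array(["a", "bb", "ccc"], dtype=object)
a[0] = "a much longer string"   # no fixed-width truncation
assert a[0] == "a much longer string"
```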
Re: [Numpy-discussion] Memory allocation cleanup
On Fri, Jan 10, 2014 at 9:18 AM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On Fri, Jan 10, 2014 at 3:48 AM, Nathaniel Smith n...@pobox.com wrote: Also, none of the Py* interfaces implement calloc(), which is annoying because it messes up our new optimization of using calloc() for np.zeros. [...] Another thing that is not directly implemented in Python is aligned allocation. This is going to get increasingly important with the advent of heavily vectorized x86 CPUs (e.g. AVX512 is rolling out now) and the C malloc being optimized for the oldish SSE (16 bytes). I want to change the array buffer allocation to make use of posix_memalign and C11 aligned_alloc if available, to avoid some penalties when loading from non-32-byte-aligned buffers. I could imagine it might also help coprocessors and gpus to have higher alignments, but I'm not very familiar with that type of hardware. The allocator used by Python 3.4 is pluggable, so we could implement our special allocators with the new API, but only when 3.4 is more widespread. For this reason and the missing calloc I don't think we should use the Python API for data buffers just yet. Any benefits are relatively small anyway. It really would be nice if our data allocations would all be visible to the tracemalloc library though, somehow. And I doubt we want to patch *all* Python allocations to go through posix_memalign, both because this is rather intrusive and because it would break python -X tracemalloc. How certain are we that we want to switch to aligned allocators in the future? If we don't, then maybe it makes sense to ask python-dev for a calloc interface; but if we do, then I doubt we can convince them to add aligned allocation interfaces, and we'll need to ask for something else (maybe a null allocator, which just notifies the python memory tracking machinery that we allocated something ourselves?). It's not obvious to me why aligning data buffers is useful - can you elaborate? 
There's no code simplification, because we always have to handle the unaligned case anyway with the standard unaligned startup/cleanup loops. And intuitively, given the existence of such loops, alignment shouldn't matter much in practice, since the most that shifting alignment can do is change the number of elements that need to be handled by such loops by (SIMD alignment value / element size). For doubles, in a buffer that has 16 byte alignment but not 32 byte alignment, this means that worst case, we end up doing 4 unnecessary non-SIMD operations. And surely that only matters for very small arrays (for large arrays such constant overhead will amortize out), but for small arrays SIMD doesn't help much anyway? Probably I'm missing something, because you actually know something about SIMD and I'm just hand-waving from first principles :-). But it'd be nice to understand the reasoning for why/whether alignment really helps in the numpy context. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
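Whether a given numpy buffer happens to satisfy a wider alignment can be checked from Python, which is handy when reasoning about the SIMD startup/cleanup loops discussed here:

```python
import numpy as np

a = np.zeros(1024)
addr = a.__array_interface__["data"][0]

# numpy only guarantees alignment suitable for the element type
# (8 bytes for float64); whether the buffer is also 16- or 32-byte
# aligned is up to the underlying allocator -- which is exactly
# what a posix_memalign / aligned_alloc based allocator would pin
# down.
assert addr % a.itemsize == 0
print("32-byte aligned:", addr % 32 == 0)
```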
Re: [Numpy-discussion] adding fused multiply and add to numpy
On Thu, Jan 9, 2014 at 3:30 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On Thu, Jan 9, 2014 at 3:50 PM, Frédéric Bastien no...@nouiz.org wrote: How hard would it be to provide the choice to the user? We could provide 2 functions like: fma_fast() fma_prec() (for precision)? Or this could be a parameter or a user configuration option like for the overflow/underflow error. I like Freddie Witherden's proposal to name the function madd, which does not guarantee one rounding operation. This leaves the namespace open for a special fma function with that guarantee. It can use the libc fma function, which is very slow sometimes but platform independent. This is assuming Apple did not again take shortcuts like they did with their libc hypot implementation; can someone disassemble Apple libc to check what they are doing for C99 fma? And it leaves users the possibility to use the faster madd function if they do not need the precision guarantee. If madd doesn't provide any rounding guarantees, then its only reason for existence is that it provides a fused a*b+c loop that better utilizes memory bandwidth, right? I'm guessing that speed-wise it doesn't really matter whether you use the fancy AVX instructions or not, since even the naive implementation is memory bound -- the advantage is just in the fusion? Lack of loop fusion is obviously a major limitation of numpy, but it's a very general problem. I'm sceptical about whether we want to get into the business of adding functions whose only purpose is to provide pre-fused loops. After madd, what other operations should we provide like this? msub (a*b-c)? add3 (a+b+c)? maddm (a*b+c*d)? mult3 (a*b*c)? How do we decide? Surely it's better to direct people who are hitting memory bottlenecks to much more powerful and general solutions to this problem, like numexpr/cython/numba/theano? (OTOH the version that gives rounding guarantees is obviously a unique new feature.) 
-n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
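The single-rounding guarantee is the whole difference between fma and madd. A classic example: with a = 2**27 + 1, the naive two-rounding a*a + c loses the low bit of the product, which a true fma keeps. Python has no fma primitive here, so exact rational arithmetic stands in for the fused operation:

```python
from fractions import Fraction

a = float(2**27 + 1)            # exactly representable as a double
c = -float(2**54 + 2**28)       # minus the *rounded* value of a*a

# Naive: a*a rounds to 2**54 + 2**28 first (the exact product is
# 2**54 + 2**28 + 1), so the low bit is gone before the add.
naive = a * a + c               # 0.0

# Fused: one rounding of the exact a*a + c, as C99 fma() guarantees.
fused = float(Fraction(a) * Fraction(a) + Fraction(c))  # 1.0

print(naive, fused)  # 0.0 1.0
```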
Re: [Numpy-discussion] Memory allocation cleanup
On Thu, Jan 9, 2014 at 11:21 PM, Charles R Harris charlesr.har...@gmail.com wrote: Apropos Julian's changes to use the PyObject_* allocation suite for some parts of numpy, I posted the following: I think numpy memory management is due a cleanup. Currently we have PyDataMem_*, PyDimMem_*, PyArray_*, plus the malloc, PyMem_*, and PyObject_* interfaces. That is six ways to manage heap allocations. As far as I can tell, PyArray_* is always PyMem_* in practice. We probably need to keep the PyDataMem family as it has a memory tracking option, but PyDimMem just confuses things; I'd rather just use PyMem_* with explicit size. Curiously, the PyObject_Malloc family is not documented apart from some release notes. We should also check for the macro versions of PyMem_*, as they are deprecated for extension modules. Nathaniel then suggested that we consider switching entirely to the Python allocators, especially as new memory tracing tools are coming online in 3.4. Given that these changes could have some impact on current extension writers I thought I'd bring this up on the list to gather opinions. Thoughts? After a bit more research, some further points to keep in mind: Currently, PyDimMem_* and PyArray_* are just aliases for malloc/free, and PyDataMem_* is an alias for malloc/free with some extra tracing hooks wrapped around it. (AFAIK, these tracing hooks are not used by anyone anywhere -- at least, if they are I haven't heard about it, and there is no code on github that uses them.) There is one substantial difference between the PyMem_* and PyObject_* interfaces as compared to malloc(), which is that the Py* interfaces require that the GIL be held when they are called. (@Julian -- I think your PR we just merged fulfills this requirement, is that right?) I strongly suspect that we have PyDataMem_* calls outside of the GIL -- e.g., when allocating ufunc buffers -- and third-party code might as well. 
Python 3.4's new memory allocation API and tracing stuff is documented here: http://www.python.org/dev/peps/pep-0445/ http://docs.python.org/dev/c-api/memory.html http://docs.python.org/dev/library/tracemalloc.html In particular, 3.4 adds a set of PyMem_Raw* functions, which do not require the GIL. Checking the current source code for _tracemalloc.c, it appears that PyMem_Raw* functions *are* traced, so that's nice - that means that switching PyDataMem_* to use PyMem_Raw* would be both safe and provide benefits. However, PyMem_Raw* does not provide the pymalloc optimizations for small allocations. Also, none of the Py* interfaces implement calloc(), which is annoying because it messes up our new optimization of using calloc() for np.zeros. (calloc() is generally faster than malloc()+explicit zeroing, because it can use OS-specific virtual memory tricks to zero out the memory for free. These same tricks also mean that if you use np.zeros() to allocate a large array, and then only write to a few entries in that array, the total memory used is proportional to the number of non-zero entries, rather than to the actual size of the array, which can be extremely useful in some situations as a kind of poor man's sparse array.) I'm pretty sure that the vast majority of our allocations do occur with GIL protection, so we might want to switch to using PyObject_* for most cases to take advantage of the small-object optimizations, and use PyMem_Raw* for any non-GIL cases (like possibly ufunc buffers), with a compatibility wrapper to replace PyMem_Raw* with malloc() on pre-3.4 Pythons. Of course this will need some profiling to see if PyObject_* is actually better than malloc() in practice. 
For calloc(), we could try to convince python-dev to add this, or np.zeros() could explicitly use calloc() even when other code uses the Py* interface and then use an ndarray flag or special .base object to keep track of the fact that we need to use free() to deallocate this memory, or we could give up on the calloc optimization. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
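The tracemalloc hookup being discussed here did eventually happen; in recent numpy versions, data-buffer allocations are reported to the tracemalloc machinery (exactly how much is attributed depends on the numpy version):

```python
import tracemalloc
import numpy as np

tracemalloc.start()
a = np.ones(100_000)             # ~800 kB data buffer
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# In numpy versions that report data buffers to tracemalloc, the
# traced total includes the array's ~800 kB; at minimum the Python
# object allocations are visible.
assert current > 0
print(current >= a.nbytes)
```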
Re: [Numpy-discussion] Speedup by avoiding memory alloc twice in scalar array
On Wed, Jan 8, 2014 at 12:13 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote: On 18.07.2013 15:36, Nathaniel Smith wrote: On Wed, Jul 17, 2013 at 5:57 PM, Frédéric Bastien no...@nouiz.org wrote: On the usefulness of doing only 1 memory allocation: on our old gpu ndarray, we were doing 2 allocations on the GPU, one for metadata and one for data. I removed this, as it was a bottleneck. Allocations on the CPU are faster than on the GPU, but they are still slow unless you reuse memory. Do PyMem_Malloc, reuse previous small allocation? Yes, at least in theory PyMem_Malloc is highly optimized for small-buffer re-use. (For requests over 256 bytes it just calls malloc().) And it's possible to define type-specific freelists; not sure if there's any value in doing that for PyArrayObjects. See Objects/obmalloc.c in the Python source tree. PyMem_Malloc is just a wrapper around malloc(), so it's only as optimized as the C library is (glibc is not good for small allocations). PyObject_Malloc uses a small-object allocator for requests smaller than 512 bytes (256 in Python 2). Right, I meant PyObject_Malloc of course. I filed a pull request [0] replacing a few functions which I think are safe to convert to this API: the nditer allocation, which is completely encapsulated, and the construction of the scalar and array Python objects, which are deleted via the tp_free slot (we really should not support third-party libraries calling PyMem_Free on Python objects without checks). This already gives up to 15% improvement for scalar operations compared to glibc 2.17 malloc. Do I understand the discussion here right that we could replace PyDimMem_NEW, which is used for strides in PyArray, with the small-object allocation too? It would still allow swapping the stride buffer, but every application must then delete it with PyDimMem_FREE, which should be a reasonable requirement. That sounds reasonable to me.
If we wanted to get even more elaborate, we could by default stick the shape/strides into the same allocation as the PyArrayObject, and then defer allocating a separate buffer until someone actually calls PyArray_Resize. (With a new flag, similar to OWNDATA, that tells us whether we need to free the shape/stride buffer when deallocating the array.) It's got to be a vanishingly small proportion of arrays where PyArray_Resize is actually called, so for most arrays this would let us skip the allocation entirely; the only cost would be that for arrays where PyArray_Resize *is* called to add new dimensions, we'd leave the original buffers sitting around until the array was freed, wasting a tiny amount of memory. Given that no-one has noticed that currently *every* array wastes this much memory (see upthread), I doubt anyone will care... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Altering/initializing NumPy array in C
On 1 Jan 2014 13:57, Bart Baker bart...@gmail.com wrote: Hello, I'm having issues with performing operations on an array in C and passing it back to Python. The array values seem to become uninitialized upon being passed back to Python. My first attempt involved initializing the array in C as so: double a_fin[max_mth]; where max_mth is an int. You're allocating your array on the stack, so the memory gets recycled for other uses as soon as your C function returns. You should malloc() it instead (and you don't have to worry about free()ing it yourself - NumPy will do that when the array object is destroyed, provided you transfer ownership of the memory to the array, e.g. via the NPY_ARRAY_OWNDATA flag). Any C reference will fill you in on the details of stack versus heap allocation. -n
Re: [Numpy-discussion] Deprecate boolean math operators?
On Thu, Dec 5, 2013 at 7:33 PM, josef.p...@gmail.com wrote: On Thu, Dec 5, 2013 at 5:37 PM, Sebastian Berg sebast...@sipsolutions.net wrote: Hey, there was a discussion that for numpy booleans the math operators +, -, * (and the unary -), while defined, are not very helpful. I have set up a quick PR with a start (needs some fixes inside numpy still): https://github.com/numpy/numpy/pull/4105 The idea is to deprecate these, since the binary operators |, ^, & (and the unary ~, even if it is weird) behave identically. This would not affect sums of boolean arrays. For the moment I saw one annoying change in numpy, and that is `abs(x - y)` being used by allclose and working nicely currently. I like mask = mask1 * mask2 That's what I learned working my way through scipy.stats.distributions a long time ago. * is the least problematic case, since there numpy and python bools already almost agree. (They return the same values, but numpy returns a bool array instead of an integer array.) On Thu, Dec 5, 2013 at 8:05 PM, Alan G Isaac alan.is...@gmail.com wrote: For + and * (and thus `dot`), this will fix something that is not broken. It is in fact in conformance with a large literature on boolean arrays and boolean matrices. Interesting point! I had assumed that dot() just upcast! But what do you think about the inconsistency between sum() and dot() on bool arrays? -n
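[Editor's note: for concreteness, the operator behaviors discussed in this message can be checked against a recent NumPy. Note that binary `-` on boolean arrays was later removed entirely, so this sketch sticks to the operators that still work.]

```python
import numpy as np

m1 = np.array([True, True, False, False])
m2 = np.array([True, False, True, False])

# The bitwise operators are the unambiguous boolean algebra:
assert (m1 & m2).tolist() == [True, False, False, False]   # and
assert (m1 | m2).tolist() == [True, True, True, False]     # or
assert (m1 ^ m2).tolist() == [False, True, True, False]    # xor
assert (~m1).tolist() == [False, False, True, True]        # not

# * and + coincide with & and |, and stay boolean (no upcast):
assert (m1 * m2).dtype == np.bool_
assert ((m1 * m2) == (m1 & m2)).all()
assert ((m1 + m2) == (m1 | m2)).all()

# sum() upcasts to an integer count, which is the common counting idiom
# that the proposed deprecation explicitly leaves untouched:
assert int(m1.sum()) == 2
```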
Re: [Numpy-discussion] Deprecate boolean math operators?
On Fri, Dec 6, 2013 at 11:55 AM, Alexander Belopolsky ndar...@mac.com wrote: On Fri, Dec 6, 2013 at 1:46 PM, Alan G Isaac alan.is...@gmail.com wrote: On 12/6/2013 1:35 PM, josef.p...@gmail.com wrote: unary versus binary minus Oh right; I consider binary `-` broken for Boolean arrays. (Sorry Alexander; I did not see your entire issue.) I'd rather write ~ than unary - if that's what it is. I agree. So I have no objection to elimination of the `-`. It looks like we are close to reaching a consensus on the following points: 1. * is well-defined on boolean arrays and may be used in preference to & in code that is designed to handle 1s and 0s of any dtype in addition to booleans. 2. + is defined consistently with * and the only issue is the absence of an additive inverse. This is not a problem as long as the presence of - does not suggest otherwise. 3. Binary and unary minus should be deprecated because their use in expressions where variables can be either boolean or numeric would lead to subtle bugs. For example -x*y would produce different results from -(x*y) depending on whether x is boolean or not. In all situations, ^ is preferable to binary - and ~ is preferable to unary -. 4. Changing boolean arithmetic to auto-promotion to int is precluded by a significant use-case of boolean matrices. +1 -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] Deprecate boolean math operators?
Not sure how much time it's worth spending on coming up with new definitions for boolean subtraction, since even if we deprecate the current behavior now we won't be able to implement any of them for a year+, and then we'll end up having to go through these debates again anyway. -n On Fri, Dec 6, 2013 at 2:29 PM, josef.p...@gmail.com wrote: On Fri, Dec 6, 2013 at 4:14 PM, josef.p...@gmail.com wrote: On Fri, Dec 6, 2013 at 3:50 PM, Sebastian Berg sebast...@sipsolutions.net wrote: On Fri, 2013-12-06 at 15:30 -0500, josef.p...@gmail.com wrote: On Fri, Dec 6, 2013 at 2:59 PM, Nathaniel Smith n...@pobox.com wrote: On Fri, Dec 6, 2013 at 11:55 AM, Alexander Belopolsky ndar...@mac.com wrote: On Fri, Dec 6, 2013 at 1:46 PM, Alan G Isaac alan.is...@gmail.com wrote: On 12/6/2013 1:35 PM, josef.p...@gmail.com wrote: unary versus binary minus Oh right; I consider binary `-` broken for Boolean arrays. (Sorry Alexander; I did not see your entire issue.) I'd rather write ~ than unary - if that's what it is. I agree. So I have no objection to elimination of the `-`. It looks like we are close to reaching a consensus on the following points: 1. * is well-defined on boolean arrays and may be used in preference to & in code that is designed to handle 1s and 0s of any dtype in addition to booleans. 2. + is defined consistently with * and the only issue is the absence of an additive inverse. This is not a problem as long as the presence of - does not suggest otherwise. 3. Binary and unary minus should be deprecated because their use in expressions where variables can be either boolean or numeric would lead to subtle bugs. For example -x*y would produce different results from -(x*y) depending on whether x is boolean or not. In all situations, ^ is preferable to binary - and ~ is preferable to unary -. 4. Changing boolean arithmetic to auto-promotion to int is precluded by a significant use-case of boolean matrices.
+1 +0.5 (I would still prefer a different binary minus, but it would be inconsistent with a logical unary minus that negates.) The question is whether the current xor behaviour can make sense? It doesn't seem to make much sense mathematically? Which only leaves that `abs(x - y)` is actually what a (python) programmer might expect. I think I would like to deprecate at least the unary one. The ~ kind of behaviour just doesn't fit as far as I can see. I haven't seen any real use cases for xor yet. My impression is that both plus and minus are just overflow accidents and not intentional. Plus works in a useful way; minus as xor might be used once per century. I would deprecate both unary and binary minus. (And when nobody is looking in two versions from now, I would add a binary minus that overflows to the clipped version, so I get a set subtraction. :) Actually minus works as expected if we avoid negative overflow:

>>> m1 - m1*m2
array([False, False,  True, False], dtype=bool)
>>> m1 * ~m2
array([False, False,  True, False], dtype=bool)
>>> m1 & ~m2
array([False, False,  True, False], dtype=bool)

I find the first easy to read, but m1 - m2 would be one operation less, and chain more easily: m1 - m2 - m3 (m1 are mailing list subscribers, take away m2 owners of apples, take away m3 users of Linux = exotic developers) Josef 5. `/` is useless 6. `**` follows from 1.

>>> m1 ** m2
array([1, 0, 1, 1], dtype=int8)
>>> m1 ** 2
array([False, False,  True,  True], dtype=bool)
>>> m1 ** 3
array([0, 0, 1, 1])

but I'm using python with an old numpy right now: np.__version__ is '1.6.1'. Both of these are currently not defined; they will just cause upcast to int8. I suppose it would be possible to deprecate that upcast though (same goes for most all other ufuncs/operators in principle). We would have to start the discussion again for all other operators/ufuncs to see if they are useful in some cases. For most, treating as int will make sense, I guess. Josef Josef -- Nathaniel J.
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
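[Editor's note: Josef's set-algebra reading can be written today without binary minus at all. A sketch with hypothetical membership masks standing in for his subscribers/apple-owners/Linux-users example:]

```python
import numpy as np

# Hypothetical membership masks for five people:
subscribers = np.array([True, True, True, True, False])   # m1
apple_owners = np.array([True, False, True, False, False])  # m2
linux_users = np.array([False, False, True, True, True])    # m3

# "Take away" (set difference) spelled with & and ~ instead of binary -:
exotic = subscribers & ~apple_owners & ~linux_users
assert exotic.tolist() == [False, True, False, False, False]
```

This chains just as easily as the proposed `m1 - m2 - m3`, at the cost of one extra operator per term.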
Re: [Numpy-discussion] math.fsum() like ufunc for numpy
I think that would be great. Technically what you'd want is a gufunc. -n On Mon, Dec 2, 2013 at 9:44 AM, Daniele Nicolodi dani...@grinta.net wrote: Hello, would there be interest in adding a floating-point accurate summation function like Python's math.fsum(), in the form of a ufunc, to NumPy? I had a look at the algorithm (http://code.activestate.com/recipes/393090-binary-floating-point-summation-accurate-to-full-p/) and it looks quite straightforward to implement. I can try to submit a patch for it. Cheers, Daniele -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
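[Editor's note: the difference such a ufunc would make is easy to demonstrate with Python's own math.fsum(), which implements the accurate-summation algorithm linked above:]

```python
import math

# Ten copies of 0.1 do not sum to exactly 1.0 with naive left-to-right
# addition, because each intermediate partial sum rounds:
naive = sum([0.1] * 10)
assert naive != 1.0

# math.fsum() tracks the rounding errors exactly and returns the
# correctly rounded result:
assert math.fsum([0.1] * 10) == 1.0
```

A NumPy (g)ufunc version would bring the same guarantee to array reductions, where the accumulated rounding error grows with the length of the axis being reduced.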
Re: [Numpy-discussion] nasty bug in 1.8.0??
On Mon, Dec 2, 2013 at 11:35 AM, Neal Becker ndbeck...@gmail.com wrote: I don't think that behavior is acceptable. That's... too bad? I'm not sure what your objection actually is. It's an intentional change (though disabled by default in 1.8), and a necessary step to rationalizing our definition of contiguity and stride handling in general, which has a number of benefits: http://docs.scipy.org/doc/numpy-dev/release.html#npy-relaxed-strides-checking http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#internal-memory-layout-of-an-ndarray Why do you care about the stride of an array with only 1 element, where by definition you never use the stride? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] nasty bug in 1.8.0??
On Mon, Dec 2, 2013 at 3:15 PM, Jim Bosch tallji...@gmail.com wrote: If your arrays are contiguous, you don't really need the strides (use the itemsize instead). How is ndarray broken by this? ndarray is broken by this change because it expects the stride to be a multiple of the itemsize (I think; I'm just looking at code here, as I haven't had time to build NumPy 1.8 yet to test this); it has a slightly more restricted model for what data can look like than NumPy has, and it's easier to always just look at the stride for all sizes rather than special-case for size=1. Note that arrays in which any dimension is 0 (i.e., 0 total elements) can also have arbitrary strides with no consequence. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
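[Editor's note: both claims - that the stride of a 1-element dimension is never dereferenced, and that 0-element arrays can carry arbitrary strides - can be checked from Python with np.lib.stride_tricks.as_strided, which bypasses stride validation (and is therefore unsafe in general; it's used here only because the strides are provably never followed):]

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array([42.0])

# 1-element axis: element 0 lives at offset 0 * stride = 0, so the
# stride value is never actually used, no matter how nonsensical:
weird = as_strided(a, shape=(1,), strides=(999,))
assert weird[0] == 42.0

# 0-element array: no element is ever addressed, so any stride is safe:
empty = as_strided(a, shape=(0,), strides=(123,))
assert empty.size == 0 and list(empty) == []
```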
Re: [Numpy-discussion] PyArray_BASE equivalent in python
On Tue, Nov 26, 2013 at 11:54 AM, Peter Rennert p.renn...@cs.ucl.ac.uk wrote: Hi, as the title says, I am looking for a way to set, from Python, the base of an ndarray to an object. The use case is porting qimage2ndarray to PySide, where I want to do something like:

In [1]: from PySide import QtGui
In [2]: image = QtGui.QImage('/home/peter/code/pyTools/sandbox/images/faceDemo.jpg')
In [3]: import numpy as np
In [4]: a = np.frombuffer(image.bits())

-- I would like to do something like:

In [5]: a.base = image

-- to avoid situations such as:

In [6]: del image
In [7]: a
Segmentation fault (core dumped)

This is a bug in PySide -- the buffer object returned by image.bits() needs to hold a reference to the original image. Please report a bug to them. You will also get a segfault from code that doesn't use numpy at all, by doing things like:

bits = image.bits()
del image
# ...anything involving the bits object

As a workaround, you can write a little class with an __array_interface__ attribute that points to the image's contents, and then call np.asarray() on this object. The resulting array will have your object as its .base, and then your object can hold onto whatever references it wants. -n
Re: [Numpy-discussion] PyArray_BASE equivalent in python
On Tue, Nov 26, 2013 at 1:37 PM, Peter Rennert p.renn...@cs.ucl.ac.uk wrote: I probably did something wrong, but it does not work how I tried it. I am not sure if you meant it like this, but I tried to subclass from ndarray first, but then I do not have access to __array_interface__. Is this what you had in mind?

from PySide import QtGui
import numpy as np

class myArray():
    def __init__(self, shape, bits, strides):
        self.__array_interface__ = {
            'data': bits,
            'typestr': 'i32',
            'descr': [('', 'f8')],
            'shape': shape,
            'strides': strides,
            'version': 3,
        }

You need this object to also hold a reference to the image object -- the idea is that as long as the array lives it will hold a ref to this object in .base, and then this object holds the image alive. But...

image = QtGui.QImage('/home/peter/code/pyTools/sandbox/images/faceDemo.jpg')
b = myArray((image.width(), image.height()), image.bits(), (image.bytesPerLine(), 4))
b = np.asarray(b)
b.base
# <read-write buffer ptr 0x7fd744c4b010, size 1478400 at 0x264e9f0>

...this isn't promising, since it suggests that numpy is cleverly cutting out the middle-man when you give it a buffer object, since it knows that buffer objects are supposed to actually take care of memory management. You might have better luck using the raw pointer two-tuple form for the data field. You can't get these pointers directly from a buffer object, but numpy will give them to you, so you can use something like: 'data': np.asarray(bits).__array_interface__['data'] -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
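[Editor's note: a PySide-free sketch of the suggested workaround, with a plain bytearray standing in for the QImage; the class and attribute names here are illustrative, not part of any library:]

```python
import numpy as np

class BufferKeeper:
    """Expose someone else's memory to NumPy while keeping the owner alive.

    The resulting array's .base holds a reference to this object, which in
    turn holds a reference to `owner`, so the memory cannot be freed out
    from under the array.
    """
    def __init__(self, buf, shape, typestr, owner):
        self._owner = owner  # e.g. the QImage in the PySide case
        # Raw pointer two-tuple form: (address, read-only flag).
        addr = np.frombuffer(buf, dtype=np.uint8).__array_interface__['data'][0]
        self.__array_interface__ = {
            'data': (addr, False),
            'typestr': typestr,
            'shape': shape,
            'version': 3,
        }

buf = bytearray(b'\x01\x02\x03\x04')
a = np.asarray(BufferKeeper(buf, (4,), '|u1', owner=buf))
assert a.tolist() == [1, 2, 3, 4]
buf[0] = 9          # same memory, so changes show through
assert a[0] == 9
```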
Re: [Numpy-discussion] PyArray_BASE equivalent in python
On Tue, Nov 26, 2013 at 2:55 PM, Peter Rennert p.renn...@cs.ucl.ac.uk wrote: Btw, I just wanted to file a bug at PySide, but it might be alright at their end, because I can do this:

from PySide import QtGui
image = QtGui.QImage('/home/peter/code/pyTools/sandbox/images/faceDemo.jpg')
a = image.bits()
del image
a
# <read-write buffer ptr 0x7f5fe0034010, size 1478400 at 0x3c1a6b0>

That just means that the buffer still has a pointer to the QImage's old memory. It doesn't mean that following that pointer won't crash. Try str(a) or something that actually touches the buffer contents... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Re: [Numpy-discussion] RFC: is it worth giving a lightning talk at PyCon 2014 on numpy vbench-marking?
On Sun, Nov 24, 2013 at 5:32 PM, Yaroslav Halchenko li...@onerussian.com wrote: On Tue, 15 Oct 2013, Nathaniel Smith wrote: What do you have to lose? btw -- fresh results are here http://yarikoptic.github.io/numpy-vbench/ . I have tuned benchmarking so it now reflects the best performance across multiple executions of the whole battery, thus eliminating spurious variance if the estimate is taken from a single point in time. Eventually I expect many of those curves to become even cleaner. On another note, what do you think of moving the vbench benchmarks into the main numpy tree? We already require everyone who submits a bug fix to add a test; there are a bunch of speed enhancements coming in these days and it would be nice if we had some way to ask people to submit a benchmark along with each one so that we know that the enhancement stays enhanced... On this positive note (it is boring to start a new thread, isn't it?) -- would you be interested in me transferring numpy-vbench over to github.com/numpy ? If you mean just moving the existing git repo under the numpy organization, like github.com/numpy/numpy-vbench, then I'm not sure how much difference it would make really. What seems like it'd be really useful though would be if the code could move into the main numpy tree, so that people could submit both benchmarks and optimizations within a single PR. As of today, plots on http://yarikoptic.github.io/numpy-vbench should be updating 24x7 (just a loop, thus no time guarantee after you submit new changes). Besides benchmarking new benchmarks (your PRs would still be very welcome; so far it was just me and Julian T) and revisions, that process also goes through a random sample of existing previously benchmarked revisions and re-runs the benchmarks, thus improving upon the ultimate 'min' timing performance.
So you can see already that many plots became much 'cleaner', although now there might be a bit of bias in estimates for recent revisions, since they haven't yet accumulated as many independent runs as older revisions. Cool! -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org