On Mon, Mar 10, 2014 at 1:29 PM, Joachim Durchholz <[email protected]> wrote:
> Am 10.03.2014 13:30, schrieb [email protected]:
>> [...] I also don't want to forget the Python scientific data processing
>> world, which (a) is where these problems actually occur (because they
>> make heavy use of both elementwise and matrix multiplication),
>
> Ack.
>
>> (b) has, after a debate that's been going on for 15 years back to the
>> early releases of Numeric, overwhelmingly settled on using * for
>> elementwise multiplication,
>
> I'm very, very sceptical, both of the claim (the PEP certainly does not
> make a strong case here) and of the usefulness of that.
>
> The only use case that I see for element-wise multiplication is the scalar
> product, which is a very weak case for @ because you still need the Python
> loop (or reduce() call) to add the products.
> Besides, the number of scalar product implementations you write is very
> limited - once per library, possibly with a handful of variants. Hardly
> enough to make "a clumsy loop" an argument of any weight.
>
> Prove me wrong if you can :-)
No-one uses elementwise multiplication to implement the scalar product;
they just call np.dot or whatever the local equivalent is. Nonetheless,
every single numeric language I know of has a dedicated 1-or-2-character
infix operator for elementwise multiplication; the only debate is whether
it or matrix multiplication gets more prominent billing.

Pauli did some back-of-the-envelope checks, which find that elementwise-*
is in fact called hundreds of times in the sklearn/nipy/scipy test suites,
and this excludes scalar * matrix and scalar * vector cases:
  https://github.com/numpy/numpy/pull/4351#issuecomment-37140164
It's at least in the same ballpark of usage as matrix multiplication.

What I think you may be missing is that the whole "zen of arrays" is that
they give you a coherent, composable system for writing simple scalar code
that works over lots of scalars at once. You can write down a simple
function like

  def f(x, y):
      return x * y + np.sin(x)

and this looks like a nice simple scalar formula; you can test it by
passing in scalars for x and y. BUT if you pass in arrays for x and y, then
this same function will evaluate that formula at thousands of values at
once, and do it at near-C speeds. This is "composable" in the sense that
f() can now be treated as a built-in "vectorized" (array) function, just
like np.sin. And as a bonus, there's also a whole coherent system for doing
more interesting manipulations, which f() will also participate in. E.g.,
if you want to evaluate this function on a grid, you can just do

  f(x_vals[:, np.newaxis], y_vals[np.newaxis, :])

which passes in 2d arrays for x and y with shapes n-by-1 and 1-by-n; this
will give you the values at all combinations of x_vals and y_vals, returned
as a neat n-by-n 2d array, with reasonable memory use (e.g., x * y will
allocate a 2d temporary, but sin(x) will allocate only a 1d temporary).
Notice that this case uses elementwise-* on 2d arrays.
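To make the grid trick concrete, here's a minimal runnable sketch of the
same idea (x_vals and y_vals are made-up sample data; the assertion values
just illustrate the broadcasting behaviour):

```python
import numpy as np

def f(x, y):
    # Works unchanged on scalars and on arrays of any compatible shape.
    return x * y + np.sin(x)

# Scalar use: test the formula at a single point.
assert np.isclose(f(2.0, 3.0), 2.0 * 3.0 + np.sin(2.0))

# Grid use: shapes (3, 1) and (1, 4) broadcast to a (3, 4) result,
# evaluating f at every combination of x_vals and y_vals.
x_vals = np.linspace(0.0, 1.0, 3)
y_vals = np.linspace(0.0, 1.0, 4)
grid = f(x_vals[:, np.newaxis], y_vals[np.newaxis, :])
print(grid.shape)  # prints (3, 4)
```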
(If you want to see someone using this trick who isn't me, then I basically
just stole it from the official matplotlib 2d examples, e.g.:
http://matplotlib.org/examples/images_contours_and_fields/streamplot_demo_features.html)

Or, another example: I've recently been working on a project that involves
calculating Bayesian posterior expectations using a simplified particle
filtering method. If we have N particles (i.e., potential parameter
settings for our statistical model), and we store the likelihoods of the
data P(data | parameter value) in an array with N entries, the priors
P(parameter value) in another array with N entries, and the values
themselves in a third array, then in PEP-notation the posterior expectation
of some function g() is:

  normalize(likelihoods * priors) @ g(values)

(This is using @ in its scalar-product guise, but in fact it turns out in
the real code we have multiple different versions of this calculation that
we have to do, so the different likelihoods are stored in a 2d array; the
same formula works unchanged, though, with @ now acting as matrix-vector
multiplication.)

The utility of elementwise multiplication just isn't a controversial point
anywhere in the broader numeric ecosystem.

>> (c) involves something like 20-30x more downstream users (judging by
>> metrics like pypi downloads, github code search for "import numpy"
>> versus "import sympy", number of pypi packages declaring a dependency on
>> numpy versus sympy, etc.). So I just don't think I'm going to get very
>> far if I go back to the numpy list and say "Alright, guys! We're
>> switching * to mean matrix multiplication to make the mathematicians
>> happy!".
>
> Well, you came here to get our perspective.
> No skin off my nose if you dislike it, or ignore it.

Of course. I only bring it up because it's totally possible that there's
some way of tweaking what's in the PEP to make it more useful to everybody,
and if so then I hope you guys will help me think of it.
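For the curious, the posterior-expectation formula can be sketched as
follows. The normalize() and g() helpers and all the numbers are made up
for illustration; only the one-line formula itself comes from the example
above (the @ operator requires Python 3.5+; on 1d arrays it is equivalent
to np.dot):

```python
import numpy as np

def normalize(w):
    # Rescale non-negative weights so they sum to 1.
    return w / w.sum()

def g(values):
    # Hypothetical function whose posterior expectation we want.
    return values ** 2

# Made-up data for N = 3 particles:
likelihoods = np.array([0.2, 0.5, 0.1])  # P(data | parameter value)
priors      = np.array([0.3, 0.3, 0.4])  # P(parameter value)
values      = np.array([1.0, 2.0, 3.0])  # the parameter values themselves

# Posterior weights are proportional to likelihood * prior (Bayes' rule);
# the expectation is then a weighted sum, i.e. a scalar product:
expectation = normalize(likelihoods * priors) @ g(values)
```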
> I did mention that I'm rather sceptical about the reported "overwhelming
> support"; 2 out of 5 projects already implemented it, supposedly one of
> them is your own; 2 are reported as "planning to implement it".
> "Overwhelming support" and "general consensus" look quite different. And
> I hate being dragged onto a bandwagon by promises of overwhelming support
> that has been generated as a self-fulfilling prophecy; that's politics.

Not sure what you're referring to here. Some projects that use * to mean
elementwise multiplication include: numpy, theano, pycuda, pandas, numexpr,
pytables, h5py, blaze, panda3d... The projects which use the opposite
convention are overrepresented in the list in the PEP right now because I
wanted to make sure to get feedback from other perspectives early, but at
least scipy.sparse, pyoperators, and pyviennacl all reacted to the PEP by
saying "hallelujah, this is what we really wanted all along" (and they all
implement some way of doing elementwise multiplication, they just don't
call it *). It's only sympy/sage that find elementwise multiplication so
weird.

And in any case, the point is really about what actual users do -- numpy
and the other projects listed there like theano and pandas collectively
have ~hundreds of downstream projects that depend on them, and AFAICT well
over 90% of those use np.ndarray exclusively, and ignore np.matrix to the
maximum extent possible. pyoperators and pyviennacl, by contrast, are
projects with ~1 user apiece. This is the "vast majority" referred to in
the PEP.

> So... no general consensus.
> In fact, we don't care much about the PEP, we live with * for
> multiplication just fine, and while we acknowledge the usefulness of
> matrix multiplication, we do think that it's no service to Python if
> every Python project promotes its set of pet operators - I bet the
> network guys could use some, generating output is another area (an
> operator for concatenating into an output stream would be more generally
> useful than matrix multiplication).

The day that I see 20% of PyCon tutorials devoted to the topic of "how to
manipulate ipaddr objects", and there's a massive non-Python literature
using a standard infix notation for manipulating ipaddrs, and they've spent
15 years struggling with the operators that Python already has, is the day
that I'll support a new operator for the network guys too :-).

>> So the question is, what's the best system that balances between all
>> these different communities and their constraints? Right now my feeling
>> is that the PEP's proposal is the best available option, and if that
>> means that sympy/sage will just ignore the PEP then that's fine, you
>> guys are solving a different problem. But if you have a better way to
>> balance these competing needs then I'd love to hear it!
>
> Well, now that you're asking, I'll give my answer even if it's probably
> not what you want to read.
>
> Replace your math hat with a Python language designer hat.
> With that hat on, the question isn't just whether @ is going to help math
> and math-based domains, it's whether it's going to help Python as a
> whole. Given that Python serves not just these domains, the relevance of
> matrix multiplication diminishes greatly.

Right, this is the point of the sections called "But isn't matrix
multiplication a pretty niche requirement?" and "So ``@`` is good for
matrix formulas, but how common are those really?". I actually do feel
that @ is more useful to Python as a whole than some of the existing
operators.
If you have any thoughts on how to make those sections stronger then I'd
definitely be interested in hearing them.

> Personally, in Guido's place, I'd outright reject any domain-specific
> operator, and ask the people to come up with a better proposal for
> user-defined operators. What I have seen of user-defined operators
> discussion made me suspect that previous attempts were rejected not
> because of user-defined operators as such, but because they were
> immature.

AFAICT his most recent comment on user-defined operators is: "I encourage
the numpy folks to think about a language preprocessor that adds extra
operators and perhaps other syntax."
  (https://plus.google.com/115212051037621986145/posts/hZVVtJ9bK3u)
...which doesn't give me much hope on this account. AFAICT the
possibilities are: (1) no new operators, (2) one new very well-motivated
operator like @, (3) there is no possibility 3. I personally don't see how
to make user-defined operators work in Python at all -- among other
issues, they would totally break the current compile/eval separation.

> Speaking of language design competence, I'm missing a serious discussion
> of syntactic ambiguities in your PEP.
> Saying "oh it's only used for annotations so no problem" is just
> handwaving; you can still run into ambiguities at the lexical level.
> Also, there's no outline of the changes needed in the parser and how you
> can be sure it doesn't misinterpret an annotation for an operator or
> vice versa, and how the changes might affect the accuracy of error
> messages.

I didn't go into details here because AFAICT they're trivial -- am I
missing something? Decorator-@ is only legal as the first token in a
statement; matmul-@ is a binary operator, so it's only legal when *not*
the first token in a statement. This is pretty unambiguous, and Python has
lots of parsing issues that are more subtle than this ("is not", "if" in
comprehensions, etc.).
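For what it's worth, the disambiguation described above can be seen
directly in Python 3.5+, where PEP 465's @ eventually landed (trace_calls
and double are toy names, purely for illustration):

```python
import numpy as np

def trace_calls(fn):
    # A trivial decorator, only here to show '@' in statement-initial
    # position.
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper

# '@' as the first token of a statement parses as a decorator:
@trace_calls
def double(x):
    return 2 * x

# '@' anywhere else parses as the binary matmul operator, which
# dispatches to a.__matmul__(b):
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = a @ b
print(c)  # prints [[19 22]
          #         [43 50]]
```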
> This all in the sense of "do as you wish but that's the points I'd be
> asking if I were Guido, so you might want to clarify there before
> presenting this to the Python community at large".

Indeed, and it's appreciated!

-n

--
You received this message because you are subscribed to the Google Groups
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit
https://groups.google.com/d/msgid/sympy/CAPJVwBkHpz%3DMdfDA%2B5Fu85srPY5jrAEx6GNosS4C26NF1-iSWQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
