Re: [Numpy-discussion] histogram2d and histogramdd return counts as floats while histogram returns ints

2012-05-14 Thread Nils Becker
 Is this intended?
 
 np.histogramdd([[1,2],[3,4]],bins=2)
 
 (array([[ 1.,  0.],
[ 0.,  1.]]),
  [array([ 1. ,  1.5,  2. ]), array([ 3. ,  3.5,  4. ])])
 
 np.histogram2d([1,2],[3,4],bins=2)
 
 (array([[ 1.,  0.],
[ 0.,  1.]]),
  array([ 1. ,  1.5,  2. ]),
  array([ 3. ,  3.5,  4. ]))
 
 np.histogram([1,2],bins=2)
 
 (array([1, 1]), array([ 1. ,  1.5,  2. ]))

Maybe I should have been more explicit. What I meant to say is that:

1. The counts in a histogram are integers. Whenever no normalization is
used, I would expect to get an integer array when I call a histogram
function.
2. Now, it might be intended that the data type is always the same, so
that float has to be used to accommodate the normalized histograms.
3. In any case, the 1-D histogram function handles this differently from
the 2-D and N-D ones. This seems inconsistent and might be considered a bug.
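For reference, a quick way to see the dtype difference and, if needed, to
work around it (a sketch assuming the current behaviour; the cast is only
safe when no normalization is requested):

    import numpy as np

    counts_1d, edges = np.histogram([1, 2], bins=2)
    counts_2d, xe, ye = np.histogram2d([1, 2], [3, 4], bins=2)
    print(counts_1d.dtype)  # integer (e.g. int64)
    print(counts_2d.dtype)  # float64

    # workaround: cast the unnormalized counts back to integers
    counts_2d_int = counts_2d.astype(np.intp)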

n.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

2012-05-14 Thread mark florisson
On 12 May 2012 22:55, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote:
 On 05/11/2012 03:37 PM, mark florisson wrote:

 On 11 May 2012 12:13, Dag Sverre Seljebotnd.s.seljeb...@astro.uio.no
  wrote:

 (NumPy devs: I know, I get too many ideas. But this time I *really*
 believe
 in it, I think this is going to be *huge*. And if Mark F. likes it it's
 not
 going to be without manpower; and as his mentor I'd pitch in too here and
 there.)

 (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
 don't
 want to micro-manage your GSoC, just have your take.)

 Travis, thank you very much for those good words in the NA-mask
 interactions... thread. It put most of my concerns away. If anybody is
 leaning towards opaqueness because of its OOP purity, I want to refer to
 C++ and its walled garden of ideological purity -- it has, what, 3-4
 different OOP array libraries, none of which is able to out-compete the
 others. Meanwhile the rest of the world happily cooperates using pointers,
 strides, CSR and CSC.

 Now, there are limits to what you can do with strides and pointers.
 No one's denying the need for more. In my mind that's an API where you can
 do fetch_block and put_block of cache-sized, N-dimensional blocks on an
 array; but it might be something slightly different.

 Here's what I'm asking: DO NOT simply keep extending ndarray and the
 NumPy C
 API to deal with this issue.

 What we need is duck-typing/polymorphism at the C level. If you keep
 extending ndarray and the NumPy C API, what we'll have is a one-to-many
 relationship: One provider of array technology, multiple consumers (with
 hooks, I'm sure, but all implementations of the hook concept in the NumPy
 world I've seen so far are a total disaster!).

 What I think we need instead is something like PEP 3118 for the
 abstract
 array that is only available block-wise with getters and setters. On the
 Cython list we've decided that what we want for CEP 1000 (for boxing
 callbacks etc.) is to extend PyTypeObject with our own fields; we could
 create CEP 1001 to solve this issue and make any Python object an
 exporter
 of block-getter/setter-arrays (better name needed).

 What would be exported is (of course) a simple vtable:

 typedef struct {
     int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right,
                      ...);
     ...
 } block_getter_setter_array_vtable;

 Let's please discuss the details *after* the fundamentals. But the reason
 I
 put void* there instead of PyObject* is that I hope this could be used
 beyond the Python world (say, Python-Julia); the void* would be handed
 to
 you at the time you receive the vtable (however we handle that).


 I suppose it would also be useful to have some way of predicting the
 output format polymorphically for the caller. E.g. dense *
 block_diagonal results in block diagonal, but dense + block_diagonal
 results in dense, etc. It might be useful for the caller to know
 whether it needs to allocate a sparse, dense or block-structured
 array. Or maybe the polymorphic function could even do the allocation.
 This needs to happen recursively of course, to avoid intermediate
 temporaries. The compiler could easily handle that, and so could numpy
 when it gets lazy evaluation.
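 (For illustration only -- a minimal Python sketch of that kind of format
 prediction via dispatch on the operand formats; all names below are made up,
 and the two rules come straight from the example above:)

     # Hypothetical lookup table: (format_a, format_b, op) -> result format.
     RESULT_FORMAT = {
         ('dense', 'block_diagonal', '*'): 'block_diagonal',
         ('dense', 'block_diagonal', '+'): 'dense',
     }

     def result_format(fmt_a, fmt_b, op):
         # Fall back to dense when no specialised rule is registered.
         return RESULT_FORMAT.get((fmt_a, fmt_b, op), 'dense')

     # The caller can decide what to allocate before running the kernel:
     result_format('dense', 'block_diagonal', '*')   # 'block_diagonal'
     result_format('dense', 'block_diagonal', '+')   # 'dense'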


 Ah. But that also depends on the computation to be performed: a)
 elementwise, b) axis-wise reductions, c) linear algebra...

 In my oomatrix code (please don't look at it, it's shameful) I do this using
 multiple dispatch.

 I'd rather ignore this for as long as we can, only implementing a[:] = ...
 -- I can't see how decisions here would trickle down to the API that's used
 in the kernel, it's more like a pre-phase, and better treated orthogonally.


 I think if the heavy lifting of allocating output arrays and exporting
 these arrays works in numpy, then support in Cython could use that (I
 can already hear certain people object to more complicated array stuff
 in Cython :). Even better here would be an external project that each
 our projects could use (I still think the nditer sorting functionality
 of arrays should be numpy-agnostic and externally available).


 I agree with the separate project idea. It's trivial for NumPy to
 incorporate that as one of its methods for exporting arrays, and I don't
 think it makes sense to either build it into Cython, or outright depend on
 NumPy.

 Here's what I'd like (working title: NumBridge?).

  - Mission: Be the double* + shape + strides in a world where that is no
 longer enough, by providing tight, focused APIs/ABIs that are usable across
 C/Fortran/Python.

 I basically want something I can quickly acquire from a NumPy array, then
 pass it into my C code without dragging along all the cruft that I don't
 need.

  - Written in pure C + specs, usable without Python

  - PEP 3118 done right, basically semi-standardize the internal Cython
 memoryview ABI and get something that's passable on stack

  - Get block get/put API

  - Iterator APIs

  - Utility 

Re: [Numpy-discussion] Missing data wrap-up and request for comments

2012-05-14 Thread Richard Hattersley
For what it's worth, I'd prefer ndmasked.

As has been mentioned elsewhere, some algorithms can't really cope with
missing data. I'd very much rather they fail than silently give incorrect
results. Working in the climate prediction business (as with many other
domains I'm sure), even the *potential* for incorrect results can be
damaging.


On 11 May 2012 06:14, Travis Oliphant tra...@continuum.io wrote:


 On May 10, 2012, at 12:21 AM, Charles R Harris wrote:



 On Wed, May 9, 2012 at 11:05 PM, Benjamin Root ben.r...@ou.edu wrote:



 On Wednesday, May 9, 2012, Nathaniel Smith wrote:



 My only objection to this proposal is that committing to this approach
 seems premature. The existing masked array objects act quite
 differently from numpy.ma, so why do you believe that they're a good
 foundation for numpy.ma, and why will users want to switch to their
 semantics over numpy.ma's semantics? These aren't rhetorical
 questions, it seems like they must have concrete answers, but I don't
 know what they are.


 Based on the design decisions made in the original NEP, a re-made
 numpy.ma would have to lose _some_ features, particularly the ability to
 share masks. Save for that and some very obscure behaviors that are
 undocumented, it is possible to remake numpy.ma as a compatibility layer.

 That being said, I think that there are some fundamental questions that
 remain of concern. If I recall, there were unresolved questions about
 behaviors surrounding assignments to elements of a view.

 I see the project as broken down like this:
 1.) internal architecture (largely abi issues)
 2.) external architecture (hooks throughout numpy to utilize the new
 features where possible such as where= argument)
 3.) getter/setter semantics
 4.) mathematical semantics

 At this moment, I think we have pieces of 2 and they are fairly
 non-controversial. It is 1 that I see as being the immediate hold-up here.
 3 & 4 are non-trivial, but because they are mostly about interfaces, I
 think we can be willing to accept some very basic, fundamental, barebones
 components here in order to lay the groundwork for a more complete API
 later.

 To talk of Travis's proposal, doing nothing is a no-go. Not moving forward
 would dishearten the community. Making an ndmasked type is very intriguing.
 I see it as a step towards eventually deprecating ndarray? Also, how would
 it behave with np.asarray() and np.asanyarray()? My other concern is a
 possible violation of DRY. How difficult would it be to maintain two
 ndarrays in parallel?

 As for the flag approach, this still doesn't solve the problem of legacy
 code (or did I misunderstand?)


 My understanding of the flag is to allow the code to stay in and get
 reworked and experimented with while keeping it from contaminating
 conventional use.

 The whole point of putting the code in was to experiment and adjust. The
 rather bizarre idea that it needs to be perfect from the get go is
 disheartening, and is seldom how new things get developed. Sure, there is a
 plan up front, but there needs to be feedback and change. And in fact, I
 haven't seen much feedback about the actual code; I don't even know that
 the people complaining have tried using it to see where it hurts. I'd like
 that sort of feedback.


 I don't think anyone is saying it needs to be perfect from the get go.
  What I am saying is that this is fundamental enough to downstream users
 that this kind of thing is best done as a separate object.  The flag could
 still be used to make all Python-level array constructors build ndmasked
 objects.

 But, this doesn't address the C-level story where there is quite a bit of
 downstream use where people have used the NumPy array as just a pointer to
 memory without considering that there might be a mask attached that should
 be inspected as well.

 The NEP addresses this a little bit for those C or C++ consumers of the
 ndarray who always use PyArray_FromAny, which can fail if the array has
 non-NULL mask contents.   However, it is *not* true that all downstream
 users use PyArray_FromAny.

 A large number of users just use something like PyArray_Check and then
 PyArray_DATA to get the pointer to the data buffer and then go from there
 thinking of their data as a strided memory chunk only (no extra mask).
  The NEP fundamentally changes this simple invariant that has been in NumPy
 and Numeric before it for a long, long time.

 I really don't see how we can do this in a 1.7 release. It has too many
 unknown, and I think unknowable, downstream effects. But I think we could
 introduce another arrayobject that is the masked_array with a Python-level
 flag that makes it the default array in Python.

 There are a few more subtleties: PyArray_Check by default will pass
 sub-classes, so if the new ndmask array were a sub-class then it would be
 passed (just like current numpy.ma arrays and matrices would pass that
 check today). However, there is a PyArray_CheckExact macro which could
 

Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-14 Thread Ralf Gommers
On Sun, May 13, 2012 at 1:14 PM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:

 On Sat, May 12, 2012 at 9:50 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
 
 
  On Sun, May 6, 2012 at 12:12 AM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
 
  On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes
  paul.anton.let...@gmail.com wrote:
 
  Hi,
 
  I'm getting a couple of errors when testing. System:
  Arch Linux (updated today)
  Python 3.2.3
  gcc 4.7.0
  (Anything else?)
 
  I think that this error:
  AssertionError: selectedrealkind(19): expected -1 but got 16
  is due to the fact that newer versions of gfortran actually support
  precision this high (quad precision).
 
 
  Yes, but it should be fixed. I can't duplicate this here with a fresh
  checkout of the branch.
 
 
  This failure makes no sense to me.
 
  Error comes from this code:
 
  'selectedrealkind(%s): expected %r but got %r' %  (i,
  selected_real_kind(i), selectedrealkind(i)))
 
  So selected_real_kind(19) returns -1.
 
  selected_real_kind is the function
  numpy.f2py.crackfortran._selected_real_kind_func, which is defined as:
 
  def _selected_real_kind_func(p, r=0, radix=0):
      #XXX: This should be processor dependent
      # This is only good for 0 <= p <= 20
      if p < 7: return 4
      if p < 16: return 8
      if platform.machine().lower().startswith('power'):
          if p <= 20:
              return 16
      else:
          if p < 19:
              return 10
          elif p <= 20:
              return 16
      return -1
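  A quick sanity check of what this returns on a non-POWER machine (the
  values follow directly from the branches above):

      # p: 6 -> 4, 7..15 -> 8, 16..18 -> 10, 19..20 -> 16, 21 -> -1
      for p in (6, 7, 15, 16, 18, 19, 20, 21):
          print(p, _selected_real_kind_func(p))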
 
  For p=19 this function should always return 16. So the result from
 compiling
  foo.f90 is fine, but the test is broken in a very strange way.
 
  Paul, is the failure reproducible on your machine? If so, can you try to
  debug it?
 
  Ralf

 Hi Ralf.

 The Arch numpy (1.6.1) for python 2.7, installed via pacman (the
 package manager) has this problem.

 After installation of numpy 1.6.2rc1 in a virtualenv, the test passes.
 Maybe the bug was fixed in the RC, and I screwed up which numpy
 version I tested? I'm sorry that I can't find out - I just built a new
 machine, and the old one is lying around the living room in pieces. Was
 that particular bit of code changed between 1.6.1 and 1.6.2rc1?


It was actually, in https://github.com/numpy/numpy/commit/e7f2210e1.

So you tested 1.6.1 by accident before, and it's working now? Problem
solved in that case.

Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

2012-05-14 Thread Dag Sverre Seljebotn
On 05/14/2012 06:31 PM, mark florisson wrote:
 On 12 May 2012 22:55, Dag Sverre Seljebotnd.s.seljeb...@astro.uio.no  wrote:
 On 05/11/2012 03:37 PM, mark florisson wrote:

 On 11 May 2012 12:13, Dag Sverre Seljebotnd.s.seljeb...@astro.uio.no
   wrote:

 (NumPy devs: I know, I get too many ideas. But this time I *really*
 believe
 in it, I think this is going to be *huge*. And if Mark F. likes it it's
 not
 going to be without manpower; and as his mentor I'd pitch in too here and
 there.)

 (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
 don't
 want to micro-manage your GSoC, just have your take.)

 Travis, thank you very much for those good words in the NA-mask
 interactions... thread. It put most of my concerns away. If anybody is
 leaning towards for opaqueness because of its OOP purity, I want to refer
 to
 C++ and its walled-garden of ideological purity -- it has, what, 3-4
 different OOP array libraries, neither of which is able to out-compete
 the
 other. Meanwhile the rest of the world happily cooperates using pointers,
 strides, CSR and CSC.

 Now, there are limits to what you can do with strides and pointers.
 Noone's
 denying the need for more. In my mind that's an API where you can do
 fetch_block and put_block of cache-sized, N-dimensional blocks on an
 array;
 but it might be something slightly different.

 Here's what I'm asking: DO NOT simply keep extending ndarray and the
 NumPy C
 API to deal with this issue.

 What we need is duck-typing/polymorphism at the C level. If you keep
 extending ndarray and the NumPy C API, what we'll have is a one-to-many
 relationship: One provider of array technology, multiple consumers (with
 hooks, I'm sure, but all implementations of the hook concept in the NumPy
 world I've seen so far are a total disaster!).

 What I think we need instead is something like PEP 3118 for the
 abstract
 array that is only available block-wise with getters and setters. On the
 Cython list we've decided that what we want for CEP 1000 (for boxing
 callbacks etc.) is to extend PyTypeObject with our own fields; we could
 create CEP 1001 to solve this issue and make any Python object an
 exporter
 of block-getter/setter-arrays (better name needed).

 What would be exported is (of course) a simple vtable:

 typedef struct {
 int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t *lower_right,
 ...);
 ...
 } block_getter_setter_array_vtable;

 Let's please discuss the details *after* the fundamentals. But the reason
 I
 put void* there instead of PyObject* is that I hope this could be used
 beyond the Python world (say, Python-Julia); the void* would be handed
 to
 you at the time you receive the vtable (however we handle that).


 I suppose it would also be useful to have some way of predicting the
 output format polymorphically for the caller. E.g. dense *
 block_diagonal results in block diagonal, but dense + block_diagonal
 results in dense, etc. It might be useful for the caller to know
 whether it needs to allocate a sparse, dense or block-structured
 array. Or maybe the polymorphic function could even do the allocation.
 This needs to happen recursively of course, to avoid intermediate
 temporaries. The compiler could easily handle that, and so could numpy
 when it gets lazy evaluation.


 Ah. But that depends too on the computation to be performed too; a)
 elementwise, b) axis-wise reductions, c) linear algebra...

 In my oomatrix code (please don't look at it, it's shameful) I do this using
 multiple dispatch.

 I'd rather ignore this for as long as we can, only implementing a[:] = ...
 -- I can't see how decisions here would trickle down to the API that's used
 in the kernel, it's more like a pre-phase, and better treated orthogonally.


 I think if the heavy lifting of allocating output arrays and exporting
 these arrays work in numpy, then support in Cython could use that (I
 can already hear certain people object to more complicated array stuff
 in Cython :). Even better here would be an external project that each
 our projects could use (I still think the nditer sorting functionality
 of arrays should be numpy-agnostic and externally available).


 I agree with the separate project idea. It's trivial for NumPy to
 incorporate that as one of its methods for exporting arrays, and I don't
 think it makes sense to either build it into Cython, or outright depend on
 NumPy.

 Here's what I'd like (working title: NumBridge?).

   - Mission: Be the double* + shape + strides in a world where that is no
 longer enough, by providing tight, focused APIs/ABIs that are usable across
 C/Fortran/Python.

 I basically want something I can quickly acquire from a NumPy array, then
 pass it into my C code without dragging along all the cruft that I don't
 need.

   - Written in pure C + specs, usable without Python

   - PEP 3118 done right, basically semi-standardize the internal Cython
 memoryview ABI and get something that's passable on stack

   - 

Re: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

2012-05-14 Thread Dag Sverre Seljebotn
On 05/14/2012 10:36 PM, Dag Sverre Seljebotn wrote:
 On 05/14/2012 06:31 PM, mark florisson wrote:
 On 12 May 2012 22:55, Dag Sverre Seljebotnd.s.seljeb...@astro.uio.no   
 wrote:
 On 05/11/2012 03:37 PM, mark florisson wrote:

 On 11 May 2012 12:13, Dag Sverre Seljebotnd.s.seljeb...@astro.uio.no
wrote:

 (NumPy devs: I know, I get too many ideas. But this time I *really*
 believe
 in it, I think this is going to be *huge*. And if Mark F. likes it it's
 not
 going to be without manpower; and as his mentor I'd pitch in too here and
 there.)

 (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
 don't
 want to micro-manage your GSoC, just have your take.)

 Travis, thank you very much for those good words in the NA-mask
 interactions... thread. It put most of my concerns away. If anybody is
 leaning towards for opaqueness because of its OOP purity, I want to refer
 to
 C++ and its walled-garden of ideological purity -- it has, what, 3-4
 different OOP array libraries, neither of which is able to out-compete
 the
 other. Meanwhile the rest of the world happily cooperates using pointers,
 strides, CSR and CSC.

 Now, there are limits to what you can do with strides and pointers.
 Noone's
 denying the need for more. In my mind that's an API where you can do
 fetch_block and put_block of cache-sized, N-dimensional blocks on an
 array;
 but it might be something slightly different.

 Here's what I'm asking: DO NOT simply keep extending ndarray and the
 NumPy C
 API to deal with this issue.

 What we need is duck-typing/polymorphism at the C level. If you keep
 extending ndarray and the NumPy C API, what we'll have is a one-to-many
 relationship: One provider of array technology, multiple consumers (with
 hooks, I'm sure, but all implementations of the hook concept in the NumPy
 world I've seen so far are a total disaster!).

 What I think we need instead is something like PEP 3118 for the
 abstract
 array that is only available block-wise with getters and setters. On the
 Cython list we've decided that what we want for CEP 1000 (for boxing
 callbacks etc.) is to extend PyTypeObject with our own fields; we could
 create CEP 1001 to solve this issue and make any Python object an
 exporter
 of block-getter/setter-arrays (better name needed).

 What would be exported is (of course) a simple vtable:

 typedef struct {
  int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t 
 *lower_right,
 ...);
  ...
 } block_getter_setter_array_vtable;

 Let's please discuss the details *after* the fundamentals. But the reason
 I
 put void* there instead of PyObject* is that I hope this could be used
 beyond the Python world (say, Python-Julia); the void* would be handed
 to
 you at the time you receive the vtable (however we handle that).


 I suppose it would also be useful to have some way of predicting the
 output format polymorphically for the caller. E.g. dense *
 block_diagonal results in block diagonal, but dense + block_diagonal
 results in dense, etc. It might be useful for the caller to know
 whether it needs to allocate a sparse, dense or block-structured
 array. Or maybe the polymorphic function could even do the allocation.
 This needs to happen recursively of course, to avoid intermediate
 temporaries. The compiler could easily handle that, and so could numpy
 when it gets lazy evaluation.


 Ah. But that depends too on the computation to be performed too; a)
 elementwise, b) axis-wise reductions, c) linear algebra...

 In my oomatrix code (please don't look at it, it's shameful) I do this using
 multiple dispatch.

 I'd rather ignore this for as long as we can, only implementing a[:] = ...
 -- I can't see how decisions here would trickle down to the API that's used
 in the kernel, it's more like a pre-phase, and better treated orthogonally.


 I think if the heavy lifting of allocating output arrays and exporting
 these arrays work in numpy, then support in Cython could use that (I
 can already hear certain people object to more complicated array stuff
 in Cython :). Even better here would be an external project that each
 our projects could use (I still think the nditer sorting functionality
 of arrays should be numpy-agnostic and externally available).


 I agree with the separate project idea. It's trivial for NumPy to
 incorporate that as one of its methods for exporting arrays, and I don't
 think it makes sense to either build it into Cython, or outright depend on
 NumPy.

 Here's what I'd like (working title: NumBridge?).

- Mission: Be the double* + shape + strides in a world where that is no
 longer enough, by providing tight, focused APIs/ABIs that are usable across
 C/Fortran/Python.

 I basically want something I can quickly acquire from a NumPy array, then
 pass it into my C code without dragging along all the cruft that I don't
 need.

- Written in pure C + specs, usable without Python

- PEP 3118 done right, basically semi-standardize the internal Cython
 

Re: [Numpy-discussion] Correlation code from NumPy 1.5 Beginner's Guide

2012-05-14 Thread Paul Hobson
On Sun, May 13, 2012 at 9:48 AM, Dinesh Prasad dprasad...@yahoo.com wrote:
 Hello. I am new to the list thanks for accepting my question.

 I am trying to run the attached code, directly from the book in the title.

 It simply calculates the correlation of returns of the stocks listed in the
 spreadsheets. Could it be that the NumPy library is not being recognized on
 my system for some reason?

 I installed the 32 bit version of python/numpy. This is to preserve
 compatibility with another application that will trigger the code. I am on
 Windows 7 64 bit Home Edition, however.

 Thanks for any suggestions.

 -Dinesh

Dinesh,

What errors are you getting? I didn't run your code, but from the
looks of things, you might want to set unpack=False as you read the
data.

In other words, instead of this:
 bhp = numpy.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=True)

use this:
 bhp = numpy.loadtxt('BHP.csv', delimiter=',', usecols=(6,), unpack=False)

When unpack=True, np.loadtxt returns the transposed array, so each column
can be unpacked into its own variable (e.g. x, y = np.loadtxt(...)). With
unpack=False you get a single array with one row per line of the file.
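For example, with a hypothetical two-column CSV file (not the file from the
original post):

    import numpy as np

    # unpack=False (the default): one 2-D array of shape (nrows, ncols)
    data = np.loadtxt('two_columns.csv', delimiter=',')

    # unpack=True: the transpose, convenient for splitting columns apart
    x, y = np.loadtxt('two_columns.csv', delimiter=',', unpack=True)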
-paul
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fixing issue of future opaqueness of ndarray this summer

2012-05-14 Thread David Cournapeau
On Mon, May 14, 2012 at 5:31 PM, mark florisson
markflorisso...@gmail.comwrote:

 On 12 May 2012 22:55, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no
 wrote:
  On 05/11/2012 03:37 PM, mark florisson wrote:
 
  On 11 May 2012 12:13, Dag Sverre Seljebotnd.s.seljeb...@astro.uio.no
   wrote:
 
  (NumPy devs: I know, I get too many ideas. But this time I *really*
  believe
  in it, I think this is going to be *huge*. And if Mark F. likes it it's
  not
  going to be without manpower; and as his mentor I'd pitch in too here
 and
  there.)
 
  (Mark F.: I believe this is *very* relevant to your GSoC. I certainly
  don't
  want to micro-manage your GSoC, just have your take.)
 
  Travis, thank you very much for those good words in the NA-mask
  interactions... thread. It put most of my concerns away. If anybody is
  leaning towards for opaqueness because of its OOP purity, I want to
 refer
  to
  C++ and its walled-garden of ideological purity -- it has, what, 3-4
  different OOP array libraries, neither of which is able to out-compete
  the
  other. Meanwhile the rest of the world happily cooperates using
 pointers,
  strides, CSR and CSC.
 
  Now, there are limits to what you can do with strides and pointers.
  Noone's
  denying the need for more. In my mind that's an API where you can do
  fetch_block and put_block of cache-sized, N-dimensional blocks on an
  array;
  but it might be something slightly different.
 
  Here's what I'm asking: DO NOT simply keep extending ndarray and the
  NumPy C
  API to deal with this issue.
 
  What we need is duck-typing/polymorphism at the C level. If you keep
  extending ndarray and the NumPy C API, what we'll have is a one-to-many
  relationship: One provider of array technology, multiple consumers
 (with
  hooks, I'm sure, but all implementations of the hook concept in the
 NumPy
  world I've seen so far are a total disaster!).
 
  What I think we need instead is something like PEP 3118 for the
  abstract
  array that is only available block-wise with getters and setters. On
 the
  Cython list we've decided that what we want for CEP 1000 (for boxing
  callbacks etc.) is to extend PyTypeObject with our own fields; we could
  create CEP 1001 to solve this issue and make any Python object an
  exporter
  of block-getter/setter-arrays (better name needed).
 
  What would be exported is (of course) a simple vtable:
 
  typedef struct {
 int (*get_block)(void *ctx, ssize_t *upper_left, ssize_t
 *lower_right,
  ...);
 ...
  } block_getter_setter_array_vtable;
 
  Let's please discuss the details *after* the fundamentals. But the
 reason
  I
  put void* there instead of PyObject* is that I hope this could be used
  beyond the Python world (say, Python-Julia); the void* would be
 handed
  to
  you at the time you receive the vtable (however we handle that).
 
 
  I suppose it would also be useful to have some way of predicting the
  output format polymorphically for the caller. E.g. dense *
  block_diagonal results in block diagonal, but dense + block_diagonal
  results in dense, etc. It might be useful for the caller to know
  whether it needs to allocate a sparse, dense or block-structured
  array. Or maybe the polymorphic function could even do the allocation.
  This needs to happen recursively of course, to avoid intermediate
  temporaries. The compiler could easily handle that, and so could numpy
  when it gets lazy evaluation.
 
 
  Ah. But that depends too on the computation to be performed too; a)
  elementwise, b) axis-wise reductions, c) linear algebra...
 
  In my oomatrix code (please don't look at it, it's shameful) I do this
 using
  multiple dispatch.
 
  I'd rather ignore this for as long as we can, only implementing a[:] =
 ...
  -- I can't see how decisions here would trickle down to the API that's
 used
  in the kernel, it's more like a pre-phase, and better treated
 orthogonally.
 
 
  I think if the heavy lifting of allocating output arrays and exporting
  these arrays work in numpy, then support in Cython could use that (I
  can already hear certain people object to more complicated array stuff
  in Cython :). Even better here would be an external project that each
  our projects could use (I still think the nditer sorting functionality
  of arrays should be numpy-agnostic and externally available).
 
 
  I agree with the separate project idea. It's trivial for NumPy to
  incorporate that as one of its methods for exporting arrays, and I don't
  think it makes sense to either build it into Cython, or outright depend
 on
  NumPy.
 
  Here's what I'd like (working title: NumBridge?).
 
   - Mission: Be the double* + shape + strides in a world where that is
 no
  longer enough, by providing tight, focused APIs/ABIs that are usable
 across
  C/Fortran/Python.
 
  I basically want something I can quickly acquire from a NumPy array, then
  pass it into my C code without dragging along all the cruft that I don't
  need.
 
   - Written in pure C + specs, 

[Numpy-discussion] Fancy-indexing reorders output in corner cases?

2012-05-14 Thread Zachary Pincus
Hello all,

The below seems to be a bug, but perhaps it's unavoidably part of the indexing 
mechanism?

It's easiest to show via example... note that using [0,1] to pull two columns 
out of the array gives the same shape as using :2 in the simple case, but 
when there's additional slicing happening, the shapes get transposed or 
something.

In [2]: numpy.version.version # latest git version
Out[2]: '1.7.0.dev-3d4'

In [3]: d = numpy.empty((10, 9, 8, 7))

In [4]: d[:,:,:,[0,1]].shape
Out[4]: (10, 9, 8, 2)

In [5]: d[:,:,:,:2].shape
Out[5]: (10, 9, 8, 2)

In [6]: d[:,0,:,[0,1]].shape
Out[6]: (2, 10, 8)

In [7]: d[:,0,:,:2].shape
Out[7]: (10, 8, 2)

In [8]: d[0,:,:,[0,1]].shape
Out[8]: (2, 9, 8)

In [9]: d[0,:,:,:2].shape
Out[9]: (9, 8, 2)

Oddly, this error can appear/disappear depending on the position of the other 
axis sliced:
In [14]: d = numpy.empty((10, 9, 8))

In [15]: d[:,:,[0,1]].shape
Out[15]: (10, 9, 2)

In [16]: d[:,0,[0,1]].shape
Out[16]: (10, 2)

In [17]: d[0,:,[0,1]].shape
Out[17]: (2, 9)

This cannot be the expected behavior, right?
Zach

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fancy-indexing reorders output in corner cases?

2012-05-14 Thread Stéfan van der Walt
Hi Zach

On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus zachary.pin...@yale.edu wrote:
 The below seems to be a bug, but perhaps it's unavoidably part of the 
 indexing mechanism?

 It's easiest to show via example... note that using [0,1] to pull two 
 columns out of the array gives the same shape as using :2 in the simple 
 case, but when there's additional slicing happening, the shapes get 
 transposed or something.

When fancy indexing and slicing are mixed, the resulting shape is
essentially unpredictable.  The correct way to do it is to use only
fancy indexing, i.e. generate the indices of the sliced dimension as
well.
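A sketch of that approach for the last example above (d has shape (10, 9, 8);
the row-index array is the only new ingredient):

    import numpy
    d = numpy.empty((10, 9, 8))
    rows = numpy.arange(d.shape[1])[:, None]   # indices for the sliced axis
    d[0, rows, [0, 1]].shape                   # (9, 2), as one would expect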

Stéfan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fancy-indexing reorders output in corner cases?

2012-05-14 Thread Zachary Pincus
 On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus zachary.pin...@yale.edu 
 wrote:
 The below seems to be a bug, but perhaps it's unavoidably part of the 
 indexing mechanism?
 
 It's easiest to show via example... note that using [0,1] to pull two 
 columns out of the array gives the same shape as using :2 in the simple 
 case, but when there's additional slicing happening, the shapes get 
 transposed or something.
 
 When fancy indexing and slicing is mixed, the resulting shape is
 essentially unpredictable.

Aah, right -- this does come up on the list not infrequently, doesn't it. I'd 
always thought it was more exotic usages that raised these issues. Good to know.

  The correct way to do it is to only use
 fancy indexing, i.e. generate the indices of the sliced dimension as
 well.
 

Excellent -- thanks!


 Stéfan

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Fancy-indexing reorders output in corner cases?

2012-05-14 Thread Travis Oliphant

On May 14, 2012, at 7:07 PM, Stéfan van der Walt wrote:

 Hi Zach
 
 On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus zachary.pin...@yale.edu 
 wrote:
 The below seems to be a bug, but perhaps it's unavoidably part of the 
 indexing mechanism?
 
 It's easiest to show via example... note that using [0,1] to pull two 
 columns out of the array gives the same shape as using :2 in the simple 
 case, but when there's additional slicing happening, the shapes get 
 transposed or something.
 
 When fancy indexing and slicing is mixed, the resulting shape is
 essentially unpredictable.  The correct way to do it is to only use
 fancy indexing, i.e. generate the indices of the sliced dimension as
 well.

This is not quite accurate.  It is not unpredictable.  It is very predictable,
but a bit (too) complicated in the most general case.  The problem occurs when
you intermingle fancy indexing with slice notation (and for this purpose
integer selection is considered fancy indexing).  While in simple cases you can
think of [0,1] as equivalent to :2, it is not, because fancy indexing uses
zip-based ideas instead of cross-product-based ideas.

The problem in general is how to make sense of something like

a[:, :, in1, in2]   

If you keep fancy indexing to one side of the slice notation only, then you get 
what you expect.   The shape of the output will be the first two dimensions of 
a + the broadcasted shape of in1 and in2 (where integers are interpreted as 
fancy-index arrays). 

So, let's say a is (10,9,8,7)  and in1 is (3,4) and in2 is (4,)

The shape of the output will be (10,9,3,4), filled essentially with
out[:,:,i,j] = a[:,:,in1[i,j],in2[j]].

What happens, though when you have

a[:, in1, :, in2]?

in1 and in2 are broadcasted together to create a two-dimensional sub-space 
that must fit somewhere.   Where should it go?   Should it replace in1 or in2?  
  I.e. should the output be 

(10,3,4,8) or (10,8,3,4).  

To resolve this ambiguity, the code sends the (3,4) sub-space to the front of
the dimensions and returns (3,4,10,8).  In retrospect, the code should have
raised an error, as I doubt anyone actually relies on this behavior; then we
could have done the right thing for situations like in1 being an integer,
which actually makes some sense and should not have been confused with the
general case.

In this particular case you might also think that we could say the result 
should be (10,3,8,4) but there is no guarantee that the number of dimensions 
that should be appended by the fancy-indexing objects will be the same as the 
number of dimensions replaced.Again, this is how fancy-indexing combines 
with other fancy-indexing objects. 
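A minimal sketch of both cases (the index arrays here are just zero-filled
placeholders standing in for in1 and in2):

    import numpy as np
    a = np.empty((10, 9, 8, 7))
    in1 = np.zeros((3, 4), dtype=int)
    in2 = np.zeros(4, dtype=int)
    a[:, :, in1, in2].shape   # (10, 9, 3, 4): fancy axes grouped at the end
    a[:, in1, :, in2].shape   # (3, 4, 10, 8): sub-space sent to the front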

So, the behavior is actually quite predictable, it's just that in some common 
cases it doesn't do what you would expect --- especially if you think that 
[0,1] is the same as :2.   When I wrote this code to begin with I should have 
raised an error and then worked in the cases that make sense.This is a good 
example of making the mistake of thinking that it's better to provide something 
very general rather than just raise an error when an obvious and clear solution 
is not available.  

There is the possibility that we could now raise an error in NumPy when this 
situation is encountered because I strongly doubt anyone is actually relying on 
the current behavior.I would like to do this, actually, as soon as 
possible.  Comments? 

-Travis




 
 Stéfan

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ANN: NumPy 1.6.2 release candidate 1

2012-05-14 Thread Paul Anton Letnes
On Mon, May 14, 2012 at 9:47 PM, Ralf Gommers
ralf.gomm...@googlemail.com wrote:


 On Sun, May 13, 2012 at 1:14 PM, Paul Anton Letnes
 paul.anton.let...@gmail.com wrote:

 On Sat, May 12, 2012 at 9:50 PM, Ralf Gommers
 ralf.gomm...@googlemail.com wrote:
 
 
  On Sun, May 6, 2012 at 12:12 AM, Charles R Harris
  charlesr.har...@gmail.com wrote:
 
 
 
  On Sat, May 5, 2012 at 2:56 PM, Paul Anton Letnes
  paul.anton.let...@gmail.com wrote:
 
  Hi,
 
  I'm getting a couple of errors when testing. System:
  Arch Linux (updated today)
  Python 3.2.3
  gcc 4.7.0
  (Anything else?)
 
  I think that this error:
  AssertionError: selectedrealkind(19): expected -1 but got 16
  is due to the fact that newer versions of gfortran actually support
  precision this high (quad precision).
 
 
  Yes, but it should be fixed. I can't duplicate this here with a fresh
  checkout of the branch.
 
 
  This failure makes no sense to me.
 
  Error comes from this code:
 
      'selectedrealkind(%s): expected %r but got %r' %  (i,
  selected_real_kind(i), selectedrealkind(i)))
 
  So selected_real_kind(19) returns -1.
 
  selected_real_kind is the function
  numpy.f2py.crackfortran._selected_real_kind_func, which is defined as:
 
  def _selected_real_kind_func(p, r=0, radix=0):
      #XXX: This should be processor dependent
      # This is only good for 0 <= p <= 20
      if p < 7: return 4
      if p < 16: return 8
      if platform.machine().lower().startswith('power'):
          if p <= 20:
              return 16
      else:
          if p < 19:
              return 10
          elif p <= 20:
              return 16
      return -1
 
  For p=19 this function should always return 16. So the result from
  compiling
  foo.f90 is fine, but the test is broken in a very strange way.
 
  Paul, is the failure reproducible on your machine? If so, can you try to
  debug it?
 
  Ralf

 Hi Ralf.

 The Arch numpy (1.6.1) for python 2.7, installed via pacman (the
 package manager) has this problem.

 After installation of numpy 1.6.2rc1 in a virtualenv, the test passes.
 Maybe the bug was fixed in the RC, and I screwed up which numpy
 version I tested? I'm sorry that I can't find out - I just built a new
 machine, and the old one is lying around the livingroom in pieces. Was
 that particular bit of code changed between 1.6.1 and 1.6.2rc1?


 It was actually, in https://github.com/numpy/numpy/commit/e7f2210e1.

 So you tested 1.6.1 by accident before, and it's working now? Problem solved
 in that case.

 Ralf


Looks like it to me! Sorry for the silly bug report. I'll have to take
more care next time...

Paul
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion