Re: [Numpy-discussion] NumPy re-factoring project
Den 15.06.2010 18:30, skrev Sturla Molden: > A very radical solution would be to get rid of all C, and go for a > "pure Python" solution. NumPy could build up a text string with OpenCL > code on the fly, and use the OpenCL driver as a "JIT compiler" for > fast array expressions. Most GPUs and CPUs will support OpenCL, and > thus there will be no need for a compiled language like C or Fortran > for fast computation in the near future. Paradoxically, with OpenCL, Python is going to be better for fast numerical computation than C or Fortran, because Python is better at generating and manipulating dynamic text. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
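To make the code-generation point concrete, here is a minimal sketch of building OpenCL C source as a Python string for an elementwise expression. It is purely illustrative: `make_kernel` is an invented helper, and actually compiling the text would require an OpenCL binding such as pyopencl, which is deliberately left out.

```python
# Hypothetical sketch: build OpenCL C source as a Python string for an
# elementwise array expression.  'make_kernel' is an invented helper;
# compiling the generated text would need an OpenCL binding (e.g.
# pyopencl), which is not shown here.

KERNEL_TEMPLATE = """__kernel void {name}(__global const double *x,
                     __global double *out)
{{
    int i = get_global_id(0);
    out[i] = {expression};
}}
"""

def make_kernel(name, expression):
    """Return OpenCL C source for an elementwise expression in x[i]."""
    return KERNEL_TEMPLATE.format(name=name, expression=expression)

src = make_kernel("scaled_sin", "2.0 * sin(x[i])")
```

The generated string would then be handed to the OpenCL driver for compilation at runtime, which is the "JIT compiler" role described above.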
Re: [Numpy-discussion] NumPy re-factoring project
A very radical solution would be to get rid of all C, and go for a "pure Python" solution. NumPy could build up a text string with OpenCL code on the fly, and use the OpenCL driver as a "JIT compiler" for fast array expressions. Most GPUs and CPUs will support OpenCL, and thus there will be no need for a compiled language like C or Fortran for fast computation in the near future. Sturla
Re: [Numpy-discussion] NumPy re-factoring project
Sun, 13 Jun 2010 06:54:29 +0200, Sturla Molden wrote: [clip: memory management only in the interface] You forgot views: if memory management is done in the interface layer, it must also make sure that the memory pointed to by a view is never moved around, and not freed before all the views are freed. This probably cannot be done with existing Python objects, since IIRC, they always own their own memory. So one should refactor the memory management out of ndarray on the interface side. The core probably also needs to decide when to create views (indexing), so a callback to the interface is needed. > Second: If we cannot figure out how much to allocate before starting to > use C (very unlikely), The point is that while you in principle can figure it out, you don't *want to* do this work in the interface layer. Computing the output size is just code duplication. > we can always use a callback to Python. Or we can This essentially amounts to having a NpyArray_resize in the interface layer, which the core can call. Here, however, one has the requirement that all arrays needed by the core are always passed in to it, including temporary arrays. Sounds a bit like Fortran 77 -- it's not going to be bad, if the number of temporaries is not very large. If the core needs to allocate arrays by itself, this will inevitably lead to memory management that cannot be avoided by having the allocation done in the interface layer, as the garbage collector must not mistake a temporary array referenced only by the core for unused. An alternative here is to have allocation routines NpyArray_new_tmp / NpyArray_del_tmp whose allocated memory is released automatically, if left dangling, after the call to the core is completed. This would be useful for temporary arrays, although then one would have to be careful never to return memory allocated this way back to the caller. > have two C functions, the first determining the amount of allocation, > the second doing the computation. 
That sounds like a PITA, think about LAPACK. -- Pauli Virtanen
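The two-call convention under discussion has the same shape as a LAPACK workspace query: one call reports the required size, the caller allocates, a second call fills caller-owned memory. A minimal Python sketch, with invented names (`repeat_size`, `repeat_into`), not proposed API:

```python
# Sketch of the two-call convention: the first call reports the required
# output size, the interface layer allocates, the second call only
# computes into caller-owned memory.  Names are invented for
# illustration.

def repeat_size(n_in, reps):
    """First call: how many elements will the result need?"""
    return n_in * reps

def repeat_into(data, reps, out):
    """Second call: compute into a caller-allocated output buffer."""
    n = len(data)
    for r in range(reps):
        out[r * n:(r + 1) * n] = data
    return out

data = [1.0, 2.0]
out = [0.0] * repeat_size(len(data), 3)   # interface layer allocates
repeat_into(data, 3, out)                 # "core" only computes
```

The PITA Pauli refers to is that every core routine now needs this size-query twin, exactly as LAPACK routines need a workspace-query mode.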
Re: [Numpy-discussion] NumPy re-factoring project
Den 13.06.2010 05:47, skrev David Cournapeau: > > This only works in simple cases. What do you do when you don't know > the output size ? First: If you don't know, you don't know. Then you're screwed and C is not going to help. Second: If we cannot figure out how much to allocate before starting to use C (very unlikely), we can always use a callback to Python. Or we can have two C functions, the first determining the amount of allocation, the second doing the computation. > How do you deal with reallocation ? That is platform dependent. A buffer object could e.g. have a realloc method. > How do you deal > with ufunc and broadcasting buffering without allocating in the > library ? We allocate whatever we need on the Python side, possibly with a callback to Python if that's what we need. Sturla
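The "buffer object with a realloc method" can be illustrated with a bytearray, which resizes itself and needs no ownership bookkeeping; `GrowableBuffer` is an invented name, not NumPy API:

```python
# Sketch of "a buffer object could have a realloc method": the Python
# object resizes itself, so the C core never calls malloc/realloc.
# 'GrowableBuffer' is an invented, illustrative name.

class GrowableBuffer:
    def __init__(self, nbytes):
        self._data = bytearray(nbytes)

    def realloc(self, nbytes):
        """Grow (zero-filled) or shrink the buffer in place."""
        old = len(self._data)
        if nbytes > old:
            self._data.extend(b"\x00" * (nbytes - old))
        else:
            del self._data[nbytes:]

    def __len__(self):
        return len(self._data)

buf = GrowableBuffer(16)
buf.realloc(64)    # grow
buf.realloc(8)     # shrink
```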
Re: [Numpy-discussion] NumPy re-factoring project
On Sun, Jun 13, 2010 at 11:39 AM, Sturla Molden wrote: > If NumPy does not allocate memory on its own, there will be no leaks > due to errors in NumPy. > > There is still work to do in the core, i.e. the computational loops in > array operators, broadcasting, ufuncs, copying data between buffers, etc. > > The C functions in the core would then be called with the output array > already allocated. This only works in simple cases. What do you do when you don't know the output size ? How do you deal with reallocation ? How do you deal with ufunc and broadcasting buffering without allocating in the library ? I don't see how that's possible. Before calling things stupid, you'd better have a solution to those issues, David
Re: [Numpy-discussion] NumPy re-factoring project
Den 13.06.2010 02:39, skrev David Cournapeau: > > But the point is to get rid of the python dependency, and if you don't > allow any api call to allocate memory, there is not much left to > implement in the core. > > Memory allocation is platform dependent. A CPython version could use bytearray, other platforms could e.g. use arrays from a gc managed heap (IronPython, Jython). Because memory management is platform dependent, it does not naturally belong in the core. Having ndarrays allocate buffers with malloc() and "own" the buffers they've allocated complicates things terribly. The purpose of "ownership" is to know which ndarray is to call free() on the buffer. Why go through all that pain? It's just a duplication of Python's garbage collection. Re-inventing the wheel is stupid. Let buffers be Python objects that clean themselves up. If NumPy does not allocate memory on its own, there will be no leaks due to errors in NumPy. There is still work to do in the core, i.e. the computational loops in array operators, broadcasting, ufuncs, copying data between buffers, etc. The C functions in the core would then be called with the output array already allocated. Sturla
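A toy version of "buffers are Python objects that clean themselves up": an owning array wraps a bytearray, and a view keeps a reference to its base, so the garbage collector keeps the buffer alive while any view exists, with no ownership flag or manual free(). `MiniArray` is purely illustrative, not proposed API.

```python
# Illustrative sketch, not NumPy API: the buffer is a bytearray that
# owns itself; a view references its base array, so the buffer can
# never be freed while a view is alive.

class MiniArray:
    def __init__(self, nbytes, base=None, offset=0):
        self.base = base        # None for an owning array
        self.offset = offset
        self.data = bytearray(nbytes) if base is None else base.data

    def view(self, offset):
        """A view shares the buffer and keeps its base alive."""
        return MiniArray(0, base=self, offset=offset)

a = MiniArray(32)
v = a.view(offset=8)    # no ownership flag needed: GC handles lifetime
```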
Re: [Numpy-discussion] NumPy re-factoring project
On Sun, Jun 13, 2010 at 2:00 AM, Sturla Molden wrote: > Den 12.06.2010 15:57, skrev David Cournapeau: >> Anything non trivial will require memory allocation and object >> ownership conventions. If the goal is interoperation with other >> languages and vm, you may want to use something else than plain >> malloc, to interact better with the allocation strategies of the host >> platform (reference counting, garbage collection). >> >> > > We can allocate memory on the Python side e.g. using Python's bytearray, > ctypes array, or a Python string. > > We don't need the convention of "ownership" if we use a Python object > to store the data buffer (e.g. bytearray). It owns itself, and commits > suicide when garbage collected. But the point is to get rid of the python dependency, and if you don't allow any api call to allocate memory, there is not much left to implement in the core. David
Re: [Numpy-discussion] NumPy re-factoring project
On Sat, Jun 12, 2010 at 2:56 PM, Dag Sverre Seljebotn < da...@student.matnat.uio.no> wrote: > Charles Harris wrote: > > On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < > > da...@student.matnat.uio.no> wrote: > > > >> Christopher Barker wrote: > >> > David Cournapeau wrote: > >> >>> In the core C numpy library there would be new "numpy_array" struct > >> >>> with attributes > >> >>> > >> >>> numpy_array->buffer > >> > > >> >> Anything non trivial will require memory allocation and object > >> >> ownership conventions. > >> > > >> > I totally agree -- I've been thinking for a while about a core array > >> > data structure that you could use from C/C++, and would interact well > >> > with numpy and Python -- it would be even better if it WAS numpy. > >> > > >> > I was thinking that at the root of it would be a "data_block" object > >> > (the buffer in the above), that would have a reference counting > >> system. > >> > It would be its own system, but hopefully be able to link to Python's > >> > easily when used with Python. > >> > >> I think taking PEP 3118, strip out the Python-specific things, and then > >> add memory management conventions, would be a good starting point. > >> > >> Simply a simple header file/struct definition and specification, which > >> could in time become a de facto way of exporting multidimensional array > >> data between C libraries, between Fortran and C and so on (Kurt Smith's > >> fwrap could easily be adapted to support it). The C-NumPy would then be > >> a > >> library on top of this spec (mainly ufuncs operating on such structs). > >> > >> The memory management conventions needs some thought, as you say, > >> because > >> of slices -- but a central memory allocator is not good enough because > >> one > >> would often be accessing memory that's allocated with other purposes in > >> mind (and should not be deallocated, or deallocated in a special way). > >> So > >> refcounts + deallocator callback seems reasonable. 
> >> > >> (Not that I'm involved in this, just my 2 cents.) > >> > >> > > This is more the way I see things, except I would divide the bottom layer > > into two parts, views and memory. The memory can come from many places -- > > memmaps, user supplied buffers, etc. -- but we should provide a simple > > reference counted allocator for the default. The views correspond more to > > PEP 3118 and simply provide data types, dimensions, and strides, much as > > arrays do now. However, I would confine the data types to those available > > in > > C with a bit extra information as to precision, because. Object arrays > > would be a special case of pointer arrays (void pointer arrays?) and > > structured arrays/Unicode might be a special case of char arrays. The > more > > complicated dtypes would then be built on top of those. Some things just > > won't be portable, pointers in particular, but such is life. > > > > As to languages, I think we should stay with C. C++ has much to offer for > > this sort of thing but would be quite a big jump and maybe not as > > universal > > as C. FORTRAN is harder to come by than C and older versions didn't have > > such things as unsigned integers. I really haven't used FORTRAN since the > > 77 > > version, so haven't much idea what the modern version looks like, but I > do > > suspect we have more C programmers than FORTRAN programmers, and adding a > > language translation on top of a design refactoring is just going to > > complicate things. > > I'm not sure how Fortran got into this, but if it was from what I wrote: I > certainly didn't suggest any part of NumPy is written in Fortran. Sorry > for causing confusion. > > It was a suggestion by someone else way back in the beginning. I think changing languages is a great way to get side tracked and spend the next year just hacking about. > What I meant: If there's an "ndarray.h" which basically just contains the > minimum C version of PEP 3118 for passing around and accessing arrays. 
> Without any runtime dependencies -- of course, writing code using it will > be much easier when using the C-NumPy runtime library, but for simply > exporting data from an existing library one could do without the runtime > dependency. > > Well, that would be just one bit. If we want to offer a *functioning* system we have to implement quite a bit more. > (To be more precise about fwrap: In fwrap there's a need to pass > multidimensional, flexible-size array data back and forth between > (autogenerated) Fortran and (autogenerated) C. It currently defines its > own structs (or extra arguments, but boils down to the same thing...) for > this purpose, but if an "ndarray.h" could create a standard for passing > array data then that would be a natural choice instead.) > > Chuck
Re: [Numpy-discussion] NumPy re-factoring project
Charles Harris wrote: > On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < > da...@student.matnat.uio.no> wrote: > >> Christopher Barker wrote: >> > David Cournapeau wrote: >> >>> In the core C numpy library there would be new "numpy_array" struct >> >>> with attributes >> >>> >> >>> numpy_array->buffer >> > >> >> Anything non trivial will require memory allocation and object >> >> ownership conventions. >> > >> > I totally agree -- I've been thinking for a while about a core array >> > data structure that you could use from C/C++, and would interact well >> > with numpy and Python -- it would be even better if it WAS numpy. >> > >> > I was thinking that at the root of it would be a "data_block" object >> > (the buffer in the above), that would have a reference counting >> system. >> > It would be its own system, but hopefully be able to link to Python's >> > easily when used with Python. >> >> I think taking PEP 3118, strip out the Python-specific things, and then >> add memory management conventions, would be a good starting point. >> >> Simply a simple header file/struct definition and specification, which >> could in time become a de facto way of exporting multidimensional array >> data between C libraries, between Fortran and C and so on (Kurt Smith's >> fwrap could easily be adapted to support it). The C-NumPy would then be >> a >> library on top of this spec (mainly ufuncs operating on such structs). >> >> The memory management conventions needs some thought, as you say, >> because >> of slices -- but a central memory allocator is not good enough because >> one >> would often be accessing memory that's allocated with other purposes in >> mind (and should not be deallocated, or deallocated in a special way). >> So >> refcounts + deallocator callback seems reasonable. >> >> (Not that I'm involved in this, just my 2 cents.) >> >> > This is more the way I see things, except I would divide the bottom layer > into two parts, views and memory. 
The memory can come from many places -- > memmaps, user supplied buffers, etc. -- but we should provide a simple > reference counted allocator for the default. The views correspond more to > PEP 3118 and simply provide data types, dimensions, and strides, much as > arrays do now. However, I would confine the data types to those available > in > C with a bit extra information as to precision, because. Object arrays > would be a special case of pointer arrays (void pointer arrays?) and > structured arrays/Unicode might be a special case of char arrays. The more > complicated dtypes would then be built on top of those. Some things just > won't be portable, pointers in particular, but such is life. > > As to languages, I think we should stay with C. C++ has much to offer for > this sort of thing but would be quite a big jump and maybe not as > universal > as C. FORTRAN is harder to come by than C and older versions didn't have > such things as unsigned integers. I really haven't used FORTRAN since the > 77 > version, so haven't much idea what the modern version looks like, but I do > suspect we have more C programmers than FORTRAN programmers, and adding a > language translation on top of a design refactoring is just going to > complicate things. I'm not sure how Fortran got into this, but if it was from what I wrote: I certainly didn't suggest any part of NumPy is written in Fortran. Sorry for causing confusion. What I meant: If there's an "ndarray.h" which basically just contains the minimum C version of PEP 3118 for passing around and accessing arrays. Without any runtime dependencies -- of course, writing code using it will be much easier when using the C-NumPy runtime library, but for simply exporting data from an existing library one could do without the runtime dependency. (To be more precise about fwrap: In fwrap there's a need to pass multidimensional, flexible-size array data back and forth between (autogenerated) Fortran and (autogenerated) C. 
It currently defines its own structs (or extra arguments, but boils down to the same thing...) for this purpose, but if an "ndarray.h" could create a standard for passing array data then that would be a natural choice instead.) Dag Sverre
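The fields such an "ndarray.h" struct would carry are essentially the PEP 3118 ones, which CPython already exposes via memoryview. A quick stdlib sketch of the metadata the header-only struct would have to describe, for a 3x4 array of doubles:

```python
# The PEP 3118 view of a 3x4 array of doubles, using only the stdlib.
# A C "ndarray.h" struct would carry the same fields (format, ndim,
# shape, strides, itemsize) next to the data pointer.

raw = bytearray(96)                     # 12 doubles * 8 bytes, zeroed
m = memoryview(raw).cast("d", (3, 4))   # view the flat bytes as 3x4

# m.format == 'd', m.ndim == 2, m.shape == (3, 4),
# m.strides == (32, 8), m.itemsize == 8
```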
Re: [Numpy-discussion] NumPy re-factoring project
On Sat, Jun 12, 2010 at 3:57 PM, David Cournapeau wrote: > On Sat, Jun 12, 2010 at 10:27 PM, Sebastian Walter > wrote: >> On Thu, Jun 10, 2010 at 6:48 PM, Sturla Molden wrote: >>> >>> I have a few radical suggestions: >>> >>> 1. Use ctypes as glue to the core DLL, so we can completely forget about >>> refcounts and similar mess. Why put manual reference counting and error >>> handling in the core? It's stupid. >> >> I totally agree, I thought that the refactoring was supposed to provide >> simple data structures and simple algorithms to perform the C equivalents of >> sin, cos, exp, +, -, *, /, dot, inv, ... >> >> Let me explain at an example what I expected: >> >> In the core C numpy library there would be a new "numpy_array" struct >> with attributes >> >> numpy_array->buffer >> numpy_array->dtype >> numpy_array->ndim >> numpy_array->shape >> numpy_array->strides >> numpy_array->owndata >> etc. >> >> that replaces the current PyArrayObject, which contains Python C API stuff: >> >> typedef struct PyArrayObject { >> PyObject_HEAD >> char *data; /* pointer to raw data buffer */ >> int nd; /* number of dimensions, also called ndim */ >> npy_intp *dimensions; /* size in each dimension */ >> npy_intp *strides; /* bytes to jump to get to the >> next element in each dimension */ >> PyObject *base; /* This object should be decref'd >> upon deletion of array */ >> /* For views it points to the original array >> */ >> /* For creation from buffer object it points >> to an object that should be decref'd on >> deletion */ >> /* For UPDATEIFCOPY flag this is an array >> to-be-updated upon deletion of this one */ >> PyArray_Descr *descr; /* Pointer to type structure */ >> int flags; /* Flags describing array -- see below*/ >> PyObject *weakreflist; /* For weakreferences */ >> void *buffer_info; /* Data used by the buffer interface */ >> } PyArrayObject; >> >> >> >> Example: >> -- >> >> If one calls the following Python code >> x = numpy.zeros((N,M,K), dtype=float) >> the memory 
allocation would be done on the Python side. >> >> Calling a ufunc like >> y = numpy.sin(x) >> would first allocate the memory for y on the Python side >> and then call a C function a la >> numpy_unary_ufunc( double (*fcn_ptr)(double), numpy_array *x, numpy_array *y) >> >> If y is already allocated, one would call >> y = numpy.sin(x, out = y) >> >> Similarly z = x*y >> would first allocate the memory for z and then call a C function a la >> numpy_binary_ufunc( double (*fcn_ptr)(double, double), numpy_array *x, >> numpy_array *y, numpy_array *z) >> >> >> similarly other functions like dot: >> z = dot(x,y, out = z) >> >> would simply call a C function a la >> numpy_dot( numpy_array *x, numpy_array *y, numpy_array *z) >> >> >> If one wants to use numpy functions on the C side only, one would use >> the numpy_array struct manually. >> I.e. one has to do the memory management oneself in C. Which is >> perfectly ok since one is just interested in using >> the algorithms. > > Anything non trivial will require memory allocation and object > ownership conventions. If the goal is interoperation with other > languages and vm, you may want to use something else than plain > malloc, to interact better with the allocation strategies of the host > platform (reference counting, garbage collection). I'm just saying that the "host platform" could do the memory management and not libnumpy. I.e. libnumpy could be just a collection of algorithms. Reimplementing half of the Python C API somehow doesn't feel right to me. Those users who like to use C++ could write a class with methods that internally call the libnumpy functions: -- example code - class Array{ numpy_array *_array; public: Array operator+(const Array &rhs) const { Array retval( ... arguments for the right type and dimensions of the output...); numpy_add(this->_array, rhs._array, retval._array); return retval; } }; -- end code - I.e. 
let C++ do all the memory management and type inference, but have the numpy core C API do the number crunching. In other languages (Python, Ruby, R, whatever) one would implement a similar class. I cannot speak for others, but something along these lines is what I'd love to see, since it would make it relatively easy to use numpy functionality even in existing C/C++/R/Ruby codes. Sebastian > > >> The only reason I see for C++ is the possibility to use meta programming, >> which is very ill-designed. I'd rather like to see some simple code >> preprocessing on C code than C++.
Re: [Numpy-discussion] NumPy re-factoring project
If I could, I would like to throw out another possible feature that might need to be taken into consideration for designing the implementation of numpy arrays. One thing I found somewhat lacking -- if that is the right term -- is a way to convolve a numpy array with an arbitrary windowing element. I managed to do some fancy indexing approaches, but they never did seem efficient. So -- if it is at all possible -- I would like to see a design that not only vectorizes well, but could also be used efficiently in arbitrary (or "fancy") indexing schemes as well. Maybe what I am asking is technically infeasible... I don't know. Just something to consider. I'll get back to doing documentation... Ben Root
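A pure-Python sketch of the windowed operation described above, a sliding-window correlation (the window is not flipped). An efficient NumPy version would walk the buffer with strides instead of slicing, which is exactly the access pattern the new design should serve:

```python
# Pure-Python sketch of sweeping an arbitrary window across an array
# ("valid" mode, window not flipped).  Illustrative only; the efficient
# route in NumPy would use strided buffer access rather than slicing.

def convolve_window(data, window):
    """Dot 'window' against each length-len(window) slice of 'data'."""
    w = len(window)
    return [sum(d * k for d, k in zip(data[i:i + w], window))
            for i in range(len(data) - w + 1)]
```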
Re: [Numpy-discussion] NumPy re-factoring project
2010/6/12 Charles R Harris > > This is more the way I see things, except I would divide the bottom layer > into two parts, views and memory. The memory can come from many places -- > memmaps, user supplied buffers, etc. -- but we should provide a simple > reference counted allocator for the default. The views correspond more to > PEP 3118 and simply provide data types, dimensions, and strides, much as > arrays do now. However, I would confine the data types to those available in > C with a bit of extra information as to precision. Object arrays > would be a special case of pointer arrays (void pointer arrays?) and > structured arrays/Unicode might be a special case of char arrays. The more > complicated dtypes would then be built on top of those. Some things just > won't be portable, pointers in particular, but such is life. > Well, if the "data block" of ndarray is going to be refactored that much, it might be an interesting idea for it to support transparent compression. As I see things, it is clear that the gap between CPU power and memory bandwidth is widening, and that trend will continue for a long time. This means that having the possibility to deal with compressed data transparently would result not only in lower memory consumption, but also in improved memory access. Getting improved memory access by using compression may sound a bit crazy, but it is not. For example, the Blosc [1] project (which is undergoing the latest testing steps before becoming stable) is showing signs that, by using multimedia extensions in CPUs and multi-threading techniques, this is becoming possible (as shown in [2]). Of course, this optimization may not make much sense for accelerating computations in NumPy if it cannot get rid of the temporary-variables problem (which I hope can be resolved some day), but still, it could be useful for improving the performance of other features (like copying, broadcasting, or performing selections more efficiently). 
[1] http://blosc.pytables.org/ [2] http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks Just another wish into the bag ;-) -- Francesc Alted
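A rough stdlib illustration of the transparent-compression idea: store the buffer as compressed chunks and decompress only the chunk being accessed. Blosc does this far faster (multithreading, SIMD-friendly codecs); zlib here just shows the access pattern:

```python
# Sketch of transparent compression: keep the buffer as independently
# compressed chunks; element access decompresses only the chunk that
# holds the requested index.  zlib stands in for a fast codec like
# Blosc, purely for illustration.

import zlib

CHUNK = 4096

def compress_chunks(buf):
    """Split 'buf' into fixed-size chunks and compress each one."""
    return [zlib.compress(bytes(buf[i:i + CHUNK]))
            for i in range(0, len(buf), CHUNK)]

def read_byte(chunks, index):
    """Transparent access: decompress only the chunk holding 'index'."""
    return zlib.decompress(chunks[index // CHUNK])[index % CHUNK]

data = bytes(range(256)) * 64     # 16 KiB of repetitive data
chunks = compress_chunks(data)
```

With a fast codec, reading from compressed chunks can beat reading uncompressed memory once the CPU decompresses faster than the memory bus delivers bytes, which is the point being made above.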
Re: [Numpy-discussion] NumPy re-factoring project
On Sat, Jun 12, 2010 at 1:35 PM, Charles R Harris wrote: > > > On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < > da...@student.matnat.uio.no> wrote: > >> Christopher Barker wrote: >> > David Cournapeau wrote: >> >>> In the core C numpy library there would be new "numpy_array" struct >> >>> with attributes >> >>> >> >>> numpy_array->buffer >> > >> >> Anything non trivial will require memory allocation and object >> >> ownership conventions. >> > >> > I totally agree -- I've been thinking for a while about a core array >> > data structure that you could use from C/C++, and would interact well >> > with numpy and Python -- it would be even better if it WAS numpy. >> > >> > I was thinking that at the root of it would be a "data_block" object >> > (the buffer in the above), that would have a reference counting system. >> > It would be its own system, but hopefully be able to link to Python's >> > easily when used with Python. >> >> I think taking PEP 3118, strip out the Python-specific things, and then >> add memory management conventions, would be a good starting point. >> >> Simply a simple header file/struct definition and specification, which >> could in time become a de facto way of exporting multidimensional array >> data between C libraries, between Fortran and C and so on (Kurt Smith's >> fwrap could easily be adapted to support it). The C-NumPy would then be a >> library on top of this spec (mainly ufuncs operating on such structs). >> >> The memory management conventions needs some thought, as you say, because >> of slices -- but a central memory allocator is not good enough because one >> would often be accessing memory that's allocated with other purposes in >> mind (and should not be deallocated, or deallocated in a special way). So >> refcounts + deallocator callback seems reasonable. >> >> (Not that I'm involved in this, just my 2 cents.) 
>> >> > This is more the way I see things, except I would divide the bottom layer > into two parts, views and memory. The memory can come from many places -- > memmaps, user supplied buffers, etc. -- but we should provide a simple > reference counted allocator for the default. The views correspond more to > PEP 3118 and simply provide data types, dimensions, and strides, much as > arrays do now. However, I would confine the data types to those available in > C with a bit of extra information as to precision. Object arrays > would be a special case of pointer arrays (void pointer arrays?) and > structured arrays/Unicode might be a special case of char arrays. The more > complicated dtypes would then be built on top of those. Some things just > won't be portable, pointers in particular, but such is life. > > As to languages, I think we should stay with C. C++ has much to offer for > this sort of thing but would be quite a big jump and maybe not as universal > as C. FORTRAN is harder to come by than C and older versions didn't have > such things as unsigned integers. I really haven't used FORTRAN since the 77 > version, so haven't much idea what the modern version looks like, but I do > suspect we have more C programmers than FORTRAN programmers, and adding a > language translation on top of a design refactoring is just going to > complicate things. > > Oh, and we should have iterators for the views. So the base would be memory + views + iterators. Chuck
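"Memory + views + iterators" in miniature: a flat iterator over a strided view, computing byte offsets from shape/strides the way a C iterator over the proposed view struct would. `iter_view` is an invented helper, not proposed API:

```python
# Toy iterator over a strided view: walk the index space in C order and
# compute each element's byte offset from shape/strides, as a C
# iterator over the proposed view struct would.  'iter_view' is an
# invented name for illustration.

import itertools
import struct

def iter_view(raw, offset, shape, strides, fmt="d"):
    """Yield the elements of a strided view over 'raw' in C order."""
    for idx in itertools.product(*map(range, shape)):
        pos = offset + sum(i * s for i, s in zip(idx, strides))
        yield struct.unpack_from(fmt, raw, pos)[0]

raw = struct.pack("6d", *range(6))            # a 2x3 C-contiguous block
col0 = list(iter_view(raw, 0, (2,), (24,)))   # first column: stride 3*8
```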
Re: [Numpy-discussion] NumPy re-factoring project
On Sat, Jun 12, 2010 at 11:38 AM, Dag Sverre Seljebotn < da...@student.matnat.uio.no> wrote: > Christopher Barker wrote: > > David Cournapeau wrote: > >>> In the core C numpy library there would be new "numpy_array" struct > >>> with attributes > >>> > >>> numpy_array->buffer > > > >> Anything non trivial will require memory allocation and object > >> ownership conventions. > > > > I totally agree -- I've been thinking for a while about a core array > > data structure that you could use from C/C++, and would interact well > > with numpy and Python -- it would be even better if it WAS numpy. > > > > I was thinking that at the root of it would be a "data_block" object > > (the buffer in the above), that would have a reference counting system. > > It would be its own system, but hopefully be able to link to Python's > > easily when used with Python. > > I think taking PEP 3118, strip out the Python-specific things, and then > add memory management conventions, would be a good starting point. > > Simply a simple header file/struct definition and specification, which > could in time become a de facto way of exporting multidimensional array > data between C libraries, between Fortran and C and so on (Kurt Smith's > fwrap could easily be adapted to support it). The C-NumPy would then be a > library on top of this spec (mainly ufuncs operating on such structs). > > The memory management conventions needs some thought, as you say, because > of slices -- but a central memory allocator is not good enough because one > would often be accessing memory that's allocated with other purposes in > mind (and should not be deallocated, or deallocated in a special way). So > refcounts + deallocator callback seems reasonable. > > (Not that I'm involved in this, just my 2 cents.) > > This is more the way I see things, except I would divide the bottom layer into two parts, views and memory. The memory can come from many places -- memmaps, user supplied buffers, etc. 
-- but we should provide a simple reference counted allocator for the default. The views correspond more to PEP 3118 and simply provide data types, dimensions, and strides, much as arrays do now. However, I would confine the data types to those available in C, with a bit of extra information as to precision. Object arrays would be a special case of pointer arrays (void pointer arrays?) and structured arrays/Unicode might be a special case of char arrays. The more complicated dtypes would then be built on top of those. Some things just won't be portable, pointers in particular, but such is life. As to languages, I think we should stay with C. C++ has much to offer for this sort of thing but would be quite a big jump and maybe not as universal as C. FORTRAN is harder to come by than C and older versions didn't have such things as unsigned integers. I really haven't used FORTRAN since the 77 version, so haven't much idea what the modern version looks like, but I do suspect we have more C programmers than FORTRAN programmers, and adding a language translation on top of a design refactoring is just going to complicate things. Chuck
Re: [Numpy-discussion] NumPy re-factoring project
Christopher Barker wrote: > David Cournapeau wrote: >>> In the core C numpy library there would be new "numpy_array" struct >>> with attributes >>> >>> numpy_array->buffer > >> Anything non trivial will require memory allocation and object >> ownership conventions. > > I totally agree -- I've been thinking for a while about a core array > data structure that you could use from C/C++, and would interact well > with numpy and Python -- it would be even better if it WAS numpy. > > I was thinking that at the root of it would be a "data_block" object > (the buffer in the above), that would have a reference counting system. > It would be its own system, but hopefully be able to link to Python's > easily when used with Python. I think taking PEP 3118, stripping out the Python-specific things, and then adding memory management conventions would be a good starting point. Simply a header file/struct definition and specification, which could in time become a de facto way of exporting multidimensional array data between C libraries, between Fortran and C, and so on (Kurt Smith's fwrap could easily be adapted to support it). The C-NumPy would then be a library on top of this spec (mainly ufuncs operating on such structs). The memory management conventions need some thought, as you say, because of slices -- but a central memory allocator is not good enough, because one would often be accessing memory that's allocated with other purposes in mind (and should not be deallocated, or deallocated in a special way). So refcounts + a deallocator callback seems reasonable. (Not that I'm involved in this, just my 2 cents.) Dag Sverre
Re: [Numpy-discussion] NumPy re-factoring project
David Cournapeau wrote: >> In the core C numpy library there would be new "numpy_array" struct >> with attributes >> >> numpy_array->buffer > Anything non trivial will require memory allocation and object > ownership conventions. I totally agree -- I've been thinking for a while about a core array data structure that you could use from C/C++, and would interact well with numpy and Python -- it would be even better if it WAS numpy. I was thinking that at the root of it would be a "data_block" object (the buffer in the above), that would have a reference counting system. It would be its own system, but hopefully be able to link to Python's easily when used with Python. We're talking C here, so trying to have full-fledged memory management would be practically inventing another language, but if just the data blocks of memory could be managed, that would allow things like sub-arrays, and slices, and views to work well. I was partly inspired by how useless C++ std::valarrays appeared to be -- a slice of a valarray is not the same as a valarray, because the original valarray is managing the memory (and I may have that wrong) -- what this said to me is that there needs to be a central system managing the memory blocks, while the user code can manage the higher level objects. If the C core were left with absolutely no memory management, then I think we'd have a next to useless system in raw C. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On 12.06.2010 15:57, David Cournapeau wrote: > Anything non trivial will require memory allocation and object > ownership conventions. If the goal is interoperation with other > languages and vm, you may want to use something else than plain > malloc, to interact better with the allocation strategies of the host > platform (reference counting, garbage collection). > > We can allocate memory on the Python side e.g. using Python's bytearray, ctypes array, or a Python string. We don't need the convention of "ownership" if we use a Python object to store the data buffer (e.g. bytearray). It owns itself, and commits suicide when garbage collected. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On Sat, Jun 12, 2010 at 10:27 PM, Sebastian Walter wrote: > On Thu, Jun 10, 2010 at 6:48 PM, Sturla Molden wrote: >> >> I have a few radical suggestions: >> >> 1. Use ctypes as glue to the core DLL, so we can completely forget about >> refcounts and similar mess. Why put manual reference counting and error >> handling in the core? It's stupid. > > I totally agree, I thought that the refactoring was supposed to provide > simple data structures and simple algorithms to perform the C equivalents of > sin,cos,exp, dot, +,-,*,/, dot, inv, ... > > Let me explain at an example what I expected: > > In the core C numpy library there would be new "numpy_array" struct > with attributes > > numpy_array->buffer > numpy_array->dtype > numpy_array->ndim > numpy_array->shape > numpy_array->strides > numpy_array->owndata > etc. > > that replaces the current PyArrayObject which contains Python C API stuff: > > typedef struct PyArrayObject { > PyObject_HEAD > char *data; /* pointer to raw data buffer */ > int nd; /* number of dimensions, also called ndim */ > npy_intp *dimensions; /* size in each dimension */ > npy_intp *strides; /* bytes to jump to get to the > next element in each dimension */ > PyObject *base; /* This object should be decref'd > upon deletion of array */ > /* For views it points to the original array */ > /* For creation from buffer object it points > to an object that shold be decref'd on > deletion */ > /* For UPDATEIFCOPY flag this is an array > to-be-updated upon deletion of this one */ > PyArray_Descr *descr; /* Pointer to type structure */ > int flags; /* Flags describing array -- see below*/ > PyObject *weakreflist; /* For weakreferences */ > void *buffer_info; /* Data used by the buffer interface */ > } PyArrayObject; > > > > Example: > -- > > If one calls the following Python code > x = numpy.zeros((N,M,K), dtype=float) > the memory allocation would be done on the Python side. 
> > Calling a ufunc like > y = numpy.sin(x) > would first allocate the memory for y on the Python side > and then call a C function a la > numpy_unary_ufunc( double (*fcn_ptr)(double), numpy_array *x, numpy_array *y) > > If y is already allocated, one would call > y = numpy.sin(x, out = y) > > Similarly z = x*y > would first allocate the memory for z and then call a C function a la > numpy_binary_ufunc( double (*fcn_ptr)(double, double), numpy_array *x, > numpy_array *y, numpy_array *z) > > > similarly other functions like dot: > z = dot(x,y, out = z) > > would simply call a C function a la > numpy_dot( numpy_array *x, numpy_array *y, numpy_array *z) > > > If one wants to use numpy functions on the C side only, one would use > the numpy_array struct manually. > I.e. one has to do the memory management oneself in C. Which is > perfectly ok since one is just interested in using > the algorithms. Anything non trivial will require memory allocation and object ownership conventions. If the goal is interoperation with other languages and vm, you may want to use something else than plain malloc, to interact better with the allocation strategies of the host platform (reference counting, garbage collection). > The only reason I see for C++ is the possibility to use meta programming which > is very ill-designed. I'd rather like to see some simple code > preprocessing on C code than > C++ template meta programming. I don't think anyone is seriously considering changing languages. Especially if interoperation is desired, C is miles ahead of C++ anyway. David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On Thu, Jun 10, 2010 at 6:48 PM, Sturla Molden wrote: > > I have a few radical suggestions: > > 1. Use ctypes as glue to the core DLL, so we can completely forget about > refcounts and similar mess. Why put manual reference counting and error > handling in the core? It's stupid. I totally agree, I thought that the refactoring was supposed to provide simple data structures and simple algorithms to perform the C equivalents of sin,cos,exp, dot, +,-,*,/, dot, inv, ... Let me explain at an example what I expected: In the core C numpy library there would be new "numpy_array" struct with attributes numpy_array->buffer numpy_array->dtype numpy_array->ndim numpy_array->shape numpy_array->strides numpy_array->owndata etc. that replaces the current PyArrayObject which contains Python C API stuff: typedef struct PyArrayObject { PyObject_HEAD char *data; /* pointer to raw data buffer */ int nd; /* number of dimensions, also called ndim */ npy_intp *dimensions; /* size in each dimension */ npy_intp *strides; /* bytes to jump to get to the next element in each dimension */ PyObject *base; /* This object should be decref'd upon deletion of array */ /* For views it points to the original array */ /* For creation from buffer object it points to an object that shold be decref'd on deletion */ /* For UPDATEIFCOPY flag this is an array to-be-updated upon deletion of this one */ PyArray_Descr *descr; /* Pointer to type structure */ int flags; /* Flags describing array -- see below*/ PyObject *weakreflist; /* For weakreferences */ void *buffer_info; /* Data used by the buffer interface */ } PyArrayObject; Example: -- If one calls the following Python code x = numpy.zeros((N,M,K), dtype=float) the memory allocation would be done on the Python side. 
Calling a ufunc like y = numpy.sin(x) would first allocate the memory for y on the Python side and then call a C function a la numpy_unary_ufunc( double (*fcn_ptr)(double), numpy_array *x, numpy_array *y) If y is already allocated, one would call y = numpy.sin(x, out = y) Similarly z = x*y would first allocate the memory for z and then call a C function a la numpy_binary_ufunc( double (*fcn_ptr)(double, double), numpy_array *x, numpy_array *y, numpy_array *z) similarly other functions like dot: z = dot(x,y, out = z) would simply call a C function a la numpy_dot( numpy_array *x, numpy_array *y, numpy_array *z) If one wants to use numpy functions on the C side only, one would use the numpy_array struct manually. I.e. one has to do the memory management oneself in C. Which is perfectly ok since one is just interested in using the algorithms. > > 2. The core should be a plain DLL, loadable with ctypes. (I know David > Cournapeau and Robert Kern are going to hate this.) But if Python can have a > custom loader for .pyd files, so can NumPy for its core DLL. For ctypes we > just need to specify a fully qualified path to the DLL, which can be read > from a config file or whatever. That would probably be the easiest way. > > 3. ctypes will take care of all the mess regarding the GIL. Again there is > no need to re-invent the wheel. ctypes.CDLL releases the GIL when calling > into C, and re-acquires before making callbacks to Python. ctypes.PyDLL > keeps the GIL for legacy library code that is not threadsafe. I don't have much experience with parallelizing code. If one implements the algorithms side effect free it should be quite easy to parallelize them on the C level, no? > > ctypes will also make porting to other Python implementations (or > even other languages: Ruby, JavaScript) easier. Not to mention that it will > make NumPy impervious to changes in the Python C API. > > 4. Write the core in C++ or Fortran (95/2003), not C. ANSI C (aka C89). 
Use > std::vector<> instead of malloc/free for C++, and possibly templates for > generics on various dtypes. The only reason I see for C++ is the possibility to use meta programming which is very ill-designed. I'd rather like to see some simple code preprocessing on C code than C++ template meta programming. And it should be possible to avoid mallocs in the C code, not? > > 5. Allow OpenMP pragmas in the core. If arrays are above a certain size, it > should switch to multi-threading. > > Sturla Just my 2 cents, Sebastian > > > > > > > Den 10.06.2010 15:26, skrev Charles R Harris: > > A few thoughts came to mind while reading the initial writeup. > > 1) How is the GIL handled in the callbacks. > 2) What about error handling? That is tricky to get right, especially in C > and wit
Re: [Numpy-discussion] NumPy re-factoring project
Fri, 11 Jun 2010 15:31:45 +0200, Sturla Molden wrote: [clip] >> The innermost dimension is handled via the ufunc loop, which is a >> simple for loop with constant-size step and is given a number of >> iterations. The array iterator objects are used only for stepping >> through the outer dimensions. That is, it essentially steps through >> your dtype** array, without explicitly constructing it. > > Yes, exactly my point. And because the iterator does not explicitely > construct the array, it sucks for parallel programming (e.g. with > OpenMP): > > - The iterator becomes a bottleneck to which access must be serialized > with a mutex. > - We cannot do proper work scheduling (load balancing) I don't necessarily agree: you can do for parallelized outer loop { critical section { p = get iterator pointer ++iterator } inner loop in region `p` } This does allow load balancing etc., as a free processor can immediately grab the next available slice. Also, it would be easier to implement with OpenMP pragmas in the current code base. Of course, the assumption here is that the outer iterator overhead is small compared to the duration of the inner loop. This must then be compared to the memory access overhead involved in the dtype** array. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
Den 11.06.2010 17:17, skrev Anne Archibald: > > On the other hand, since memory reads are very slow, optimizations > that do more calculation per load/store could make a very big > difference, eliminating temporaries as a side effect. > Yes, that's the main issue, not the extra memory they use. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
Sturla Molden wrote: > Den 11.06.2010 09:14, skrev Sebastien Binet: >> it of course depends on the granularity at which you wrap and use >> numpy-core but tight loops calling ctypes ain't gonna be pretty >> performance-wise. >> > > Tight loops in Python are never pretty. > > The purpose of vectorization with NumPy is to avoid tight Python loops. > >> I really think using cython would be a better option. >> one'd get the python2<-> python3 transition for "free". >> >> > > Perhaps. Cython has nice syntax for PEP 3118 buffers. But there are > downsides to this: IMO I think the syntax is not useful in this setting. > - Cython is not C. We want the core to be a C library independent of Python. Cython could fill the role you propose that ctypes takes. The advantage is a) (IMO) nicer syntax...well, at least can be less verbose than ctypes b) Avoids having to structure the whole API around what can be done in Python+ctypes. A pure ctypes+pure C solution would have a tendency to push stuff that logically belongs Python-module-side into pure C side, which could be a very bad thing. c) Possible to iteratively move more and more from C-extension/Cython over to C side, rather than a full rewrite in ctypes + C. There's disadvantages re compilation, but see below. I believe this is rather long term though. Stuff like this has to be done iteratively; one must not rewrite from scratch. > - Cython depends on CPython. > - Compiling Cython generated C takes time. > - Linkage can be a PITA (I use 64-bit Python now, and import libraries > for gcc are missing.) Well, if you can't link CPython extension modules you're going to have major problem with about every single scientific Python library out there. Unless you actually tackle the root of the problem, rewriting NumPy with ctypes won't help -- you also need to rewrite matplotlib, mayavi, PyTables, SciPy, mpi4py etc. etc. etc. etc. in ctypes + pure C without the CPython API. 
In other words, I don't think this is a very reasonable argument. (Well, perhaps NumPy is in some sense more "basic" than these others.) Dag Sverre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On 11 June 2010 11:12, Benjamin Root wrote: > > > On Fri, Jun 11, 2010 at 8:31 AM, Sturla Molden wrote: >> >> >> It would also make sence to evaluate expressions like "y = b*x + a" >> without a temporary array for b*x. I know roughly how to do it, but >> don't have time to look at it before next year. (Yes I know about >> numexpr, I am talking about plain Python code.) >> >> >> Sturla >> >> > > If I may chime in here with my own experience with NumPy code... > > I typically use older, "weaker" computers for my work. I am not doing > real-time modeling or some other really advanced, difficult computations. > For me, NumPy works "fast enough", even on an EeePC. My main issue is the > one given above by Sturla. I find that NumPy's memory usage can go > out-of-control very easily in long mathematical expressions. With a mix of > constants and large matricies, each step in the order of operations seems to > take up more memory. Often, I would run into a major slow-down from > trashing the swap. This is fairly trivial to get around by operating over > slices of the matrices at a time, but -- to me -- all of this talk about > optimizing the speed of the operations without addressing the temporaries > issue is like trying to tune-up the engine of a car without bothering to > take the lead weights out of the trunk. I should say, though, that I've gone through the process of removing all temporary allocation using ufunc output arguments (np.add(a,b,c)) only to discover that it didn't actually save any memory and it was slower. The nice thing about temporaries is that they're, well, temporary; they go away. On the other hand, since memory reads are very slow, optimizations that do more calculation per load/store could make a very big difference, eliminating temporaries as a side effect. Anne > Just my 2 cents. 
> > Ben Root > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On Fri, Jun 11, 2010 at 8:31 AM, Sturla Molden wrote: > > > It would also make sense to evaluate expressions like "y = b*x + a" > without a temporary array for b*x. I know roughly how to do it, but > don't have time to look at it before next year. (Yes I know about > numexpr, I am talking about plain Python code.) > > > Sturla > > > If I may chime in here with my own experience with NumPy code... I typically use older, "weaker" computers for my work. I am not doing real-time modeling or some other really advanced, difficult computations. For me, NumPy works "fast enough", even on an EeePC. My main issue is the one given above by Sturla. I find that NumPy's memory usage can go out-of-control very easily in long mathematical expressions. With a mix of constants and large matrices, each step in the order of operations seems to take up more memory. Often, I would run into a major slow-down from thrashing the swap. This is fairly trivial to get around by operating over slices of the matrices at a time, but -- to me -- all of this talk about optimizing the speed of the operations without addressing the temporaries issue is like trying to tune up the engine of a car without bothering to take the lead weights out of the trunk. Just my 2 cents. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
Den 11.06.2010 09:14, skrev Sebastien Binet: > it of course depends on the granularity at which you wrap and use > numpy-core but tight loops calling ctypes ain't gonna be pretty > performance-wise. > Tight loops in Python are never pretty. The purpose of vectorization with NumPy is to avoid tight Python loops. > I really think using cython would be a better option. > one'd get the python2<-> python3 transition for "free". > > Perhaps. Cython has nice syntax for PEP 3118 buffers. But there are downsides to this: - Cython is not C. We want the core to be a C library independent of Python. - Cython depends on CPython. - Compiling Cython generated C takes time. - Linkage can be a PITA (I use 64-bit Python now, and import libraries for gcc are missing.) Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On 11.06.2010 10:17, Pauli Virtanen wrote: >> 1. Collect an array of pointers to each subarray (e.g. using >> std::vector or dtype**) >> 2. Dispatch on the pointer array... > This is actually what the current ufunc code does. > > The innermost dimension is handled via the ufunc loop, which is a simple > for loop with constant-size step and is given a number of iterations. The > array iterator objects are used only for stepping through the outer > dimensions. That is, it essentially steps through your dtype** array, > without explicitly constructing it. > > Yes, exactly my point. And because the iterator does not explicitly construct the array, it sucks for parallel programming (e.g. with OpenMP): - The iterator becomes a bottleneck to which access must be serialized with a mutex. - We cannot do proper work scheduling (load balancing) > We could in principle use OpenMP to parallelize the outer loop, as long > as the inner ufunc part is implemented in C, and does not go back into > Python. Yes. And for that we need to explicitly construct a pointer array. > But if the problem is memory-bound, it is not clear that > parallelization helps. > That is often the case, yes. Memory access patterns are one of the most important issues speed-wise, particularly when working with strided array slices. It would also make sense to evaluate expressions like "y = b*x + a" without a temporary array for b*x. I know roughly how to do it, but don't have time to look at it before next year. (Yes I know about numexpr, I am talking about plain Python code.) Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On Friday 11 June 2010 10:38:28 Pauli Virtanen wrote: > Fri, 11 Jun 2010 10:29:28 +0200, Hans Meine wrote: > > Ideally, algorithms would get wrapped in between two additional > > pre-/postprocessing steps: > > > > 1) Preprocessing: After broadcasting, transpose the input arrays such > > that they become C order. More specifically, sort the strides of one > > (e.g. the first/the largest) input array decreasingly, and apply the > > same transposition to the other arrays (in + out if given). > > > > 2) Perform the actual algorithm, producing a C-order output array. > > > > 3) Apply the reverse transposition to the output array. > > [clip] > > The post/preprocessing steps are fairly easy to implement in the ufunc > machinery. > > The problem here was that this would result to a subtle semantic change, > as output arrays from ufuncs would no longer necessarily be in C order. I don't see it as a problem, really. People who rely on C order will also give C order input arrays to the algorithms. > We need to explicitly to *decide* to make this change, before proceeding. > I'd say that we should simply do this, as nobody should depend on this > assumption. +1 I seem to recall that even Travis, or at least David C. gave a similar opinion in the mentioned thread in the past. > I think I there was some code ready to implement this shuffling. I'll try > to dig it out and implement the shuffling. That would be great! Ullrich Köthe has implemented this for our VIGRA/numpy bindings: http://tinyurl.com/fast-ufunc At the bottom you can see that he basically wraps all numpy.ufuncs he can find in the numpy top-level namespace automatically. Looking forward to this, Hans ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
Fri, 11 Jun 2010 10:29:28 +0200, Hans Meine wrote: [clip] > Ideally, algorithms would get wrapped in between two additional > pre-/postprocessing steps: > > 1) Preprocessing: After broadcasting, transpose the input arrays such > that they become C order. More specifically, sort the strides of one > (e.g. the first/the largest) input array decreasingly, and apply the > same transposition to the other arrays (in + out if given). > > 2) Perform the actual algorithm, producing a C-order output array. > > 3) Apply the reverse transposition to the output array. [clip] The post/preprocessing steps are fairly easy to implement in the ufunc machinery. The problem here was that this would result in a subtle semantic change, as output arrays from ufuncs would no longer necessarily be in C order. We need to explicitly *decide* to make this change, before proceeding. I'd say that we should simply do this, as nobody should depend on this assumption. I think there was some code ready to implement this shuffling. I'll try to dig it out and implement the shuffling. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On Thursday 10 June 2010 22:28:28 Pauli Virtanen wrote: > Some places where OpenMP could probably help are in the inner ufunc > loops. However, improving the memory efficiency of the data access > pattern is another low-hanging fruit for multidimensional arrays. I was about to mention this when the thread started... at least I want to raise awareness of the current problems again, so they can be taken into account while refactoring. Theoretically, this should be really low-hanging fruit, but I have no idea how much work it is. The problem is: NumPy uses C-order indexing by default. While it supports F-order arrays, too, using F-order arrays leads to unnecessary slow-downs, since the dimensions are walked through from left to right, and not in memory order. Ideally, algorithms would get wrapped in between two additional pre-/postprocessing steps: 1) Preprocessing: After broadcasting, transpose the input arrays such that they become C order. More specifically, sort the strides of one (e.g. the first/the largest) input array decreasingly, and apply the same transposition to the other arrays (in + out if given). 2) Perform the actual algorithm, producing a C-order output array. 3) Apply the reverse transposition to the output array. This would - fix the bug that operations (e.g. addition) on F-order arrays are slower than on C-order arrays, - fix the bug that the order of the output arrays differs from that of the input arrays (this is very annoying when dealing with 3rd party libraries that use and expect F-order everywhere), and - be easy for elementwise operations, but - need further thinking / special handling of operations where the number & mapping of dimensions of the input & output arrays is more complex. 
Last, but not least, here is a pointer to our previous discussion of the topic in the thread "Unnecessarily bad performance of elementwise operators with Fortran-arrays": http://www.mail-archive.com/numpy-discussion@scipy.org/msg05100.html Have a nice day, Hans ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
Thu, 10 Jun 2010 23:56:56 +0200, Sturla Molden wrote: [clip] > Also about array iterators in NumPy's C base (i.e. for doing something > along an axis): we don't need those. There is a different way of coding > which leads to faster code. > > 1. Collect an array of pointers to each subarray (e.g. using > std::vector or dtype**) > 2. Dispatch on the pointer array... This is actually what the current ufunc code does. The innermost dimension is handled via the ufunc loop, which is a simple for loop with constant-size step and is given a number of iterations. The array iterator objects are used only for stepping through the outer dimensions. That is, it essentially steps through your dtype** array, without explicitly constructing it. An abstraction for iteration through an array is useful, also for building the dtype** array, so we should probably retain them. For multidimensional arrays, you run here into the optimization problem of choosing the order of axes so that the memory access pattern makes a maximally efficient use of the cache. Currently, this optimization heuristic is really simple, and makes the wrong choice even in the simple 2D case. We could in principle use OpenMP to parallelize the outer loop, as long as the inner ufunc part is implemented in C, and does not go back into Python. But if the problem is memory-bound, it is not clear that parallelization helps. Another task for optimization would perhaps be to implement specialized loops for each operation, currently we do one function indirection per iteration which probably costs something. But again, it may be that the operations are already bounded by memory speed, and this would not improve performance. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On Friday 11 June 2010 02:27:18, Sturla Molden wrote: > >> Another thing I did when reimplementing lfilter was "copy-in copy-out" > >> for strided arrays. > > > > What is copy-in copy out ? I am not familiar with this term ? > > Strided memory access is slow. So it often helps to make a temporary > copy that is contiguous. In my experience, this technique will only work well with strided arrays if you are going to re-use the data of these temporaries in cache, or your data is unaligned. But if you are going to use the data only once (and this is very common in NumPy element-wise operations), this is rather counter-productive for strided arrays. For example, in numexpr, we made a lot of different tests comparing "copy-in copy-out" and direct access techniques for strided arrays. The result was that operations with direct access showed significantly better performance with strided arrays. On the contrary, for unaligned arrays the copy-in copy-out technique gave better results. Look at these times, where the arrays were one-dimensional with a length of 1 million elements each, but the results can be extrapolated to larger, multidimensional arrays (the original benchmark file is bench/vml_timing.py): -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Numexpr version: 1.3.2.dev169 NumPy version: 1.4.1rc2 Python version: 2.6.1 (r261:67515, Feb 3 2009, 17:34:37) [GCC 4.3.2 [gcc-4_3-branch revision 141291]] Platform: linux2-x86_64 AMD/Intel CPU? True VML available? 
True VML/MKL version: Intel(R) Math Kernel Library Version 10.1.0 Product Build 081809.14 for Intel(R) 64 architecture applications -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= To start with, times between numpy and numexpr are very similar for very simple expressions (except for unaligned arrays, where "copy-in copy-out" works pretty well for numexpr): *** Expression: i2 > 0 numpy: 0.0016 numpy strided: 0.0037 numpy unaligned: 0.0086 numexpr: 0.0016 Speed-up of numexpr over numpy: 0.9512 numexpr strided: 0.0039 Speed-up of numexpr over numpy: 0.964 numexpr unaligned: 0.0042 Speed-up of numexpr over numpy: 2.0598 When doing some basic operations (mind that there are no temporaries here, so numpy should be not in great disadvantage), direct access to strided data goes between 2x and 3x faster than numpy: *** Expression: f3+f4 numpy: 0.0060 numpy strided: 0.0176 numpy unaligned: 0.0166 numexpr: 0.0052 Speed-up of numexpr over numpy: 1.1609 numexpr strided: 0.0086 Speed-up of numexpr over numpy: 2.0584 numexpr unaligned: 0.0099 Speed-up of numexpr over numpy: 1.6785 *** Expression: f3+i2 numpy: 0.0060 numpy strided: 0.0176 numpy unaligned: 0.0176 numexpr: 0.0031 Speed-up of numexpr over numpy: 1.9137 numexpr strided: 0.0061 Speed-up of numexpr over numpy: 2.8789 numexpr unaligned: 0.0078 Speed-up of numexpr over numpy: 2.2411 Notice how, until now, absolute times in numexpr and strided arrays (using the direct technique) are faster than the unaligned case (copy-in copy-out). Also, when evaluating transcendental expressions (numexpr uses Intel's Vector Math Library, VML, here), direct access is again faster than NumPy: *** Expression: exp(f3) numpy: 0.0150 numpy strided: 0.0155 numpy unaligned: 0.0222 numexpr: 0.0030 Speed-up of numexpr over numpy: 5.0268 numexpr strided: 0.0081 Speed-up of numexpr over numpy: 1.9086 numexpr
Re: [Numpy-discussion] NumPy re-factoring project
On Fri, 11 Jun 2010 00:25:17 +0200, Sturla Molden wrote: > Den 10.06.2010 22:07, skrev Travis Oliphant: > > > >> 2. The core should be a plain DLL, loadable with ctypes. (I know David > >> Cournapeau and Robert Kern is going to hate this.) But if Python can have > >> a custom loader for .pyd files, so can NumPy for it's core DLL. For ctypes > >> we just need to specify a fully qualified path to the DLL, which can be > >> read from a config file or whatever. > >> > > This approach does not build a new Python type in compiled code. There > > are speed disadvantages to this --- especially for the numpy scalars. > > There are at least four speed penalties in what I suggested: > > - the ctypes overhead is bigger than using Python's C API manually (or > Cython). > - there is a speed penalty for scalars and small arrays. (Previously > seen in numarray vs. numeric.) > - bytearray is a bit slower to create than to malloc a buffer. > - arrays with dtype=object > > The advantages are: > > - the code is easier to read and maintain (clean separation of Python and C) > - more portable code (no Python C API dependence) > - no obscure memory leaks to track down (bytearray instead of > malloc/free and no manual ref counting). > > > By the way: Is Cython mature enough to toss all C out the door? Since > Cython has syntax for PEP 3118 buffer objects, we should theoretically > be able to implement NumPy in Cython. Then we don't have the speed > penalty and no difficult C to maintain. But if the idea is to make a > Python independent C library from the core, it is probably a bit counter > productive... as a long time lurker, I really like the idea of having a .so as core numpy but I'd really hate to have to go thru ctypes as a necessary (or rather, default) layer to use numpy-core from python. it of course depends on the granularity at which you wrap and use numpy-core but tight loops calling ctypes ain't gonna be pretty performance-wise. 
I really think using cython would be a better option. one'd get the python2 <-> python3 transition for "free". but, again, I am just a lurker here on numpy-list :) cheers, sebastien. -- # # Dr. Sebastien Binet # Laboratoire de l'Accelerateur Lineaire # Universite Paris-Sud XI # Batiment 200 # 91898 Orsay # ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
On Thu, Jun 10, 2010 at 8:40 PM, Sturla Molden wrote: > Den 11.06.2010 03:02, skrev Charles R Harris: > > > > But for an initial refactoring it probably falls in the category of > > premature optimization. Another thing to avoid on the first go around > > is micro-optimization, as it tends to complicate the code and often > > doesn't do much for performance. > > > > > First, to refactor NumPy we must get rid of array iterators as Python > objects. Retaining them as Python objects defies the whole purpose. > Second, creating an array of pointers has the advantage of being more > friendly to OpenMP and multi-core CPUs. We don't need to serialize > access to an iterator, and we allow OpenMP to do load balancing. For > things like FFT on vectors along an axis in a multidimensional array > (e.g. in multi-dimensional FFTs), this is not a micro-optimization. Not > to mention that it is easier to iterate over an array than use a > complicated iterator object in C. > > How does that work when doing FFTs along all axes? What about slicing and views, axis rolling and transposes? Furthermore, if you are just going to add two discontiguous arrays, or take the sine, making copies is wasted effort. I think those things definitely fall in the category of optimizations and will just be a distraction from the real problems of the refactoring. When I say micro-optimization, I am not talking about such things, however. Micro-optimization is such things as loop unrolling, which while it can greatly speed up some things adds a lot of obfuscating code when the first priority is to get things right. Another example would be trying to get a tad more speed using pointers instead of indexes, an optimization which also tends to be architecture-dependent. Or using a ton of macros instead of subroutines or inline functions. So on and so forth. I think in these things the first priority is to have something readable and debuggable with clear lines of control flow. 
This is easier said than done, of course. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] NumPy re-factoring project
Den 11.06.2010 03:02, skrev Charles R Harris: > > But for an initial refactoring it probably falls in the category of > premature optimization. Another thing to avoid on the first go around > is micro-optimization, as it tends to complicate the code and often > doesn't do much for performance. First, to refactor NumPy we must get rid of array iterators as Python objects. Retaining them as Python objects defies the whole purpose. Second, creating an array of pointers has the advantage of being more friendly to OpenMP and multi-core CPUs. We don't need to serialize access to an iterator, and we allow OpenMP to do load balancing. For things like FFT on vectors along an axis in a multidimensional array (e.g. in multi-dimensional FFTs), this is not a micro-optimization. Not to mention that it is easier to iterate over an array than use a complicated iterator object in C. Sturla
Re: [Numpy-discussion] NumPy re-factoring project
Den 11.06.2010 04:19, skrev David: > > Ah, ok, I did not know this was called copy-in/copy-out, thanks for the > explanation. I agree this would be a good direction to pursue, but maybe > out of scope for the first refactoring, > > Copy-in copy-out is actually an implementation detail in Fortran compilers. It has more to do with how subroutines are called. But the purpose is to turn a strided array into a contiguous one. Sturla
Re: [Numpy-discussion] NumPy re-factoring project
On 06/11/2010 09:27 AM, Sturla Molden wrote: > > Strided memory access is slow. So it often helps to make a temporary > copy that is contiguous. Ah, ok, I did not know this was called copy-in/copy-out, thanks for the explanation. I agree this would be a good direction to pursue, but maybe out of scope for the first refactoring, cheers, David
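[Editorial aside: the copy-in/copy-out idea discussed above can be illustrated at the Python level with NumPy itself. This is a minimal sketch of my own, not code from the thread: gather the strided view into a contiguous temporary, work on that buffer, then scatter the result back.]

```python
import numpy as np

def scaled_inplace(strided_view, factor):
    """Scale a (possibly strided) 1-D view in place using copy-in copy-out.

    The strided view is gathered into a contiguous temporary (copy-in),
    the work happens on the contiguous buffer, and the result is written
    back to the original strided memory (copy-out).
    """
    tmp = np.ascontiguousarray(strided_view)  # copy-in: contiguous temporary
    tmp *= factor                             # work on contiguous data
    strided_view[...] = tmp                   # copy-out: scatter back

a = np.arange(10, dtype=np.float64)
view = a[::3]                 # strided view: elements 0, 3, 6, 9
scaled_inplace(view, 2.0)
print(a)                      # only every third element is doubled
```

In C the same pattern pays off because the hot loop then runs over unit-stride memory; here it merely demonstrates the data movement.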
Re: [Numpy-discussion] NumPy re-factoring project
On 06/11/2010 10:02 AM, Charles R Harris wrote: > > > But for an initial refactoring it probably falls in the category of > premature optimization. Another thing to avoid on the first go around is > micro-optimization, as it tends to complicate the code and often doesn't > do much for performance. I agree it may be difficult to add this in the initial refactoring, but I don't think it falls into the micro-optimization category. The whole idea of converting strided buffers into temporary contiguous areas is already in ufunc, though. Maybe a core API to do just that would be useful. I am not familiar with the ufunc API at all, so I am not sure how it would go. cheers, David
Re: [Numpy-discussion] NumPy re-factoring project
On Thu, Jun 10, 2010 at 6:27 PM, Sturla Molden wrote: > Den 11.06.2010 00:57, skrev David Cournapeau: > > Do you have the code for this? That's something I wanted to do, but never > > took the time to do. A faster generic iterator would be nice, but very > > hard to do in general. > > [clip: axis-pointer code, quoted in full elsewhere in the thread] > > Another thing I did when reimplementing lfilter was "copy-in copy-out" > for strided arrays. > > > What is copy-in copy-out? I am not familiar with this term? > > Strided memory access is slow. So it often helps to make a temporary copy > that is contiguous. But for an initial refactoring it probably falls in the category of premature optimization. Another thing to avoid on the first go around is micro-optimization, as it tends to complicate the code and often doesn't do much for performance. Chuck
Re: [Numpy-discussion] NumPy re-factoring project
Den 11.06.2010 00:57, skrev David Cournapeau:
> Do you have the code for this? That's something I wanted to do, but
> never took the time to do. A faster generic iterator would be nice, but
> very hard to do in general.

/* this computes the start address for every vector along a dimension
   (axis) of an ndarray */

typedef PyArrayObject ndarray;

inline char *datapointer(ndarray *a, npy_intp *indices)
{
    char *data = a->data;
    int i;
    npy_intp j;
    for (i=0; i < a->nd; i++) {
        j = indices[i];
        data += j * a->strides[i];
    }
    return data;
}

static double ** _get_axis_pointer(npy_intp *indices, int axis,
    ndarray *a, double **axis_ptr_array, int dim)
{
    /* recursive axis pointer search for 4 dimensions or more */
    npy_intp i;
    double *axis_ptr;
    if (dim == a->nd) {
        /* done recursing dimensions,
           compute address of this axis */
        axis_ptr = (double *) datapointer(a, indices);
        *axis_ptr_array = axis_ptr;
        return (axis_ptr_array + 1);
    } else if (dim == axis) {
        /* use first element only */
        indices[dim] = 0;
        axis_ptr_array = _get_axis_pointer(indices, axis, a,
                                           axis_ptr_array, dim+1);
        return axis_ptr_array;
    } else {
        /* iterate range of indices */
        npy_intp length = a->dimensions[dim];
        for (i=0; i < length; i++) {
            indices[dim] = i;
            axis_ptr_array = _get_axis_pointer(indices, axis, a,
                                               axis_ptr_array, dim+1);
        }
        return axis_ptr_array;
    }
}

static double **get_axis_pointers(ndarray *a, int axis, npy_intp *count)
{
    npy_intp indices[NPY_MAXDIMS];
    double **out, **tmp;
    npy_intp i, j;
    j = 1;
    for (i=0; i < a->nd; i++) {
        if (i != axis)
            j *= a->dimensions[i];
    }
    *count = j;

    out = (double **) malloc(*count * sizeof(double*));
    if (out == NULL) {
        *count = 0;
        return NULL;
    }
    tmp = out;
    switch (a->nd) {
    /* for one dimension we just return the data pointer */
    case 1:
        *tmp = (double *)a->data;
        break;
    /* specialized for two dimensions */
    case 2:
        switch (axis) {
        case 0:
            for (i=0; i < a->dimensions[1]; i++)
                *tmp++ = (double *)(a->data + i*a->strides[1]);
            break;
        case 1:
            for (i=0; i < a->dimensions[0]; i++)
                *tmp++ = (double *)(a->data + i*a->strides[0]);
            break;
        }
        break;
    /* specialized for three dimensions */
    case 3:
        switch (axis) {
        case 0:
            for (i=0; i < a->dimensions[1]; i++)
                for (j=0; j < a->dimensions[2]; j++)
                    *tmp++ = (double *)(a->data + i*a->strides[1] +
                                        j*a->strides[2]);
            break;
        case 1:
            for (i=0; i < a->dimensions[0]; i++)
                for (j=0; j < a->dimensions[2]; j++)
                    *tmp++ = (double *)(a->data + i*a->strides[0] +
                                        j*a->strides[2]);
            break;
        case 2:
            for (i=0; i < a->dimensions[0]; i++)
                for (j=0; j < a->dimensions[1]; j++)
                    *tmp++ = (double *)(a->data + i*a->strides[0] +
                                        j*a->strides[1]);
        }
        break;
    /* four or more dimensions: use recursion */
    default:
        for (i=0; i < a->nd; i++) indices[i] = 0;
        _get_axis_pointer(indices, axis, a, out, 0);
    }
    return out;
}

Another thing I did when reimplementing lfilter was "copy-in copy-out" for strided arrays.

> What is copy-in copy-out? I am not familiar with this term?

Strided memory access is slow. So it often helps to make a temporary copy that is contiguous.

static void copy_in(double *restrict y, double *restrict x,
                    npy_intp n, npy_intp s)
{
    npy_intp i;
    char *tmp;
    if (s == sizeof(double))
        memcpy(y, x, n*sizeof(double));
    else {
        tmp = (char *)x;
        for (i=0; i < n; i++) {
            y[i] = *(double *)tmp;
            tmp += s;
        }
    }
}

static void copy_out(double *restrict y, double *restrict x,
                     npy_intp n, npy_intp s)
{
    npy_intp i;
    char *tmp;
    if (s == sizeof(double))
        memcpy(y, x, n*sizeof(double));
    else {
        tmp = (char *)y;
        for (i=0; i < n; i++) {
            *(double *)tmp = x[i];
            tmp += s;
        }
    }
}
Re: [Numpy-discussion] NumPy re-factoring project
Den 10.06.2010 22:07, skrev Travis Oliphant: > >> 2. The core should be a plain DLL, loadable with ctypes. (I know David >> Cournapeau and Robert Kern are going to hate this.) But if Python can have a >> custom loader for .pyd files, so can NumPy for its core DLL. For ctypes we >> just need to specify a fully qualified path to the DLL, which can be read >> from a config file or whatever. >> > This approach does not build a new Python type in compiled code. There are > speed disadvantages to this --- especially for the numpy scalars. There are at least four speed penalties in what I suggested: - the ctypes overhead is bigger than using Python's C API manually (or Cython). - there is a speed penalty for scalars and small arrays. (Previously seen in numarray vs. numeric.) - bytearray is a bit slower to create than to malloc a buffer. - arrays with dtype=object The advantages are: - the code is easier to read and maintain (clean separation of Python and C) - more portable code (no Python C API dependence) - no obscure memory leaks to track down (bytearray instead of malloc/free and no manual ref counting). By the way: Is Cython mature enough to toss all C out the door? Since Cython has syntax for PEP 3118 buffer objects, we should theoretically be able to implement NumPy in Cython. Then we don't have the speed penalty and no difficult C to maintain. But if the idea is to make a Python-independent C library from the core, it is probably a bit counterproductive... Sturla
Re: [Numpy-discussion] NumPy re-factoring project
On Fri, Jun 11, 2010 at 7:25 AM, Sturla Molden wrote: > Den 10.06.2010 22:07, skrev Travis Oliphant: >> >>> 2. The core should be a plain DLL, loadable with ctypes. (I know David >>> Cournapeau and Robert Kern are going to hate this.) But if Python can have a >>> custom loader for .pyd files, so can NumPy for its core DLL. For ctypes we >>> just need to specify a fully qualified path to the DLL, which can be read >>> from a config file or whatever. I am not sure why you say I would hate it - that's exactly what I would like numpy core to be (a library). Ctypes is an implementation detail. For reference counting, I am not sure we can get away with it unless we use something like the Lua stack technique for the core. The big issue with reference counting is how to deal with non-C VMs - there have to be known good practices to make libraries reusable from, say, the CLI or the JVM, but I don't know them. > > > By the way: Is Cython mature enough to toss all C out the door? Maybe not all, but I am convinced most of it could go. Today, I would not have rewritten lfilter in C, for example; I would have just used cython. One issue with cython compared to C is that it makes compilation much slower altogether, at least with gcc. I have never been able to pinpoint the exact issue (it does not seem like gcc should be that slow compiling the generated C). David
Re: [Numpy-discussion] NumPy re-factoring project
Thu, 10 Jun 2010 18:48:04 +0200, Sturla Molden wrote: [clip] > 5. Allow OpenMP pragmas in the core. If arrays are above a certain size, > it should switch to multi-threading. Some places where OpenMP could probably help are in the inner ufunc loops. However, improving the memory efficiency of the data access pattern is another low-hanging fruit for multidimensional arrays. I'm not sure how OpenMP plays together with thread-safety, especially if used in outer constructs. Whether this is possible to do easily is not so clear, as Numpy uses e.g. iterator and loop objects, which probably cannot be shared between different OpenMP threads. So one would have to do some extra work to parallelize these parts manually. But probably if multithreading is desired, OpenMP sounds like the way to go... -- Pauli Virtanen
Re: [Numpy-discussion] NumPy re-factoring project
On Fri, Jun 11, 2010 at 1:18 AM, Sebastien Binet wrote: > On Thu, 10 Jun 2010 10:47:10 -0500, Jason McCampbell > wrote: >> > 4) Boost has some reference counted pointers, have you looked at them? C++ >> > is admittedly a very different animal for this sort of application. >> > >> >> There is also a need to replace the usage of PyDict and other uses of CPython >> for basic data structures that aren't present in C. Having access to C++ >> for this and reference counting would be nice, but has the potential to >> break builds for everyone who uses the C API. I think it's worth discussing >> for the future but a bigger (and possibly more contentious) change than we >> are able to take on for this project. > > for the dict part, a probably good enough replacement could be the hash > table from: > > http://c-algorithms.sourceforge.net/ My long quest for a usable set of basic structures in C led me to basekit. It is BSD-licensed and the code is very simple; I like it very much: http://github.com/stevedekorte/basekit It is used in the Io programming language, which is in the same ballpark as python (very dynamic, a la smalltalk) and uses a GC, which means the allocation strategy used in basekit is viable for VMs which do not depend on reference counting. David
Re: [Numpy-discussion] NumPy re-factoring project
Den 10.06.2010 22:28, skrev Pauli Virtanen: > > Some places where OpenMP could probably help are in the inner ufunc > loops. However, improving the memory efficiency of the data access > pattern is another low-hanging fruit for multidimensional arrays. > > Getting the intermediate array out of expressions like a + b would be very cool. But it is not as trivial as it seems. The possibility of controlling aliasing (cf. Fortran and C99) can help. For example, for binary array operators we know that the output array (a temporary intermediate) is not aliased with the two arguments. ufuncs involving transcendental functions would benefit from OpenMP. O(N**2) ufuncs like var and std would also benefit. We can also use copy-in copy-out for strided array access (see below). > I'm not sure how OpenMP plays together with thread-safety; especially if > used in outer constructs. Whether this is possible to do easily is not so > clear, as Numpy uses e.g. iterator and loop objects, which probably > cannot be shared between different OpenMP threads. So one would have to > do some extra work to parallelize these parts manually. > > OpenMP has pragmas for serializing access to code, which are:

#pragma omp critical
{
    // done by all threads
    // serialized access (we can also name the lock)
}

#pragma omp atomic
a += 1  // atomic operation (e.g. nice for refcounts, instead of a big lock)

#pragma omp single
{
    // done by one thread (random or named)
}

#pragma omp master
{
    // done by the master thread only
    // e.g. used when calling back to the Python C API
}

#pragma omp barrier  // wait for all threads here

That's about all you need to know about thread synchronization in OpenMP... > But probably if multithreading is desired, OpenMP sounds like the way to > go... > > Anyone proficient in C or Fortran can learn OpenMP in 10 minutes, and it is close to infinitely easier than manual multithreading. And now it is supported by the majority of compilers. Also about array iterators in NumPy's C base (i.e. for doing something along an axis): we don't need those. There is a different way of coding which leads to faster code. 1. Collect an array of pointers to each subarray (e.g. using std::vector or dtype**) 2. Dispatch on the pointer array... I used this to reimplement lfilter in scipy. It is just a lot faster than array iterators. It also allows us to use OpenMP in a very natural way. For multidimensional arrays I have a recursive function for collecting the array of subarray pointers, but for 2D and 3D we can use nested for-loops. Another thing I did when reimplementing lfilter was "copy-in copy-out" for strided arrays. This also helps a lot. It might seem paradoxical, as we need to iterate over the strided array(s) to make the copies. But in practice it is faster by a factor of 2 or 3. Fortran compilers also tend to do this optimization for strided arrays. It is even mentioned in the Fortran standard as explicitly allowed, as it has consequences for array pointers (they can be invalidated by function calls if a contiguous copy is made). Sturla
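[Editorial aside: the pointer-collection scheme described above can be sketched in Python using byte offsets instead of raw pointers. This is a hypothetical illustration of my own, not part of any NumPy API: enumerate every index combination over the non-axis dimensions and compute each vector's start offset from the strides, mirroring what get_axis_pointers() does in C.]

```python
from itertools import product

import numpy as np

def axis_start_offsets(a, axis):
    """Byte offsets (from a's first element) of every 1-D vector along `axis`.

    One offset is produced per combination of indices over the non-axis
    dimensions; in C each offset would become a pointer into the buffer.
    """
    dims = [range(n) for i, n in enumerate(a.shape) if i != axis]
    strides = [s for i, s in enumerate(a.strides) if i != axis]
    return [sum(i * s for i, s in zip(idx, strides))
            for idx in product(*dims)]

a = np.zeros((2, 3, 4))                # C-contiguous float64
offs = axis_start_offsets(a, axis=2)   # 2*3 = 6 vectors along the last axis
```

An OpenMP parallel-for over such a flat offset (or pointer) list needs no shared iterator state, which is the load-balancing point being made above.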
Re: [Numpy-discussion] NumPy re-factoring project
On Jun 10, 2010, at 11:48 AM, Sturla Molden wrote: > > I have a few radical suggestions: There are some good ideas there. I suspect we can't address all of them in the course of this re-factoring effort, but I really appreciate you putting them out there, because they are useful things to consider. > > 1. Use ctypes as glue to the core DLL, so we can completely forget about > refcounts and similar mess. Why put manual reference counting and error > handling in the core? It's stupid. If we could get to a single core DLL, then perhaps this would work. It may be very difficult to actually get to that point from where we are, though, because there is a lot to the current CPython interface that would have to be re-thought. Right now, it looks like there is still a need for an interface layer. > > 2. The core should be a plain DLL, loadable with ctypes. (I know David > Cournapeau and Robert Kern are going to hate this.) But if Python can have a > custom loader for .pyd files, so can NumPy for its core DLL. For ctypes we > just need to specify a fully qualified path to the DLL, which can be read > from a config file or whatever. This approach does not build a new Python type in compiled code. There are speed disadvantages to this --- especially for the numpy scalars. > > 5. Allow OpenMP pragmas in the core. If arrays are above a certain size, it > should switch to multi-threading. This is an interesting idea. -Travis
Re: [Numpy-discussion] NumPy re-factoring project
On Fri, Jun 11, 2010 at 6:56 AM, Sturla Molden wrote: > > Also about array iterators in NumPy's C base (i.e. for doing something > along an axis): we don't need those. There is a different way of coding > which leads to faster code. > > 1. Collect an array of pointers to each subarray (e.g. using > std::vector or dtype**) > 2. Dispatch on the pointer array... Do you have the code for this? That's something I wanted to do, but never took the time to do. A faster generic iterator would be nice, but very hard to do in general. > Another thing I did when reimplementing lfilter was "copy-in copy-out" > for strided arrays. What is copy-in copy-out? I am not familiar with this term. David
Re: [Numpy-discussion] NumPy re-factoring project
Another suggestion I'd like to make is bytearray as the memory buffer for the ndarray. An ndarray could just store or extend a bytearray, instead of having to deal with malloc/free and the mess that comes with it. Python takes care of the reference counts for bytearrays, and ctypes calls the NumPy core DLL. Yes, it will not work with older versions of Python. But for a complete refactoring ("NumPy 3000"), backwards compatibility should not matter much. (It's easier to backport bytearray than track down memory leaks in NumPy's core.) Sturla Den 10.06.2010 18:48, skrev Sturla Molden: I have a few radical suggestions: 1. Use ctypes as glue to the core DLL, so we can completely forget about refcounts and similar mess. Why put manual reference counting and error handling in the core? It's stupid. 2. The core should be a plain DLL, loadable with ctypes. (I know David Cournapeau and Robert Kern are going to hate this.) But if Python can have a custom loader for .pyd files, so can NumPy for its core DLL. For ctypes we just need to specify a fully qualified path to the DLL, which can be read from a config file or whatever. 3. ctypes will take care of all the mess regarding the GIL. Again there is no need to re-invent the wheel. ctypes.CDLL releases the GIL when calling into C, and re-acquires it before making callbacks to Python. ctypes.PyDLL keeps the GIL for legacy library code that is not threadsafe. ctypes will also make porting to other Python implementations (or even other languages: Ruby, JavaScript) easier. Not to mention that it will make NumPy impervious to changes in the Python C API. 4. Write the core in C++ or Fortran (95/2003), not ANSI C (aka C89). Use std::vector<> instead of malloc/free for C++, and possibly templates for generics on various dtypes. 5. Allow OpenMP pragmas in the core. If arrays are above a certain size, it should switch to multi-threading. 
Sturla Den 10.06.2010 15:26, skrev Charles R Harris: A few thoughts came to mind while reading the initial writeup. 1) How is the GIL handled in the callbacks. 2) What about error handling? That is tricky to get right, especially in C and with reference counting. 3) Is there a general policy as to how the reference counting should be handled in specific functions? That is, who does the reference incrementing/decrementing? 4) Boost has some reference counted pointers, have you looked at them? C++ is admittedly a very different animal for this sort of application. Chuck
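[Editorial aside: the bytearray-as-buffer idea above can be demonstrated with nothing but the standard library. This is my own sketch, not proposed NumPy API: a ctypes array type is mapped directly onto a bytearray with from_buffer, so writes through the typed view land in the Python-managed, refcounted buffer and malloc/free never enters the picture.]

```python
import ctypes
import struct

# A Python-managed buffer: refcounted by the interpreter, no manual free().
buf = bytearray(4 * ctypes.sizeof(ctypes.c_double))

# A typed ctypes view sharing the bytearray's memory (no copy is made;
# from_buffer requires a writable buffer, which bytearray is).
view = (ctypes.c_double * 4).from_buffer(buf)
view[0] = 1.5
view[3] = -2.0

# The writes are visible in the underlying bytearray.
assert buf[0:8] == struct.pack("d", 1.5)
```

A hypothetical ndarray core would hand `ctypes.addressof(view)` to the compiled code, while the interpreter keeps the buffer alive through ordinary reference counting.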
Re: [Numpy-discussion] NumPy re-factoring project
Den 10.06.2010 18:48, skrev Sturla Molden: > ctypes will also make porting to other Python implementations > (or even other languages: Ruby, JavaScript) easier. Not to mention > that it will make NumPy impervious to changes in the Python C API. Linking is also easier with ctypes. I started getting a lot of linker errors when switching to 64-bit Python on Windows. Just throwing out all of Cython et al. in favour of plain C keeps the problem away. Sturla
Re: [Numpy-discussion] NumPy re-factoring project
I have a few radical suggestions: 1. Use ctypes as glue to the core DLL, so we can completely forget about refcounts and similar mess. Why put manual reference counting and error handling in the core? It's stupid. 2. The core should be a plain DLL, loadable with ctypes. (I know David Cournapeau and Robert Kern are going to hate this.) But if Python can have a custom loader for .pyd files, so can NumPy for its core DLL. For ctypes we just need to specify a fully qualified path to the DLL, which can be read from a config file or whatever. 3. ctypes will take care of all the mess regarding the GIL. Again there is no need to re-invent the wheel. ctypes.CDLL releases the GIL when calling into C, and re-acquires it before making callbacks to Python. ctypes.PyDLL keeps the GIL for legacy library code that is not threadsafe. ctypes will also make porting to other Python implementations (or even other languages: Ruby, JavaScript) easier. Not to mention that it will make NumPy impervious to changes in the Python C API. 4. Write the core in C++ or Fortran (95/2003), not ANSI C (aka C89). Use std::vector<> instead of malloc/free for C++, and possibly templates for generics on various dtypes. 5. Allow OpenMP pragmas in the core. If arrays are above a certain size, it should switch to multi-threading. Sturla Den 10.06.2010 15:26, skrev Charles R Harris: A few thoughts came to mind while reading the initial writeup. 1) How is the GIL handled in the callbacks. 2) What about error handling? That is tricky to get right, especially in C and with reference counting. 3) Is there a general policy as to how the reference counting should be handled in specific functions? That is, who does the reference incrementing/decrementing? 4) Boost has some reference counted pointers, have you looked at them? C++ is admittedly a very different animal for this sort of application. 
Chuck
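[Editorial aside: for readers unfamiliar with the pattern in point 2, here is a minimal sketch of calling into a plain shared library via ctypes. It is my own illustration: the C math library stands in for a hypothetical NumPy core DLL, and the library name/path is platform-dependent, which is exactly why Sturla suggests reading the fully qualified path from a config file.]

```python
import ctypes
import ctypes.util

# Locate a plain shared library. libm is a stand-in for a hypothetical
# numpy core DLL; a real build would use a fully qualified path instead.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)  # CDLL releases the GIL around each call

# Declare the C signature so ctypes marshals arguments correctly;
# without this, a double argument would be passed as an int.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

ctypes.PyDLL would be used instead for code that is not threadsafe with respect to the GIL, as point 3 notes.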
Re: [Numpy-discussion] NumPy re-factoring project
On Thu, 10 Jun 2010 10:47:10 -0500, Jason McCampbell wrote: > > 4) Boost has some reference counted pointers, have you looked at them? C++ > > is admittedly a very different animal for this sort of application. > > > > There is also a need to replace the usage of PyDict and other uses of CPython > for basic data structures that aren't present in C. Having access to C++ > for this and reference counting would be nice, but has the potential to > break builds for everyone who uses the C API. I think it's worth discussing > for the future but a bigger (and possibly more contentious) change than we > are able to take on for this project. for the dict part, a probably good enough replacement could be the hash table from: http://c-algorithms.sourceforge.net/ cheers, sebastien. -- # # Dr. Sebastien Binet # Laboratoire de l'Accelerateur Lineaire # Universite Paris-Sud XI # Batiment 200 # 91898 Orsay #
Re: [Numpy-discussion] NumPy re-factoring project
Hi Chuck, Good questions. Responses inline below... Jason On Thu, Jun 10, 2010 at 8:26 AM, Charles R Harris wrote: > > > On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell < > jmccampb...@enthought.com> wrote: > >> Hi everyone, >> >> This is a follow-up to Travis's message on the re-factoring project from >> May 25th and the subsequent discussion. For background, I am a developer at >> Enthought working on the NumPy re-factoring project with Travis and Scott. >> The immediate goal from our perspective is to re-factor the core of NumPy >> into two architectural layers: a true core that is CPython-independent and >> an interface layer that binds the core to CPython. >> >> A design proposal is now posted on the NumPy developer wiki: >> http://projects.scipy.org/numpy/wiki/NumPyRefactoring >> >> The write-up is a fairly high-level description of what we think the split >> will look like and how to deal with issues such as memory management. There >> are also placeholders listed as 'TBD' where more investigation is still >> needed and will be filled in over time. At the end of the page there is a >> section on the C API with a link to a function-by-function breakdown of the >> C API and whether the function belongs in the interface layer, the core, or >> need to be split between the two. All functions listed as 'core' will >> continue to have an interface-level wrapper with the same name to ensure >> source-compatibility. >> >> All of this, particularly the interface/core function designations, is a >> first analysis and in flux. The goal is to get the information out and >> elicit discussion and feedback from the community. >> >> > A few thoughts came to mind while reading the initial writeup. > > 1) How is the GIL handled in the callbacks. > How to handle the GIL still requires some thought. 
The cleanest way, IMHO, would be for the interface layer to release the lock prior to calling into the core, and then each callback function in the interface is responsible for re-acquiring it. That's straightforward to define as a rule and should work well in general, but I'm worried about potential performance issues if/when a callback is called in a loop. A few optimization points are ok, but too many and they will just be a source of heisenbugs. One other option is to just use the existing release/acquire macros in NumPy and redirect them to the interface layer. Any app that isn't CPython would just leave those callback pointers NULL. It's less disruptive but leaves some very CPython-specific behavior in the core. > 2) What about error handling? That is tricky to get right, especially in C > and with reference counting. > The error reporting functions in the core will likely look a lot like the CPython functions - they seem general enough. The biggest change is that the CPython ones take a PyObject as the error type. 99% of the errors reported in NumPy use one of a half-dozen pre-defined types that are easy to translate. There is at least one case where an object type (complex number) is dynamically created and used as the type, but so far I believe it's the only case. The reference counting does get a little more complex because a core routine will need to decref the core object on error, and the interface layer will need to similarly detect the error and potentially do its own decref. Each layer is still responsible for its own cleanup, but there are now two opportunities to introduce leaks. > 3) Is there a general policy as to how the reference counting should be > handled in specific functions? That is, who does the reference > incrementing/decrementing? > Both layers should implement the existing policy for the objects it manages. Essentially, a function can use its caller's reference but needs to increment the count if it's going to store it.
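[Editor's note: a hedged sketch of the "redirect the release/acquire macros to the interface layer" option. All names here (NpyInterface_Callbacks, NPY_BEGIN_THREADS, npy_core_sum) are hypothetical. The core never mentions Python; it calls through optional function pointers that a CPython interface would point at PyEval_SaveThread/PyEval_RestoreThread, while a non-CPython host leaves them NULL and the macros become no-ops.]

```c
#include <stddef.h>

typedef struct {
    void (*begin_allow_threads)(void);  /* release the GIL (or no-op) */
    void (*end_allow_threads)(void);    /* re-acquire the GIL (or no-op) */
} NpyInterface_Callbacks;

static NpyInterface_Callbacks npy_callbacks;  /* zeroed => all no-ops */

/* The core's macros guard against NULL, so a non-CPython host pays only
 * a pointer test. */
#define NPY_BEGIN_THREADS \
    do { if (npy_callbacks.begin_allow_threads) \
             npy_callbacks.begin_allow_threads(); } while (0)
#define NPY_END_THREADS \
    do { if (npy_callbacks.end_allow_threads) \
             npy_callbacks.end_allow_threads(); } while (0)

/* A core routine: drops the lock around the heavy loop via the macros. */
double npy_core_sum(const double *data, size_t n)
{
    double acc = 0.0;
    size_t i;
    NPY_BEGIN_THREADS;
    for (i = 0; i < n; i++)
        acc += data[i];
    NPY_END_THREADS;
    return acc;
}

/* Stand-in "interface layer": counts calls instead of touching a real GIL. */
static int released, reacquired;
static void fake_release(void)   { released++;   }
static void fake_reacquire(void) { reacquired++; }

void npy_install_fake_interface(void)
{
    npy_callbacks.begin_allow_threads = fake_release;
    npy_callbacks.end_allow_threads   = fake_reacquire;
}
```

A callback called in a loop is exactly where this hurts: each NPY_BEGIN/END pair would become a real release/acquire round trip, which is why hoisting the pair outside the loop matters.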
A new instance is returned with a refcnt of 1 and the caller needs to clean it up when it's no longer needed. But that means that if the core returns a new NpyArray instance to the interface layer, the receiving function in the interface must allocate a PyObject wrapper around it and set the wrapper's refcnt to 1 before returning it. Is that what you were asking? 4) Boost has some reference counted pointers, have you looked at them? C++ > is admittedly a very different animal for this sort of application. > There is also a need to replace the usage of PyDict and other uses of CPython for basic data structures that aren't present in C. Having access to C++ for this and reference counting would be nice, but it has the potential to break builds for everyone who uses the C API. I think it's worth discussing for the future, but it's a bigger (and possibly more contentious) change than we are able to take on for this project. > Chuck
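[Editor's note: an illustrative sketch, with invented names, of the ownership rule described above: a core constructor returns an object with a reference count of 1, the caller owns that reference, and a wrapper that stores the object takes over ("steals") that reference rather than increfing a second time.]

```c
#include <stdlib.h>

typedef struct {
    int nob;        /* reference count; starts at 1 from the constructor */
    size_t nbytes;
    void *data;
} NpyArray;

/* Constructor: the returned reference is owned by the caller. */
NpyArray *NpyArray_new(size_t nbytes)
{
    NpyArray *a = malloc(sizeof(*a));
    a->nob = 1;
    a->nbytes = nbytes;
    a->data = malloc(nbytes);
    return a;
}

void NpyArray_incref(NpyArray *a) { a->nob++; }

void NpyArray_decref(NpyArray *a)
{
    if (--a->nob == 0) {      /* last reference: free the object */
        free(a->data);
        free(a);
    }
}

/* Sketch of a CPython wrapper (without Python.h): it stores the core
 * pointer without increfing -- it steals the constructor's reference --
 * and its own destructor would do the matching decref. */
typedef struct {
    int py_refcnt;    /* stands in for the PyObject refcount */
    NpyArray *core;   /* owned reference to the core array */
} PyArrayWrapper;

PyArrayWrapper *wrap_new_core_array(NpyArray *fresh)
{
    PyArrayWrapper *w = malloc(sizeof(*w));
    w->py_refcnt = 1;  /* new wrapper handed to the caller with refcnt 1 */
    w->core = fresh;   /* steals the reference: no incref here */
    return w;
}
```

The leak opportunity mentioned above shows up exactly at the hand-off: if the wrapper allocation fails, someone still has to decref the fresh core object, and each layer can plausibly assume the other did it.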
Re: [Numpy-discussion] NumPy re-factoring project
On Thu, Jun 10, 2010 at 7:26 AM, Charles R Harris wrote: > > > On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell < > jmccampb...@enthought.com> wrote: > >> Hi everyone, >> >> This is a follow-up to Travis's message on the re-factoring project from >> May 25th and the subsequent discussion. For background, I am a developer at >> Enthought working on the NumPy re-factoring project with Travis and Scott. >> The immediate goal from our perspective is to re-factor the core of NumPy >> into two architectural layers: a true core that is CPython-independent and >> an interface layer that binds the core to CPython. >> >> A design proposal is now posted on the NumPy developer wiki: >> http://projects.scipy.org/numpy/wiki/NumPyRefactoring >> >> The write-up is a fairly high-level description of what we think the split >> will look like and how to deal with issues such as memory management. There >> are also placeholders listed as 'TBD' where more investigation is still >> needed and will be filled in over time. At the end of the page there is a >> section on the C API with a link to a function-by-function breakdown of the >> C API and whether the function belongs in the interface layer, the core, or >> need to be split between the two. All functions listed as 'core' will >> continue to have an interface-level wrapper with the same name to ensure >> source-compatibility. >> >> All of this, particularly the interface/core function designations, is a >> first analysis and in flux. The goal is to get the information out and >> elicit discussion and feedback from the community. >> >> > A few thoughts came to mind while reading the initial writeup. > > 1) How is the GIL handled in the callbacks. > 2) What about error handling? That is tricky to get right, especially in C > and with reference counting. > 3) Is there a general policy as to how the reference counting should be > handled in specific functions? That is, who does the reference > incrementing/decrementing? 
> 4) Boost has some reference counted pointers, have you looked at them? C++ > is admittedly a very different animal for this sort of application. > > I've thought that PyArray_SetNumericOps, PyArray_GetNumericOps should have getter/setter functions so that the structure doesn't have to be made public in the split-up files. Can the casting functions be implemented as ufuncs? Or at least look like ufuncs so they can handle strided memory. Likewise for some of the other things in arraytypes.c.src. Chuck
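[Editor's note: a sketch of the getter/setter idea, so the ops table can stay file-static instead of appearing in a public header. The struct layout and function names below are invented for illustration; the real table set by PyArray_SetNumericOps holds the Python callables for add, subtract, multiply, and so on.]

```c
#include <stddef.h>

typedef double (*npy_binop)(double, double);

typedef struct {
    npy_binop add;
    npy_binop multiply;
} NumericOps;

/* file-static: never exposed in a public header */
static NumericOps numeric_ops;

/* Accessors are the only way other translation units touch the table. */
void NpyArray_SetNumericOp_add(npy_binop f)      { numeric_ops.add = f; }
npy_binop NpyArray_GetNumericOp_add(void)        { return numeric_ops.add; }
void NpyArray_SetNumericOp_multiply(npy_binop f) { numeric_ops.multiply = f; }
npy_binop NpyArray_GetNumericOp_multiply(void)   { return numeric_ops.multiply; }

static double my_add(double a, double b) { return a + b; }
```

Keeping the struct private also means its layout can change in the core without breaking interface-layer builds, which is the point of the split.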
Re: [Numpy-discussion] NumPy re-factoring project
On Wed, Jun 9, 2010 at 5:27 PM, Jason McCampbell wrote: > Hi everyone, > > This is a follow-up to Travis's message on the re-factoring project from > May 25th and the subsequent discussion. For background, I am a developer at > Enthought working on the NumPy re-factoring project with Travis and Scott. > The immediate goal from our perspective is to re-factor the core of NumPy > into two architectural layers: a true core that is CPython-independent and > an interface layer that binds the core to CPython. > > A design proposal is now posted on the NumPy developer wiki: > http://projects.scipy.org/numpy/wiki/NumPyRefactoring > > The write-up is a fairly high-level description of what we think the split > will look like and how to deal with issues such as memory management. There > are also placeholders listed as 'TBD' where more investigation is still > needed and will be filled in over time. At the end of the page there is a > section on the C API with a link to a function-by-function breakdown of the > C API and whether the function belongs in the interface layer, the core, or > need to be split between the two. All functions listed as 'core' will > continue to have an interface-level wrapper with the same name to ensure > source-compatibility. > > All of this, particularly the interface/core function designations, is a > first analysis and in flux. The goal is to get the information out and > elicit discussion and feedback from the community. > > A few thoughts came to mind while reading the initial writeup. 1) How is the GIL handled in the callbacks. 2) What about error handling? That is tricky to get right, especially in C and with reference counting. 3) Is there a general policy as to how the reference counting should be handled in specific functions? That is, who does the reference incrementing/decrementing? 4) Boost has some reference counted pointers, have you looked at them? C++ is admittedly a very different animal for this sort of application. 
Chuck
[Numpy-discussion] NumPy re-factoring project
Hi everyone, This is a follow-up to Travis's message on the re-factoring project from May 25th and the subsequent discussion. For background, I am a developer at Enthought working on the NumPy re-factoring project with Travis and Scott. The immediate goal from our perspective is to re-factor the core of NumPy into two architectural layers: a true core that is CPython-independent and an interface layer that binds the core to CPython. A design proposal is now posted on the NumPy developer wiki: http://projects.scipy.org/numpy/wiki/NumPyRefactoring The write-up is a fairly high-level description of what we think the split will look like and how to deal with issues such as memory management. There are also placeholders listed as 'TBD' where more investigation is still needed and will be filled in over time. At the end of the page there is a section on the C API with a link to a function-by-function breakdown of the C API and whether the function belongs in the interface layer, the core, or need to be split between the two. All functions listed as 'core' will continue to have an interface-level wrapper with the same name to ensure source-compatibility. All of this, particularly the interface/core function designations, is a first analysis and in flux. The goal is to get the information out and elicit discussion and feedback from the community. Best regards, Jason Jason McCampbell Enthought, Inc.