Re: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)
with high-level API :-). -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Luis Pedro Coelho | Institute for Molecular Medicine | http://luispedro.org signature.asc Description: This is a digitally signed message part. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] C++ Example
On Saturday, March 03, 2012 04:38:53 PM David Cournapeau wrote: I don't think the code is comparable either - some of the stuff done in the C code is done in the C++ code your are calling. The C code could be significantly improved. Actually, that's not 100% accurate. The C code calls the same functions. Most of the extra cruft is that it needs to do all of this error checking and type- dispatch, while in C++ you can have RAII and templates. Even more important here: almost none of this code should be written anymore anyway, C++ or not. This is really the kind of code that should be done in cython, as it is mostly about wrapping C code into the python C API. At least last time I read up on it, cython was not able to do multi-type code, i.e., have code that works on arrays of multiple types. Does it support it now? Best, -- Luis Pedro Coelho | University of Lisbon | http://luispedro.org signature.asc Description: This is a digitally signed message part. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] C++ Example
; } } switch (input-descr-type_num) { CASE_NI_ERODE_POINT(pi, out, oo, struct_size, Bool, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, UInt8, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, UInt16, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, UInt32, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); #if HAS_UINT64 CASE_NI_ERODE_POINT(pi, out, oo, struct_size, UInt64, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); #endif CASE_NI_ERODE_POINT(pi, out, oo, struct_size, Int8, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, Int16, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, Int32, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, Int64, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, Float32, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); CASE_NI_ERODE_POINT(pi, out, oo, struct_size, Float64, msk_value, bdr_value, border_flag_value, center_is_true, true, false, pchange); default: PyErr_SetString(PyExc_RuntimeError, data type not supported); goto exit; } switch (output-descr-type_num) { CASE_OUTPUT(po, out, Bool); CASE_OUTPUT(po, out, UInt8); CASE_OUTPUT(po, out, UInt16); CASE_OUTPUT(po, out, UInt32); #if HAS_UINT64 CASE_OUTPUT(po, out, UInt64); #endif CASE_OUTPUT(po, out, Int8); CASE_OUTPUT(po, out, Int16); CASE_OUTPUT(po, out, Int32); CASE_OUTPUT(po, out, Int64); CASE_OUTPUT(po, out, Float32); CASE_OUTPUT(po, out, Float64); default: PyErr_SetString(PyExc_RuntimeError, data type not supported); goto exit; } if (pchange) { *changed = 1; if (coordinate_list) { if (block == NULL || block-size == block_size) { block = NI_CoordinateListAddBlock(*coordinate_list); current = block-coordinates; } for(kk = 0; kk input-nd; kk++) *current++ = ii.coordinates[kk]; block-size++; } } if (mask) { NI_FILTER_NEXT3(fi, ii, io, mi, oo, pi, po, pm); } else { NI_FILTER_NEXT2(fi, ii, io, oo, pi, po); } } exit: if (offsets) free(offsets); if (PyErr_Occurred()) { if (coordinate_list) { NI_FreeCoordinateList(*coordinate_list); *coordinate_list = NULL; } return 0; } else { return 1; } return PyErr_Occurred() ? 0 : 1; } HTH -- Luis Pedro Coelho | Institute for Molecular Medicine | http://luispedro.org signature.asc Description: This is a digitally signed message part. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] As any array, really any array
Hello all, I often find myself writing the following code: try: features = np.asanyarray(features) except: features = np.asanyarray(features, dtype=object) I basically want to be able to use fany indexing on features and, in most cases, it will be a numpy floating point array. Otherwise, default to having it be an array of dtype=object. Is there a more elegant way to do it with numpy? Thank you, -- Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org signature.asc Description: This is a digitally signed message part. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adjacent matrix
On Monday, June 13, 2011 03:55:46 PM Thiago Franco Moraes wrote: Find all of the grid points in that lie adjacent to one or more grid points of opposite values. This is the code used to calculate that matrix: [spin] Where nx, ny and nz are the x, y, z dimensions from the im input binary matrix. I think this operation is called bwperim. My package, mahotas, supports it directly: mahotas.bwperim(bw, n) at http://packages.python.org/mahotas/ HTH, Luis signature.asc Description: This is a digitally signed message part. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Adjacent matrix
On Tuesday, June 14, 2011 08:50:47 AM Thiago Franco Moraes wrote: On Tue, Jun 14, 2011 at 8:06 AM, Luis Pedro Coelho l...@cmu.edu wrote: On Monday, June 13, 2011 03:55:46 PM Thiago Franco Moraes wrote: Find all of the grid points in that lie adjacent to one or more grid points of opposite values. This is the code used to calculate that matrix: [spin] Where nx, ny and nz are the x, y, z dimensions from the im input binary matrix. I think this operation is called bwperim. My package, mahotas, supports it directly: mahotas.bwperim(bw, n) at http://packages.python.org/mahotas/ HTH, Luis Thanks Luis. But mahotas.bwperim only supports 2D images, my images are 3D from Computerized Tomography (often very large) and I need connectivity 26. If the images are binary, an alternative is mahotas.labeled.borders() It returns the pixels which have more than one value in there neighbourhoods. HTH Luis signature.asc Description: This is a digitally signed message part. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] ANN: Mahotas 0.6.4
Hello all, I'm happy to announce a new release of mahotas, my computer vision library for Python. WHAT'S MAHOTAS -- Mahotas is a library which includes several image processing algorithm. It works over numpy arrays. All its computation heavy functions are implemented in C++ (using templates so that they work for every data type without conversions). The result is a fast, optimised image processing and computer vision library. It is under heavy development and has no known bugs. Reported bugs almost always get fixed in a day. WHAT'S NEW -- Major change is that Mac OS compilation now works! Thanks to K-Michael Aye for the patch that saved the day. Minor changes include fixes to ``cwatershed`` and adding the tests to the source distribution (use `python setup.py test` to run them). Some of these. INFO *API Docs*: http://packages.python.org/mahotas/ *Mailing List*: Use the pythonvision mailing list http://groups.google.com/group/pythonvision?pli=1 for questions, bug submissions, etc. *Author*: Luis Pedro Coelho (with code by Zachary Pincus [from scikits.image], Peter J. Verveer [from scipy.ndimage], and Davis King [from dlib] *License*: GPLv2 Thank you, -- Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org signature.asc Description: This is a digitally signed message part ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] ANNOUNCE: mahotas 0.5
On Friday, September 10, 2010 03:40:33 am Sebastian Haase wrote: Hi Luis, thanks for the announcement. How would you compare mahotas to scipy's ndimage ? Are you using ndimage in mahotas at all ? Hi Sebastian, In general there is little overlap (there are 1 or 2 functions which are replicated). I wrote mahotas mostly to get functions that I wasn't finding elsewhere (or, occasionally, to make them faster). Ndimage, in some ways, contains more basic functions, mahotas has a bit more advanced functions (at the cost of having a somewhat idiosyncratically chosen set of functions). I do use ndimage in a couple of places (mostly to do convolution). So, they complement themselves rather than compete. Here are a couple of differences in philosophy: - ndimage is *always* n-D. Mahotas is mostly n-D but some functions are specialised to 2-D (patches always welcome if you want to extend them). - mahotas is written in templated C++, ndimage is C with macros. - ndimage tries to be very generic in its interfaces (which makes some of them hard to use at first), mahotas tries to have natural interfaces. HTH, Luis signature.asc Description: This is a digitally signed message part. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] ANNOUNCE: mahotas 0.5
Hello everyone, My numpy based image processing toolbox has just had a new release: 0.5 New features are: Distance transform bwperim() freeimage interface [borrowed and improved from scikits.image] zernike moment computation There were some fixes to the namespace (in particular, functions that were previously available as mahotas.bbox.bbox are now just mahotas.bbox), which is good, but might break code. Sorry, but the new system is just so much better. Many small fixes and performance improvements all around. You can get it from pypi: http://pypi.python.org/pypi/mahotas (including through easy_install mahotas or pip install mahotas). Bleeding edge is at: http://github.com/luispedro/mahotas API Documentation at http://packages.python.org/mahotas/ (very complete, but rather ugly; will fix uglyness). Mailing-List (for Python-based computer vision related subjects in general including this package): http://groups.google.com/group/pythonvision -- Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code
Hello, I coincidently started my own implementation of a system to manage intermediate results last week, which I called jug. I wasn't planning to make such an alpha version public just now, but it seems to be on topic. The main idea is to use hashes to map function arguments to paths on the filesystem, which store the result (nothing extraordinary here). I also added the capability of having tasks (the basic unit) take the results of other tasks and defining an implicit dependency DAG. A simple locking mechanism enables light-weight task-level parellization (this was the second of my goals: help me make my stuff parallel). A trick that helps is that I don't really use the argument values to hash (which would be unwieldy for big arrays). I use the computation path (e.g., this is the value obtained from f(g('something'),2)). Since, at least in my problems, things tend to always map back into simple file-system paths, the hash computation doesn't even need to load the intermediate results. I will make the git repository publicly available once I figure out how to do that. I append the tutorial I wrote, which explains the system. HTH, Luís Pedro Coelho PhD Student in Computational Biology Carnegie Mellon University Jug Tutorial What is jug? Jug is a simple way to write easily parallelisable programs in Python. It also handles intermediate results for you. Example --- This is a simple worked-through example which illustrates what jug does. Problem ~~~ Assume that I want to do the following to a collection of images: (1) for each image, compute some features (2) cluster these features using k-means. In order to find out the number of clusters, I try several values and pick the best result. For each value of k, because of the random initialisation, I run the clustering 10 times. I could write the following simple code: :: imgs = glob('*.png') features = [computefeatures(img,parameter=2) for img in imgs] clusters = [] bics = [] for k in xrange(2,200): for repeat in xrange(10): clusters.append(kmeans(features,k=k,random_seed=repeat)) bics.append(compute_bic(clusters[-1])) Nr_clusters = argmin(bics) // 10 Very simple and solves the problem. However, if I want to take advantage of the obvious parallelisation of the problem, then I need to write much more complicated code. My traditional approach is to break this down into smaller scripts. I'd have one to compute features for some images, I'd have another to merge all the results together and do some of the clustering, and, finally, one to merge all the results of the different clusterings. These would need to be called with different parameters to explore different areas of the parameter space, so I'd have a couple of scripts just for calling the main computation scripts. Intermediate results would be saved and loaded by the different processes. This has several problems. The biggest are (1) The need to manage intermediate files. These are normally files with long names like *features_for_img_0_with_parameter_P.pp*. (2) The code gets much more complex. There are minor issues with having to issue several jobs (and having the cluster be idle in the meanwhile), or deciding on how to partition the jobs so that they take roughly the same amount of time, but the two above are the main ones. Jug solves all these problems! Tasks ~ The main unit of jug is a Task. Any function can be used to generate a Task. A Task can depend on the results of other Tasks. The original idea for jug was a Makefile-like environment for declaring Tasks. I have moved beyond that, but it might help you think about what Tasks are. You create a Task by giving it a function which performs the work and its arguments. The arguments can be either literal values or other tasks (in which case, the function will be called with the *result* of those tasks!). Jug also understands lists of tasks (all standard Python containers will be supported in a later version). For example, the following code declares the necessary tasks for our problem: :: imgs = glob('*.png') feature_tasks = [Task(computefeatures,img,parameter=2) for img in imgs] cluster_tasks = [] bic_tasks = [] for k in xrange(2,200): for repeat in xrange(10): cluster_tasks.append(Task(kmeans,feature_tasks,k=k,random_seed=repeat)) bic_tasks.append(Task(compute_bic,cluster_tasks[-1])) Nr_clusters = Task(argmin,bic_tasks) Task Generators ~~~ In the code above, there is a lot of code of the form *Task(function,args)*, so maybe it should read *function(args)*. A simple helper function aids this process: :: from jug.task import Task def TaskGenerator(function): def gen(*args,**kwargs): return Task(function,*args,**kwargs) return gen computefeatures =
Re: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code
On Monday 29 December 2008 14:51:48 Luis Pedro Coelho wrote: I will make the git repository publicly available once I figure out how to do that. You can get my code with: git clone http://coupland.cbi.cmu.edu/jug As I said, I consider this alpha code and am only making it publicly available at this stage because it came up. The license is LGPL. bye, Luis ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Thoughts on persistence/object tracking in scientific code
Hello all, On Monday 29 December 2008 17:40:07 Gael Varoquaux wrote: It is interesting to see that you take a slightly different approach than the others already discussed. This probably stems from the fact that you are mostly interested by parallelism, whereas there are other adjacent problems that can be solved by similar abstractions. In particular, I have the impression that you do not deal with what I call lazy-revaluation. In other words, I am not sure if you track results enough to know whether a intermediate result should be re-run, or if you run a 'clean' between each run to avoid this problem. I do. As long as the hash (the arguments to the function) is the same, the code loads objects from disk instead of computing results. I don't track the actual source code, though, only whether parameters have changed (but this could be a later addition). I must admit I went away from using hash to store objects to the disk because I am very much interested in traceability, and I wanted my objects to have meaningful names, and to be stored in convenient formats (pickle, numpy .npy, hdf5, or domain-specific). I have now realized that explicit naming is convenient, but it should be optional. But using a hash is not so impenetrable as long as you can easily get to the files you want. If I want to load the results of a partial computation, all I have to do is to generate the same Task objects as the initial computation and load those: I can run the jugfile.py inside ipython and call the appropriate load() methods. ipython jugfile.py : interesting = [t for t in tasks if t.name == 'something.other'] : intermediate = interesting[0].load() I did notice too that using the argument value to hash was bound to failure in all but the simplest case. This is the immediate limitation to the famous memoize pattern when applied to scientific code. If I understand well, what you do is that you track the 'history' of the object and use it as a hash to the object, right? I had come to the conclusion that the history of objects should be tracked, but I hadn't realized that using it as a hash was also a good way to solve the scoping problem. Thanks for the trick. Yes, let's say I have the following: feats = [Task(features,img) for img in glob('*.png')] cluster = Task(kmeans,feats,k=10) then the hash for cluster is computed from its arguments: * kmeans : the function name * feats: this is a list of tasks, therefore I use its hash, which is defined by its argument, which is a simple string. * k=10: this is a literal. I don't need to use the value computed by feats to compute the hash for cluster. Your task-based approach, and the API you have built around it, reminds my a bit of twisted deferred. Have you studied this API? No. I will look into it. Thanks. bye, Luis ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] No Copy Reduce Operations
Travis E. Oliphant wrote: Your approach using C++ templates is interesting, and I'm very glad for your explanation and your releasing of the code as open source.I'm not prepared to start using C++ in NumPy, however, so your code will have to serve as an example only. I will keep this as a separate package. I will write back one I have put it up somewhere. If this is not going into numpy, then I will do things a little differently, namely, I will have a very simple python layer which cleans up the arguments before calling the C++ implementation. bye, Luis ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] No Copy Reduce Operations
Hello all, Numpy arrays come with several reduce operations: sum(), std(), argmin(), min(), The traditional implementation of these suffers from two big problems: It is slow and it often allocates intermediate memory. I have code that is failing with OOM (out of memory) exceptions in calls to ndarray.std(). I regularly handle arrays with 100 million entries (have a couple of million objects * 20 features per object = 100 million doubles), so this is a real problem for me. This being open-source, I decided to solve the problem. My first idea was to try to improve the numpy code. I failed to see how to do that while supporting everything that numpy does (multiple types, for example), so I started an implementation of reduce operations that uses C++ templates to make code optimised into the types it actually uses, choosing the right version to use at run time. In the spirit or release-early/release-often, I attach the first version of this code that runs. BASIC IDEA === ndarray.std does basically the following (The examples are in pseudo-code even though the implementation happens to be in C): def stddev(A): mu = A.mean() diff=(A-mu) maybe_conj=(diff if not complex(A) else diff.conjugate()) diff *= maybe_conj return diff.sum() With a lot of temporary arrays. My code does: def stddev(A): mu = A.mean() # No getting around this temporary std = 0 for i in xrange(A.size): diff = (A[i]-mu) if complex(A): diff *= conjugate(diff) else: diff *= diff std += diff return sqrt(diff/A.size) Of course, my code does it while taking into account the geometry of the array. It handles arrays with arbitrary strides. I do it while avoiding copying the array at any time (while most of the existing code will transpose/copy the array so that it is well behaved in memory). IMPLEMENTATION === I have written some template infrastructure so that, if I wanted a very fast entropy calculation, on a normalised array, you could do: template ... void compute_entropy(BaseIterator data, BaseIterator past, ResultsIterator results) { while (data != past) { if (*data) *result += *data * std::log(*data); ++data; ++result; } } and the machinery will instantiate this in several variations, deciding at run-time which one to call. You just have to write a C interface function like PyObject* fast_entropy(PyArrayObject *self, PyObject *args, PyObject *kwds) { int axis=NPY_MAXDIMS; PyArray_Descr *dtype=NULL; PyArrayObject *out=NULL; static char *kwlist[] = {array,axis, dtype, out, NULL}; if (!PyArg_ParseTupleAndKeywords(args, kwds, O|OOO, kwlist, self, PyArray_AxisConverter,axis, PyArray_DescrConverter2, dtype, PyArray_OutputConverter, out)) { Py_XDECREF(dtype); return NULL; } int num = _get_type_num_double(self-descr, dtype); Py_XDECREF(dtype); return compress_dispatchEntropyCompute(self, out, axis, num, EmptyType()); // This decides which function to call } For contiguous arrays, axis=None, this becomes void compute_entropy(Base* data, Base* past, Results* result) { while (data != past) { if (*data) *result += *data * std::log(*data); ++data; } } which is probably as fast as it can be. If the array is not contiguous, this becomes void compute_entropy(numpy_iteratorBase data, numpy_iteratorBase past, Results* result) { while (data != past) { if (*data) *result += *data * std::log(*data); ++data; } } where numpy_iterator is a type that knows how to iterate over numpy arrays following strides. If axis is not None, then the result will not be a single value, it will be an array. The machinery will automatically create the array of the right size and pass it to you with so that the following gets instantiated: void compute_entropy(numpy_iteratorBase data, numpy_iteratorBase past, numpy_iteratorResultType results) { while (data != past) { if (*data) *results += *data * std::log(*data); ++data; ++results; } } The results parameter has the strides set up correctly to iterate over results, including looping back when necessary so that the code works as it should. Notice that the ++results operation seems to be dropping in and out. In fact, I was oversimplifying above. There is no instantiation with Result*, but with no_iterationResult which behaves like Result*, but with an empty operator ++(). You never change your template implementation. (The code above was for explanatory