Re: [Numpy-discussion] ndarray.T2 for 2D transpose
On Mon, Apr 11, 2016 at 5:24 PM Chris Barker wrote:
> On Fri, Apr 8, 2016 at 4:37 PM, Ian Henriksen <insertinterestingnameh...@gmail.com> wrote:
>
>> If we introduced the T2 syntax, this would be valid:
>>
>> a @ b.T2
>>
>> It makes the intent much clearer.
>
> would:
>
> a @ colvector(b)
>
> work too? or is T2 generalized to more than one column? (though I suppose
> colvector() could support more than one column also -- weird though that
> might be.)
>
> -CHB

Right, so I've opted to withdraw my support for having the T2 syntax prepend dimensions when the array has fewer than two dimensions. Erroring out in the 1D case addresses my concerns well enough. The colvec/rowvec idea seems nice too, but it matters a bit less to me, so I'll leave that discussion open for others to follow up on.

Having T2 be a broadcasting transpose is a bit more general than any semantics for rowvec/colvec that I can think of. Here are specific arrays that, in the expression a @ b.T2, can only be handled using some sort of transpose:

a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 1, 3, 4)

Using these inputs, the expression a @ b.T2 would have the shape (2, 2, 3, 3). All the T2 property would be doing is a transpose with broadcasting semantics similar to those of matmul, solve, inv, and the other gufuncs. The primary difference from those other functions is that transposes would be done as views, whereas the other operations, because of the computations they perform, all have to return new output arrays.

Hope this helps,

-Ian Henriksen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
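Since the proposed T2 attribute does not exist in NumPy, the shapes in Ian's example can be checked today with swapaxes, which performs the same broadcasting transpose of the last two axes, as a view:

```python
import numpy as np

a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 1, 3, 4)

# b.swapaxes(-1, -2) stands in for the proposed b.T2: it transposes the
# last two axes as a view, and matmul broadcasts over the leading axes.
bt = b.swapaxes(-1, -2)
assert bt.shape == (2, 1, 4, 3)
assert bt.base is b          # a view, no copy is made

result = a @ bt
assert result.shape == (2, 2, 3, 3)
```

The batch dimensions (2,) and (2, 1) broadcast to (2, 2), and each (3, 4) matrix multiplied by a (4, 3) matrix gives (3, 3), hence the (2, 2, 3, 3) result quoted above.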
Re: [Numpy-discussion] ndarray.T2 for 2D transpose
On Fri, Apr 8, 2016 at 4:37 PM, Ian Henriksen <insertinterestingnameh...@gmail.com> wrote:

> If we introduced the T2 syntax, this would be valid:
>
> a @ b.T2
>
> It makes the intent much clearer.

would:

a @ colvector(b)

work too? or is T2 generalized to more than one column? (though I suppose colvector() could support more than one column also -- weird though that might be.)

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR            (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115      (206) 526-6317 main reception

chris.bar...@noaa.gov
Re: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
On Mon, Apr 11, 2016 at 5:39 AM, Matěj Týč wrote:

> * ... I do see some value in providing a canonical right way to
> construct shared memory arrays in NumPy, but I'm not very happy with
> this solution, ... terrible code organization (with the global
> variables):
> * I understand that, however this is a pattern of Python
> multiprocessing and everybody who wants to use the Pool and shared
> data either is familiar with this approach or has to become familiar
> with it [2, 3]. A good compromise is to have a separate module for each
> parallel calculation, so global variables are not a problem.

OK, we can agree to disagree on this one. I still don't think I could get code using this pattern checked in at my work (for good reason).

> * If there's some way we can paper over the boilerplate such that
> users can use it without understanding the arcana of multiprocessing,
> then yes, that would be great. But otherwise I'm not sure there's
> anything to be gained by putting it in a library rather than referring
> users to the examples on StackOverflow [1] [2].
> * What about telling users: "You can use numpy with multiprocessing.
> Remember the multiprocessing.Value and multiprocessing.Array classes?
> numpy.shm works exactly the same way, which means that it shares their
> limitations. Refer to an example: ." Notice that
> although those SO links contain all of the information, it was very
> difficult to get it up and running for a newcomer like me a few years
> ago.

I guess I'm still not convinced this is the best we can do with the multiprocessing library. If we're going to do this, then we definitely need to have the fully canonical example. For example, could you make the shared array a global variable and then still pass references to the functions called by the processes anyway? The examples on StackOverflow that we're both looking at are varied enough that it's not obvious to me that this is as good as it gets.
> * This needs tests and justification for custom pickling methods,
> which are not used in any of the current examples. ...
> * I am sorry, but I don't fully understand that point. The custom
> pickling method of shmarray has to be there on Windows, but users
> don't have to know about it at all. As noted earlier, the global
> variable is the only way of using the standard Python multiprocessing.Pool
> with shared objects.

That sounds like a fine justification, but given that it wasn't obvious, you need a comment saying as much in the source code :). Also, it breaks pickle, which is another limitation that needs to be documented.
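For concreteness, the global-variable pattern being debated here (a shared multiprocessing buffer inherited by Pool workers and viewed as a numpy array) can be sketched roughly as follows. This is only an illustration of the pattern from the StackOverflow links, not the PR's actual code; the names init_worker and double_element are made up:

```python
import ctypes
import multiprocessing as mp
import numpy as np

shared = None  # module-level global, set in each worker by the initializer

def init_worker(base):
    # View the shared ctypes buffer as a numpy array; no data is copied.
    global shared
    shared = np.frombuffer(base, dtype=np.float64)

def double_element(i):
    return shared[i] * 2

def main():
    # lock=False returns a raw shared ctypes array exposing the buffer
    # interface, so np.frombuffer can wrap it directly.
    base = mp.Array(ctypes.c_double, 4, lock=False)
    arr = np.frombuffer(base, dtype=np.float64)
    arr[:] = [1.0, 2.0, 3.0, 4.0]
    with mp.Pool(2, initializer=init_worker, initargs=(base,)) as pool:
        return pool.map(double_element, range(4))

if __name__ == "__main__":
    print(main())  # [2.0, 4.0, 6.0, 8.0]
```

The global is needed precisely because Pool.map pickles its arguments: the buffer is handed to the workers once via the initializer rather than per task, which is the "terrible code organization" complaint above.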
[Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
Dear Numpy developers,

I propose a pull request https://github.com/numpy/numpy/pull/7533 that features numpy arrays that can be shared among processes (with some effort).

Why:
In CPython, multiprocessing is the only way to exploit multi-core CPUs if your parallel code can't avoid creating Python objects. In that case, CPython's GIL makes threads unusable. However, unlike with threading, sharing data among processes is non-trivial and platform-dependent.

Although numpy (and certainly some other packages) implements some operations in a way that the GIL is not a concern, consider another case: you have a large amount of data in the form of a numpy array and you want to pass it to a function of an arbitrary Python module that also expects a numpy array (e.g. a list of vertex coordinates as input and an array of the corresponding polygons as output). Here, it is clear that the GIL is an issue, and since you want a numpy array on both ends, you currently have to copy your numpy array to a multiprocessing.Array (to pass the data) and then convert it back to an ndarray in the worker process. This contribution would streamline that a bit - you would create an array as you are used to, pass it to the subprocess as you would a multiprocessing.Array, and the process could work with a numpy array right away.

How:
The idea is to create a numpy array in a buffer that can be shared among processes. Python has support for this in its standard library, so the current solution creates a multiprocessing.Array and then passes it as the "buffer" to ndarray.__new__. That would be it on Unixes, but on Windows there has to be a custom pickle method, otherwise the array "forgets" that its buffer is special and the sharing doesn't work.

Some of what has been said in the pull request & my answers to that:

* ... I do see some value in providing a canonical right way to construct shared memory arrays in NumPy, but I'm not very happy with this solution, ... terrible code organization (with the global variables):
* I understand that, however this is a pattern of Python multiprocessing and everybody who wants to use the Pool and shared data either is familiar with this approach or has to become familiar with it [2, 3]. A good compromise is to have a separate module for each parallel calculation, so global variables are not a problem.

* Can you explain why the ndarray subclass is needed? Subclasses can be rather annoying to get right, and also for other reasons.
* The shmarray class needs the custom pickler (but only on Windows).

* If there's some way we can paper over the boilerplate such that users can use it without understanding the arcana of multiprocessing, then yes, that would be great. But otherwise I'm not sure there's anything to be gained by putting it in a library rather than referring users to the examples on StackOverflow [1] [2].
* What about telling users: "You can use numpy with multiprocessing. Remember the multiprocessing.Value and multiprocessing.Array classes? numpy.shm works exactly the same way, which means that it shares their limitations. Refer to an example: ." Notice that although those SO links contain all of the information, it was very difficult to get it up and running for a newcomer like me a few years ago.

* This needs tests and justification for custom pickling methods, which are not used in any of the current examples. ...
* I am sorry, but I don't fully understand that point. The custom pickling method of shmarray has to be there on Windows, but users don't have to know about it at all. As noted earlier, the global variable is the only way of using the standard Python multiprocessing.Pool with shared objects.

[1]: http://stackoverflow.com/questions/10721915/shared-memory-objects-in-python-multiprocessing
[2]: http://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing
[3]: http://stackoverflow.com/questions/1675766/how-to-combine-pool-map-with-array-shared-memory-in-python-multiprocessing
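The core construction described under "How" above can be sketched in a few lines; this is a minimal illustration only, and the actual PR adds the ndarray subclass and the Windows pickling support on top of it:

```python
import ctypes
import multiprocessing as mp
import numpy as np

# Allocate the backing store as shared memory (lock=False gives a raw
# ctypes array that exposes the buffer interface), then view it as an
# ndarray without copying.
raw = mp.Array(ctypes.c_double, 6, lock=False)
shared = np.ndarray((2, 3), dtype=np.float64, buffer=raw)

shared[:] = np.arange(6, dtype=np.float64).reshape(2, 3)

# Writes through the ndarray are visible in the shared buffer: element
# [1, 1] of a C-ordered (2, 3) array is flat index 4.
assert raw[4] == shared[1, 1]
```

On Unix, any process forked after this point sees the same memory; the custom pickler mentioned above is what keeps this working when the array itself has to cross a process boundary on Windows.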
Re: [Numpy-discussion] f2py: ram usage
Using order='F' solved the problem. Thanks for the reply.
Re: [Numpy-discussion] f2py: ram usage
Yes, f2py is probably copying the arrays; you can check this by appending -DF2PY_REPORT_ON_ARRAY_COPY=1 to your call to f2py.

I normally prefer to keep the numpy arrays C-order (most efficient for numpy) and simply pass the array transpose to the f2py-ized fortran routine. This means that the fortran array indices are reversed, but this is the most natural way in any case.

--George Nurser

On 10 April 2016 at 11:53, Sebastian Berg wrote:
> On So, 2016-04-10 at 12:04 +0200, Vasco Gervasi wrote:
> > Hi all,
> > I am trying to write some code to do calculations on an array: for
> > each row I need to do some computation and get a number back.
> > To speed up the process I wrote a fortran subroutine that is called
> > from python [using f2py] for each row of the array, so the input of
> > this subroutine is a row and the output is a number.
> > This method works, but I saw some speed advantage if I pass the entire
> > array to fortran and then, inside fortran, call the subroutine that
> > does the math; so in this case I pass an array and return a vector.
> > But I noticed that when python passes the array to fortran, the array
> > is copied and the RAM usage doubles.
>
> I expect that the fortran code needs your arrays to be fortran
> contiguous, so the wrappers need to copy them.
>
> The easiest solution may be to create your array in python with the
> `order="F"` flag. Note, though, that NumPy has a tendency to prefer
> C-order and uses it as the default when doing something with an
> "F"-ordered array.
>
> That said, I have never used f2py, so these are just well-founded
> guesses.
>
> - Sebastian
>
> > Is there a way to "move" the array to fortran? I don't care if the
> > array is lost after the call to fortran.
> > The pyd module is generated using: python f2py.py -c --opt="-ffree
> > -form -Ofast" -m F2PYMOD F2PYMOD.f90
> >
> > Thanks
> > Vasco
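The two copy-avoiding approaches suggested in this thread can be illustrated side by side; both hand the fortran routine F-contiguous data without any copy at call time:

```python
import numpy as np

# Option 1 (Sebastian): allocate the array Fortran-ordered up front.
a_f = np.zeros((3, 4), order='F')
assert a_f.flags['F_CONTIGUOUS']

# Option 2 (George): keep C-order for numpy work and pass the transpose
# to the f2py-wrapped routine. The transpose is a view, not a copy; the
# fortran side simply sees the indices reversed.
a_c = np.zeros((3, 4))          # C-order is the numpy default
a_t = a_c.T
assert a_t.flags['F_CONTIGUOUS']
assert a_t.base is a_c          # a view sharing a_c's memory
```

With either option, the f2py wrapper finds the input already F-contiguous and -DF2PY_REPORT_ON_ARRAY_COPY should report no copies for that argument.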