Re: [Numpy-discussion] Overlapping time series
Daniele Nicolodi wrote: > Sure, I realize that, thank for the clarification. The arrays are quite > small, then the three loops and the temporary take negligible time and > memory in the overall processing. If they are small, a Python loop would do the job as well. And if it doesn't, it is just a matter of adding a couple of decorators to use Numba JIT. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11/02/2014 18:18, Sturla Molden wrote: > Daniele Nicolodi wrote: > >> That's more or less my current approach (except that I use the fact that >> the data is evenly samples to use np.where(np.diff(t1) != dt) to detect >> the regions of continuous data, to avoid the loop. > > I hope you realize that np.where(np.diff(t1) != dt) generates three loops, > as well as two temporary arrays and one output array. If you do what I > suggested, you get one loop and no temporaries. But you will need Numba or > Cython to get full speed. Sure, I realize that, thank for the clarification. The arrays are quite small, then the three loops and the temporary take negligible time and memory in the overall processing. I was lazy and I did not want to write in Cython, however I didn't benchmark the exact solution you propose written in pure python... Cheers, Daniele ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?
A small improvement: # Dimensions N = 100 M = 200 # Coordinates of the centre x0 = 40 y0 = 70 x, y = np.meshgrid(np.arange(N) - x0, np.arange(M) - y0, sparse=True) # Center at 40, 70, radius 20 out_circle = (x * x + y * y < 20**2) out_ring = out_circle - (x * x + y * y < 10**2) plt.imshow(out_circle) plt.figure() plt.imshow(out_ring) plt.show() Using sparse you can avoid the repetition of the arrays, getting the same functionality. Also, if the image is big, you can delegate the computations of out_circle and out_ring to numexpr for speed. /David. On 11 February 2014 22:16, Daπid wrote: > Here it is an example: > > import numpy as np > import pylab as plt > > N = 100 > M = 200 > x, y = np.meshgrid(np.arange(N), np.arange(M)) > > # Center at 40, 70, radius 20 > x -= 40 > y -= 70 > out_circle = (x * x + y * y < 20**2) > > out_ring = out_circle - (x * x + y * y < 10**2) > > plt.imshow(out_circle) > plt.figure() > plt.imshow(out_ring) > plt.show() > > Note that I have avoided taking the costly square root of each element by > just taking the square of the radius. It can also be generalised to > ellipses or rectangles, if you need it. > > Also, don't use 0 as a False value, and don't force it to be a 0. Instead, > use "if not ring:" > > > /David > > > > > > On 11 February 2014 21:56, Wolfgang Draxinger < > wolfgang.draxin...@physik.uni-muenchen.de> wrote: > >> Hi, >> >> I implemented the following helper function to create masking arrays: >> >> def gen_mask(self, ring, sector): >> in_segment = None >> if ring == 0: >> radius = self.scales()[0] >> def in_center_circle(xy): >> dx = xy[0] - self.xy[0] >> dy = xy[1] - self.xy[1] >> r = math.sqrt( dx**2 + dy**2 ) >> return r < radius >> in_segment = in_center_circle >> else: >> angle = ( self.a_sector(sector, ring), >> self.a_sector( (sector+1) % self.n_sectors(), >> ring) ) >> radii = self.scales()[ring:ring+1] >> def in_segment_ring(xy): >> dx = xy[0] - self.xy[0] >> dy = xy[1] - self.xy[1] >> r = math.sqrt( dx**2 + dy**2 ) >> a = math.atan2( dy, dx ) >> return r >= radii[0] and r < radii[1] \ >>and a >= angle[0] and a < angle[1] >> in_segment = in_segment_ring >> >> width,height = self.arr.shape >> >> mask = numpy.zeros(shape=(width, height), dtype=numpy.bool) >> for y in range(height): >> for x in range(width): >> mask[x][y] = in_segment((x,y)) >> return mask >> >> self.scales() generates a list of radius scaling factors. >> self.a_sector() gives the dividing angle between sectors "sector" and >> "sector+1" on the given ring. >> >> The function works, it generates the masks as I need it. The problem is >> - of course - that it's quite slow due to the inner loops the perform >> the geometrical test if the given element of the array self.arr is >> withing the bounds of the ring-sector for which the mask is generated. >> >> I wonder if you guys have some ideas, on how I could accelerate this. >> This can be perfectly well constructed by boolean combination of >> elementary geometrical objects. For example a ring would be >> >> ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1 >> >> The sector would then be >> >> ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1) >> >> I'd like to avoid doing this using some C helper routine. I'm looking >> for something that is the most efficient when it comes to >> "speedup"/"time invested to develop this". >> >> >> Cheers, >> >> Wolfgang >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?
Here it is an example: import numpy as np import pylab as plt N = 100 M = 200 x, y = np.meshgrid(np.arange(N), np.arange(M)) # Center at 40, 70, radius 20 x -= 40 y -= 70 out_circle = (x * x + y * y < 20**2) out_ring = out_circle - (x * x + y * y < 10**2) plt.imshow(out_circle) plt.figure() plt.imshow(out_ring) plt.show() Note that I have avoided taking the costly square root of each element by just taking the square of the radius. It can also be generalised to ellipses or rectangles, if you need it. Also, don't use 0 as a False value, and don't force it to be a 0. Instead, use "if not ring:" /David On 11 February 2014 21:56, Wolfgang Draxinger < wolfgang.draxin...@physik.uni-muenchen.de> wrote: > Hi, > > I implemented the following helper function to create masking arrays: > > def gen_mask(self, ring, sector): > in_segment = None > if ring == 0: > radius = self.scales()[0] > def in_center_circle(xy): > dx = xy[0] - self.xy[0] > dy = xy[1] - self.xy[1] > r = math.sqrt( dx**2 + dy**2 ) > return r < radius > in_segment = in_center_circle > else: > angle = ( self.a_sector(sector, ring), > self.a_sector( (sector+1) % self.n_sectors(), > ring) ) > radii = self.scales()[ring:ring+1] > def in_segment_ring(xy): > dx = xy[0] - self.xy[0] > dy = xy[1] - self.xy[1] > r = math.sqrt( dx**2 + dy**2 ) > a = math.atan2( dy, dx ) > return r >= radii[0] and r < radii[1] \ >and a >= angle[0] and a < angle[1] > in_segment = in_segment_ring > > width,height = self.arr.shape > > mask = numpy.zeros(shape=(width, height), dtype=numpy.bool) > for y in range(height): > for x in range(width): > mask[x][y] = in_segment((x,y)) > return mask > > self.scales() generates a list of radius scaling factors. > self.a_sector() gives the dividing angle between sectors "sector" and > "sector+1" on the given ring. > > The function works, it generates the masks as I need it. The problem is > - of course - that it's quite slow due to the inner loops the perform > the geometrical test if the given element of the array self.arr is > withing the bounds of the ring-sector for which the mask is generated. > > I wonder if you guys have some ideas, on how I could accelerate this. > This can be perfectly well constructed by boolean combination of > elementary geometrical objects. For example a ring would be > > ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1 > > The sector would then be > > ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1) > > I'd like to avoid doing this using some C helper routine. I'm looking > for something that is the most efficient when it comes to > "speedup"/"time invested to develop this". > > > Cheers, > > Wolfgang > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Geometrically defined masking arrays; how to optimize?
Hi, I implemented the following helper function to create masking arrays: def gen_mask(self, ring, sector): in_segment = None if ring == 0: radius = self.scales()[0] def in_center_circle(xy): dx = xy[0] - self.xy[0] dy = xy[1] - self.xy[1] r = math.sqrt( dx**2 + dy**2 ) return r < radius in_segment = in_center_circle else: angle = ( self.a_sector(sector, ring), self.a_sector( (sector+1) % self.n_sectors(), ring) ) radii = self.scales()[ring:ring+1] def in_segment_ring(xy): dx = xy[0] - self.xy[0] dy = xy[1] - self.xy[1] r = math.sqrt( dx**2 + dy**2 ) a = math.atan2( dy, dx ) return r >= radii[0] and r < radii[1] \ and a >= angle[0] and a < angle[1] in_segment = in_segment_ring width,height = self.arr.shape mask = numpy.zeros(shape=(width, height), dtype=numpy.bool) for y in range(height): for x in range(width): mask[x][y] = in_segment((x,y)) return mask self.scales() generates a list of radius scaling factors. self.a_sector() gives the dividing angle between sectors "sector" and "sector+1" on the given ring. The function works, it generates the masks as I need it. The problem is - of course - that it's quite slow due to the inner loops the perform the geometrical test if the given element of the array self.arr is withing the bounds of the ring-sector for which the mask is generated. I wonder if you guys have some ideas, on how I could accelerate this. This can be perfectly well constructed by boolean combination of elementary geometrical objects. For example a ring would be ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1 The sector would then be ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1) I'd like to avoid doing this using some C helper routine. I'm looking for something that is the most efficient when it comes to "speedup"/"time invested to develop this". Cheers, Wolfgang ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Suggestions for GSoC Projects
Hi, 08.02.2014 06:16, Stéfan van der Walt kirjoitti: > On 8 Feb 2014 04:51, "Ralf Gommers" wrote: >> >>> Members of the dipy team would also be interested. >> >> That's specifically for the spherical harmonics topic right? > > Right. Spherical harmonics are used as bases in many of DiPy's > reconstruction algorithms. If help is needed with a GSoC project for scipy.special, I'm in principle available to chip in co-mentoring, or just trying to help answer questions. Best, Pauli ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Suggestions for GSoC Projects
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, 04.02.2014 20:30, jennifer stone kirjoitti: > 3. As stated earlier, we have spherical harmonic functions (with > much scope >>> for dev) we are yet to have elliptical and cylindrical harmonic >>> function, which may be developed. >> >> This sounds very doable. How much work do you think would be >> involved? > > As Stefan so rightly pointed out, the function for spherical > harmonic function, sph_harm at present calls lpmn thus evaluating > all orders gives me a feeling that it would be very well possible to avoid > that by maybe avoiding the dependence on lpmn. > > Further, we can introduce ellipsoidal harmonic functions of first > kind and the second kind. I am confident about about the > implementation of ellipsoidal H function of first kind but don't > know much about the second kind. But I believe we can work it out > in due course.And cylindrical harmonics can be carried out using > Bessel functions. It's not so often someone wants to work on scipy.special, so you'd be welcome to improve it :) The general structure of work on special functions goes as follows: - - Check if there is a license-compatible implementation that someone has already written. This is usually not the case. - - Find formulas for evaluating the function in terms of more primitive operations. (Ie. power series, asymptotic series, continued fractions, expansions in terms of other special functions, ...) - - Determine the parameter region where the expansions converge in a floating point implementation, and select algorithms appropriately. Here it helps if you find a research paper where the author has already thought about what sort of an approach works best. - - Life is usually made *much* easier thanks to Fredrik Johansson's prior work on arbitrary-precision arithmetic library mpmath http://code.google.com/p/mpmath/ It can usually be used to check the "true" values of the functions. Also it contains implementations of algorithms for evaluating special functions, but because mpmath works with arbitrary precision numbers, these algorithms are not directly usable for floating-point calculations, as in floating point you cannot adjust the precision of the calculation dynamically. Moreover, the arbitrary-precision arithmetic can be slow compared to a more optimized floating point implementations. Spherical harmonics might be a reasonable part of a GSoC proposal. However, note that there exists also a *second* Legendre polynomial function `lpmv`, which doesn't store the values of the previous N functions. There's one numerical problem in the current way of evaluation via ~Pmn(cos(theta)), which is that this approach seems to lose relatively much precision at large orders for certain values of theta. I don't recall now exactly how imprecise it becomes at large orders, but it may be necessary to check. Adding new special functions also sounds like an useful project. Here, it helps if they are something that you expect you will need later on :) There's also the case that several of the functions in Scipy have only implementations for real-valued inputs, although the functions would be defined on the whole complex plane. A list of the situation is here: https://github.com/scipy/scipy/blob/master/scipy/special /generate_ufuncs.py#L85 Lowercase d correspond to real-valued implementations, uppercase D to complex-valued. I'm not at the moment completely sure which would have the highest priority --- whether you need this or not really depends on the application. If you want additional ideas about possible things to fix in scipy.special, take a look at this file: https://github.com/scipy/scipy/blob/master/scipy/special/tests /test_mpmath.py#L648 The entries marked @knownfailure* have some undiagnosed issues in the implementation, which might be useful to look into. However: most of these have to do with corner cases in hypergeometric functions. Trying to address those is likely a risky GSoC topic, as the multi-argument hyp* functions are challenging to evaluate in floating point. (mpmath and Mathematica can evaluate them in most parameter regimes, but AFAIK both require arbitrary-precision methods for this.) So I think there would be a large number of possible things to do here, and help would be appreciated. - -- Pauli Virtanen -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) iEYEARECAAYFAlL6iwAACgkQ6BQxb7O0pWBfOgCfYHAB12N4FWDmrqx8/ORTBRps pXYAoL3ufAiShe+0qTEGfEvrmDgr1X0p =kAwF -END PGP SIGNATURE- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
Hi, On Tue, Feb 11, 2014 at 11:40 AM, Pauli Virtanen wrote: > 11.02.2014 21:20, alex kirjoitti: > [clip] >> In the spirit of offsetting this bias and because this thread is >> lacking in examples of projects that use numpy.matrix, here's another >> data point: cvxpy (https://github.com/cvxgrp/cvxpy) is a serious >> active project that supports the numpy.matrix interface, for example >> as in >> https://github.com/cvxgrp/cvxpy/tree/master/cvxpy/interface/numpy_interface. > > Here's some more data: > > http://nullege.com/codes/search?cq=numpy.matrix These are quite revealing. Some of the top hits are from old nipy code. We don't have any use of np.matrix in the current nipy trunk. Another couple of hits are from the 'Pylon' package: https://github.com/rwl/pylon - of this form: scipy.io.mmwrite("./data/fDC.mtx", numpy.matrix(f_dc)) When I ran the code without the numpy.matrix call it was easy to see how that happened: In [4]: scipy.io.mmwrite('fDc.mtx', f_dc) --- ValueErrorTraceback (most recent call last) ... ValueError: expected matrix Of course, what the message means is that you should pass a 2D array: scipy.io.mmwrite('fDc.mtx', f_dc[None]) works fine. So I think this is another example of confusion caused by np.matrix. https://github.com/scipy/scipy/pull/3310 Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
11.02.2014 21:20, alex kirjoitti: [clip] > In the spirit of offsetting this bias and because this thread is > lacking in examples of projects that use numpy.matrix, here's another > data point: cvxpy (https://github.com/cvxgrp/cvxpy) is a serious > active project that supports the numpy.matrix interface, for example > as in > https://github.com/cvxgrp/cvxpy/tree/master/cvxpy/interface/numpy_interface. Here's some more data: http://nullege.com/codes/search?cq=numpy.matrix http://nullege.com/codes/search?cq=numpy.array -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On Tue, Feb 11, 2014 at 12:05 PM, Matthew Brett wrote: > Hi, > > On Tue, Feb 11, 2014 at 8:55 AM, wrote: >> >> >> On Tue, Feb 11, 2014 at 11:25 AM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR >>> wrote: >>> > For our students, the matrix class is really appealing as we use a lot >>> > of linear algebra and expressions with matrices simply look better with an >>> > operator instead of a function: >>> > >>> > x=A.I*b >>> > >>> > looks much better than >>> > >>> > x = np.dot(np.linalg.inv(A),b) >>> >>> Yes, but: >>> >>> 1) as Alan has mentioned, the dot method helps a lot. >>> >>> import numpy.linalg as npl >>> >>> x = npl.inv(A).dot(b) >>> >>> 2) Overloading the * operator means that you've lost * to do >>> element-wise operations. MATLAB has a different operator for that, >>> '.*' - and it's very easy to forget the dot. numpy makes this more >>> explicit - you read 'dot' as 'dot'. >>> >>> > And this gets worse when the expression is longer: >>> > >>> > x = R.I*A*R*b >>> > >>> > becomes: >>> > >>> > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) >>> >>> x = npl.inv(R).dot(A.dot(R.dot(b)) >>> >>> > Actually, not being involved in earlier discussions on this topic, I was >>> > a bit surprised by this and do not see the problem of having the matrix >>> > class as nobody is obliged to use it. I tried to find the reasons, but did >>> > not find it in the thread mentioned. Maybe someone could summarize the >>> > main >>> > problem with keeping this class for newbies on this list like me? >>> > >>> > Anyway, I would say that there is a clear use for the matrix class: >>> > readability of linear algebra code and hence a lower chance of errors, so >>> > higher productivity. >>> >>> Yes, but it looks like there are not any developers on this list who >>> write substantial code with the np.matrix class, so, if there is a >>> gain in productivity, it is short-lived, soon to be replaced by a >>> cost. >> >> >> selection bias ! >> >> I have seen lots of Matlab and GAUSS code written by academic >> econometricians that have been programming for years but are not >> "developers", code that is "inefficient" and numerically not very stable but >> looks just like the formulas. > > Yes, I used to use matlab myself. > > There is certainly biased sampling on this list, so it's very > difficult to say whether there is a large constituency of np.matrix > users out there, it's possible. I hope not, because I think they > would honestly be better served with ndarray even if some of the lines > in their script don't look quite as neat. In the spirit of offsetting this bias and because this thread is lacking in examples of projects that use numpy.matrix, here's another data point: cvxpy (https://github.com/cvxgrp/cvxpy) is a serious active project that supports the numpy.matrix interface, for example as in https://github.com/cvxgrp/cvxpy/tree/master/cvxpy/interface/numpy_interface. This project is from a somewhat famous Stanford lab (http://www.stanford.edu/~boyd/index.html) and they are currently running a MOOC (http://en.wikipedia.org/wiki/Massive_open_online_course) for convex optimization (https://class.stanford.edu/courses/Engineering/CVX101/Winter2014/about). This course currently uses a MATLAB-based modeling system (http://cvxr.com/cvx/) but they are trying to switch to, or at least support, Python. But they have not yet been able to get cvxpy to a mature enough state to use for their course. Maybe the simple numpy.matrix syntax has accelerated their progress so that they have been able to reach cvxpy's currently somewhat usable state, or maybe the extra work to deal with numpy.matrix has slowed their progress so that we are using MATLAB instead of Python as the standard for training Stanford undergrads and random internet MOOC participants, or maybe their progress has little to do with numpy.matrix. I'm not sure which is the case. But in any case, they use numpy.matrix explicitly in their project. Alex ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On Tue, Feb 11, 2014 at 12:14 PM, Chris Barker wrote: > On Tue, Feb 11, 2014 at 8:25 AM, Matthew Brett > wrote: >> >> > Anyway, I would say that there is a clear use for the matrix class: >> > readability of linear algebra code and hence a lower chance of errors, so >> > higher productivity. >> >> Yes, but it looks like there are not any developers on this list who >> write substantial code with the np.matrix class, so, if there is a >> gain in productivity, it is short-lived, soon to be replaced by a >> cost. > > > to re-iterate: > > matrix is NOT for newbies, nor is it for higher productivity or fewer errors > in production code -- the truth is, the ratio of linear algebra expressions > like the above to element-wise, etc. operations that ndarray is well suited > to is tiny in "real" code. Did anyone that used MATLAB for significant > problems get not get really really annoyed by all the typing of ".*" ? > > What matrix is good for is what someone here described as "domain specific > language" -- i.e. that little bit of code that really is doing mostly linear > algebra. This point would suggest that the "domain specific language" defined by the numpy.matrix semantics would be particularly useful for representations of linear operators for which elementwise modification might be less efficient (for example as in some implementations of sparse matrices) or essentially unavailable (for example as in matrix-free linear operators). Alex ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On Tue, Feb 11, 2014 at 12:14 PM, Chris Barker wrote: > On Tue, Feb 11, 2014 at 8:25 AM, Matthew Brett wrote: > >> > Anyway, I would say that there is a clear use for the matrix class: >> readability of linear algebra code and hence a lower chance of errors, so >> higher productivity. >> >> Yes, but it looks like there are not any developers on this list who >> write substantial code with the np.matrix class, so, if there is a >> gain in productivity, it is short-lived, soon to be replaced by a >> cost. >> > > to re-iterate: > > matrix is NOT for newbies, nor is it for higher productivity or fewer > errors in production code -- the truth is, the ratio of linear algebra > expressions like the above to element-wise, etc. operations that ndarray is > well suited to is tiny in "real" code. Did anyone that used MATLAB for > significant problems get not get really really annoyed by all the typing of > ".*" ? > > What matrix is good for is what someone here described as "domain specific > language" -- i.e. that little bit of code that really is > doing mostly linera algebra. So it's a nice tool for teaching and > experimenting with linear-algebra-based concepts. > http://www.mathworks.com/matlabcentral/fileexchange/27095-tsls-2sls/content/tsls.m https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/gmm.py#L134 If I were not strict ndarray coder, I might prefer to wrap an ndarray and use only matrices inside a function and ndarrays outside for the code that is not linear algebra. > > To address Alan's question about duck-typing -- one of the things folks > like to do with duck-typed functions and method is return the type that is > passed in when possible: i.e use asanyarray(), rather than asarray() -- but > this is really going to be broken with matrix, as the symantics have > changed. So we could say "don't expect that to work with matrix", but that > requires one of: > > 1) folks always use asarray() and return an array, rather than a matrix to > the caller -- not too bad, folks that want matrix can use np.matrix to get > it back (a bit ugly, though..) however, this means that any other array > subclass will get mangled as well... > scipy.linalg has an arraywrap on input and output. (at least when I looked a few years ago) (statsmodels has a pandas wrapper that converts arguments and returns to have ndarrays internally) some packages have helper functions to make a consistent interface to ndarrays and sparce "matrices" scipy.stats still doesn't protect against masked arrays and nans. IMO: that's life. Removing matrices from numpy doesn't make the problem go away. Although the problem could be pushed to other packages. But if nobody uses matrices, then we would have at least **one** problem less. > > 2) folks use asanyarray(), and it will break, maybe painfully, when a > matrix is passed in -- folks using matrixes would need to use .A when > calling such functions. This really seems ripe for confusion. > There are many ndarray subclasses out there, and I have no idea how they behave. pandas.Series was until recently a ndarray subclass, that didn't quite behave like one. We had to fix some bugs in statsmodels where we accidentially use asanyarray instead of asarray. Josef > > The truth is that the "right" way to do all this is to > have different operators, rather than different objects, but that's been > tried and did not fly. > > -Chris > > > > > > > > > > > > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R(206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > chris.bar...@noaa.gov > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] record arrays with char*?
On Tue, Feb 11, 2014 at 8:57 AM, Christopher Jordan-Squire wrote: > Thanks for the answers! My responses are inline. my C is a bit weak, so forgive my misunderstanding, but: >> { >> char* ind; >> double val, wght; >> } data[] = { {"camera", 15, 2}, {"necklace", 100, 20}, {"vase", 90, 20}, >> {"pictures", 60, 30}, {"tv", 40, 40}, {"video", 15, 30}}; Here is my C weakness -- what does this struct look like in memory? i.e is that first element a pointer, and the actually string data is somewhere else in memory? In which case, where? and how does that memory get managed? Or are the bytes at the begining of the struct with the string data in it? in the "real" case, you say the char* is a string representation of a hex value for a memory address -- so you would know a priory exactly what the length of that sting is, so you could use a numpy struct like: ('S8,f8,f8') i.e 8 bytes to store the string at the begining of the struct. then & the_array[i] would be the address of the beggining of the string -- which may be what you want. HTH, -Chris > >> > >> At the C level, data is passed to the function by directly giving its > >> address. (i.e. the C function takes as an argument (unsigned long) > >> data, casting the data pointer to an int) > > > > > > wow -- that's prone to error! but I"m still not sure which pointer you're > > talking about -- a pointer to this struct? > > > > I mean data. data is an array of structs, but in C that's > (essentially) the same as a pointer to a struct. So data is the > pointer I'm referring to. > > >> I'd like to create something similar using record arrays, such as > >> > >> np.array([("camera", 15, 2), ("necklace", 100, 20), ... ], > >> dtype='object,f8,f8'). > > > > > > > >> > >> Unfortunately this fails because > >> (1) In cython I need to determine the address of the first element and > >> I can't take the address of a an input whose type I don't know (the > >> exact type will vary on the application, so more or fewer fields may > >> be in the C struct) > > > > > > still a bit confused, but if this is types as an array in Cython, you > should > > be abel to do somethign like: > > > > &the_array[i] > > > > to get the address of the ith element. > > > > To do that I need to be able to tell cython the type of the memory > view I give. There are very few examples for non-primitive arrays in > cython, so I'm not sure what that would look like. Or at least I > *think* I need to do that, based on the cython errors I'm getting. > > >> (2) I don't think a python object type is what I want--I need a char* > >> representation of the string. (Unfortunately I can't test this because > >> I haven't solved (1) -- how do you pass a record array around in > >> cython and/or take its address?) > > > > > > well, and object type will give you a pointer to a pyobject. If you know > for > > sure that that pyobject is a string object (probably want a bytes object > -- > > you son't want unicode here), then you should be abel to get the address > of > > the underlying char array. But that would require passing something > > different off to the C code that the address of that element. > > > > You could use an unsigned long for that first field, as you are assuming > > that in the C code anyway but I don't hink there is a way in numpy to > set > > that to a pointer to a char allocated elsewhere -- where would it be > > allocated? > > > > The strings are char*, the unsigned long cast was simply to cast data > to a memory address. (The specific setup of the library was passing a > string with the memory location in hexadecimal. This seems weird, but > it's because the C functions being called are an intermediary to a > simple (third-party) virtual machine running a program compiled from > another language. It doesn't deal with pointers directly, so the other > language is just passed the memory address, in hex and as a string, > for where the data resides. Along with the number of elements in the > block of memory pointed to.) > > > So I would give up on expecting to store the struct directly in numpy > array, > > and rather, put something reasonable (maybe what you have above) in the > > numpy array, and build the C struct you need from that rather than > passing a > > pointer in directly. > > > > That sounds reasonable. I really wanted to avoid this because, as I > mentioned above, I'm just trying to generate the data in numpy and > pass it to this virtual machine running a program compiled from > another language. The form of the struct depends on what was done in > the other language. It could easily have more or fewer fields, have > the fields reordered, etc.. I wanted to avoid having to write wrappers > for all such possibilities and instead use numpy record arrays as > mapping exactly to C structs. But if that's really the only way to go > then I guess I'll have to write struct wrappers in cython. > > > -Chris > > > > > > > > -- > > > > Christopher Barker, Ph.
Re: [Numpy-discussion] Overlapping time series
Daniele Nicolodi wrote: > That's more or less my current approach (except that I use the fact that > the data is evenly samples to use np.where(np.diff(t1) != dt) to detect > the regions of continuous data, to avoid the loop. I hope you realize that np.where(np.diff(t1) != dt) generates three loops, as well as two temporary arrays and one output array. If you do what I suggested, you get one loop and no temporaries. But you will need Numba or Cython to get full speed. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On Tue, Feb 11, 2014 at 8:25 AM, Matthew Brett wrote: > > Anyway, I would say that there is a clear use for the matrix class: > readability of linear algebra code and hence a lower chance of errors, so > higher productivity. > > Yes, but it looks like there are not any developers on this list who > write substantial code with the np.matrix class, so, if there is a > gain in productivity, it is short-lived, soon to be replaced by a > cost. > to re-iterate: matrix is NOT for newbies, nor is it for higher productivity or fewer errors in production code -- the truth is, the ratio of linear algebra expressions like the above to element-wise, etc. operations that ndarray is well suited to is tiny in "real" code. Did anyone that used MATLAB for significant problems get not get really really annoyed by all the typing of ".*" ? What matrix is good for is what someone here described as "domain specific language" -- i.e. that little bit of code that really is doing mostly linera algebra. So it's a nice tool for teaching and experimenting with linear-algebra-based concepts. To address Alan's question about duck-typing -- one of the things folks like to do with duck-typed functions and method is return the type that is passed in when possible: i.e use asanyarray(), rather than asarray() -- but this is really going to be broken with matrix, as the symantics have changed. So we could say "don't expect that to work with matrix", but that requires one of: 1) folks always use asarray() and return an array, rather than a matrix to the caller -- not too bad, folks that want matrix can use np.matrix to get it back (a bit ugly, though..) however, this means that any other array subclass will get mangled as well... 2) folks use asanyarray(), and it will break, maybe painfully, when a matrix is passed in -- folks using matrixes would need to use .A when calling such functions. This really seems ripe for confusion. The truth is that the "right" way to do all this is to have different operators, rather than different objects, but that's been tried and did not fly. -Chris Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
Hi, On Tue, Feb 11, 2014 at 8:55 AM, wrote: > > > On Tue, Feb 11, 2014 at 11:25 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR >> wrote: >> > For our students, the matrix class is really appealing as we use a lot >> > of linear algebra and expressions with matrices simply look better with an >> > operator instead of a function: >> > >> > x=A.I*b >> > >> > looks much better than >> > >> > x = np.dot(np.linalg.inv(A),b) >> >> Yes, but: >> >> 1) as Alan has mentioned, the dot method helps a lot. >> >> import numpy.linalg as npl >> >> x = npl.inv(A).dot(b) >> >> 2) Overloading the * operator means that you've lost * to do >> element-wise operations. MATLAB has a different operator for that, >> '.*' - and it's very easy to forget the dot. numpy makes this more >> explicit - you read 'dot' as 'dot'. >> >> > And this gets worse when the expression is longer: >> > >> > x = R.I*A*R*b >> > >> > becomes: >> > >> > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) >> >> x = npl.inv(R).dot(A.dot(R.dot(b)) >> >> > Actually, not being involved in earlier discussions on this topic, I was >> > a bit surprised by this and do not see the problem of having the matrix >> > class as nobody is obliged to use it. I tried to find the reasons, but did >> > not find it in the thread mentioned. Maybe someone could summarize the main >> > problem with keeping this class for newbies on this list like me? >> > >> > Anyway, I would say that there is a clear use for the matrix class: >> > readability of linear algebra code and hence a lower chance of errors, so >> > higher productivity. >> >> Yes, but it looks like there are not any developers on this list who >> write substantial code with the np.matrix class, so, if there is a >> gain in productivity, it is short-lived, soon to be replaced by a >> cost. > > > selection bias ! > > I have seen lots of Matlab and GAUSS code written by academic > econometricians that have been programming for years but are not > "developers", code that is "inefficient" and numerically not very stable but > looks just like the formulas. Yes, I used to use matlab myself. There is certainly biased sampling on this list, so it's very difficult to say whether there is a large constituency of np.matrix users out there, it's possible. I hope not, because I think they would honestly be better served with ndarray even if some of the lines in their script don't look quite as neat. But in any case, I still don't think that dropping np.matrix is an option in the short or even the medium term. The discussion - I think - is whether we should move towards standardizing to ndarray semantics for clarity and simplicity of future development (and teaching), Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] record arrays with char*?
Thanks for the answers! My responses are inline. On Tue, Feb 11, 2014 at 8:36 AM, Chris Barker wrote: > On Mon, Feb 10, 2014 at 10:43 PM, Christopher Jordan-Squire > wrote: >> >> I'm trying to wrap some C code using cython. The C code can take >> inputs in two modes: dense inputs and sparse inputs. For dense inputs >> the array indexing is naive. I have wrappers for that. In the sparse >> case the matrix entries are typically indexed via names. So, for >> example, the library documentation includes this as input you could >> give: >> >> struct >> { >> char* ind; >> double val, wght; >> } data[] = { {"camera", 15, 2}, {"necklace", 100, 20}, {"vase", 90, 20}, >> {"pictures", 60, 30}, {"tv", 40, 40}, {"video", 15, 30}}; >> >> At the C level, data is passed to the function by directly giving its >> address. (i.e. the C function takes as an argument (unsigned long) >> data, casting the data pointer to an int) > > > wow -- that's prone to error! but I"m still not sure which pointer you're > talking about -- a pointer to this struct? > I mean data. data is an array of structs, but in C that's (essentially) the same as a pointer to a struct. So data is the pointer I'm referring to. >> I'd like to create something similar using record arrays, such as >> >> np.array([("camera", 15, 2), ("necklace", 100, 20), ... ], >> dtype='object,f8,f8'). > > > >> >> Unfortunately this fails because >> (1) In cython I need to determine the address of the first element and >> I can't take the address of a an input whose type I don't know (the >> exact type will vary on the application, so more or fewer fields may >> be in the C struct) > > > still a bit confused, but if this is types as an array in Cython, you should > be abel to do somethign like: > > &the_array[i] > > to get the address of the ith element. > To do that I need to be able to tell cython the type of the memory view I give. There are very few examples for non-primitive arrays in cython, so I'm not sure what that would look like. Or at least I *think* I need to do that, based on the cython errors I'm getting. >> (2) I don't think a python object type is what I want--I need a char* >> representation of the string. (Unfortunately I can't test this because >> I haven't solved (1) -- how do you pass a record array around in >> cython and/or take its address?) > > > well, and object type will give you a pointer to a pyobject. If you know for > sure that that pyobject is a string object (probably want a bytes object -- > you son't want unicode here), then you should be abel to get the address of > the underlying char array. But that would require passing something > different off to the C code that the address of that element. > > You could use an unsigned long for that first field, as you are assuming > that in the C code anyway but I don't hink there is a way in numpy to set > that to a pointer to a char allocated elsewhere -- where would it be > allocated? > The strings are char*, the unsigned long cast was simply to cast data to a memory address. (The specific setup of the library was passing a string with the memory location in hexadecimal. This seems weird, but it's because the C functions being called are an intermediary to a simple (third-party) virtual machine running a program compiled from another language. It doesn't deal with pointers directly, so the other language is just passed the memory address, in hex and as a string, for where the data resides. Along with the number of elements in the block of memory pointed to.) > So I would give up on expecting to store the struct directly in numpy array, > and rather, put something reasonable (maybe what you have above) in the > numpy array, and build the C struct you need from that rather than passing a > pointer in directly. > That sounds reasonable. I really wanted to avoid this because, as I mentioned above, I'm just trying to generate the data in numpy and pass it to this virtual machine running a program compiled from another language. The form of the struct depends on what was done in the other language. It could easily have more or fewer fields, have the fields reordered, etc.. I wanted to avoid having to write wrappers for all such possibilities and instead use numpy record arrays as mapping exactly to C structs. But if that's really the only way to go then I guess I'll have to write struct wrappers in cython. > -Chris > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R(206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > chris.bar...@noaa.gov > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.or
Re: [Numpy-discussion] deprecate numpy.matrix
On Tue, Feb 11, 2014 at 11:25 AM, Matthew Brett wrote: > Hi, > > On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR > wrote: > > For our students, the matrix class is really appealing as we use a lot > of linear algebra and expressions with matrices simply look better with an > operator instead of a function: > > > > x=A.I*b > > > > looks much better than > > > > x = np.dot(np.linalg.inv(A),b) > > Yes, but: > > 1) as Alan has mentioned, the dot method helps a lot. > > import numpy.linalg as npl > > x = npl.inv(A).dot(b) > > 2) Overloading the * operator means that you've lost * to do > element-wise operations. MATLAB has a different operator for that, > '.*' - and it's very easy to forget the dot. numpy makes this more > explicit - you read 'dot' as 'dot'. > > > And this gets worse when the expression is longer: > > > > x = R.I*A*R*b > > > > becomes: > > > > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) > > x = npl.inv(R).dot(A.dot(R.dot(b)) > > > Actually, not being involved in earlier discussions on this topic, I was > a bit surprised by this and do not see the problem of having the matrix > class as nobody is obliged to use it. I tried to find the reasons, but did > not find it in the thread mentioned. Maybe someone could summarize the main > problem with keeping this class for newbies on this list like me? > > > > Anyway, I would say that there is a clear use for the matrix class: > readability of linear algebra code and hence a lower chance of errors, so > higher productivity. > > Yes, but it looks like there are not any developers on this list who > write substantial code with the np.matrix class, so, if there is a > gain in productivity, it is short-lived, soon to be replaced by a > cost. > selection bias ! I have seen lots of Matlab and GAUSS code written by academic econometricians that have been programming for years but are not "developers", code that is "inefficient" and numerically not very stable but looks just like the formulas. fast prototyping for code that is, at most used, for a few papers. (just to avoid misunderstanding: there are also econometricians that are "developers", and write code that is intended for reuse.) (but maybe numpy users are all "developers") Josef > > Cheers, > > Matthew > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] record arrays with char*?
On Mon, Feb 10, 2014 at 10:43 PM, Christopher Jordan-Squire wrote: > I'm trying to wrap some C code using cython. The C code can take > inputs in two modes: dense inputs and sparse inputs. For dense inputs > the array indexing is naive. I have wrappers for that. In the sparse > case the matrix entries are typically indexed via names. So, for > example, the library documentation includes this as input you could > give: > > struct > { > char* ind; > double val, wght; > } data[] = { {"camera", 15, 2}, {"necklace", 100, 20}, {"vase", 90, 20}, > {"pictures", 60, 30}, {"tv", 40, 40}, {"video", 15, 30}}; > > At the C level, data is passed to the function by directly giving its > address. (i.e. the C function takes as an argument (unsigned long) > data, casting the data pointer to an int) > wow -- that's prone to error! but I"m still not sure which pointer you're talking about -- a pointer to this struct? I'd like to create something similar using record arrays, such as > > np.array([("camera", 15, 2), ("necklace", 100, 20), ... ], > dtype='object,f8,f8'). > > Unfortunately this fails because > (1) In cython I need to determine the address of the first element and > I can't take the address of a an input whose type I don't know (the > exact type will vary on the application, so more or fewer fields may > be in the C struct) > still a bit confused, but if this is types as an array in Cython, you should be abel to do somethign like: &the_array[i] to get the address of the ith element. (2) I don't think a python object type is what I want--I need a char* > representation of the string. (Unfortunately I can't test this because > I haven't solved (1) -- how do you pass a record array around in > cython and/or take its address?) > well, and object type will give you a pointer to a pyobject. If you know for sure that that pyobject is a string object (probably want a bytes object -- you son't want unicode here), then you should be abel to get the address of the underlying char array. But that would require passing something different off to the C code that the address of that element. You could use an unsigned long for that first field, as you are assuming that in the C code anyway but I don't hink there is a way in numpy to set that to a pointer to a char allocated elsewhere -- where would it be allocated? So I would give up on expecting to store the struct directly in numpy array, and rather, put something reasonable (maybe what you have above) in the numpy array, and build the C struct you need from that rather than passing a pointer in directly. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
Hi, On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR wrote: > For our students, the matrix class is really appealing as we use a lot of > linear algebra and expressions with matrices simply look better with an > operator instead of a function: > > x=A.I*b > > looks much better than > > x = np.dot(np.linalg.inv(A),b) Yes, but: 1) as Alan has mentioned, the dot method helps a lot. import numpy.linalg as npl x = npl.inv(A).dot(b) 2) Overloading the * operator means that you've lost * to do element-wise operations. MATLAB has a different operator for that, '.*' - and it's very easy to forget the dot. numpy makes this more explicit - you read 'dot' as 'dot'. > And this gets worse when the expression is longer: > > x = R.I*A*R*b > > becomes: > > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) x = npl.inv(R).dot(A.dot(R.dot(b)) > Actually, not being involved in earlier discussions on this topic, I was a > bit surprised by this and do not see the problem of having the matrix class > as nobody is obliged to use it. I tried to find the reasons, but did not find > it in the thread mentioned. Maybe someone could summarize the main problem > with keeping this class for newbies on this list like me? > > Anyway, I would say that there is a clear use for the matrix class: > readability of linear algebra code and hence a lower chance of errors, so > higher productivity. Yes, but it looks like there are not any developers on this list who write substantial code with the np.matrix class, so, if there is a gain in productivity, it is short-lived, soon to be replaced by a cost. Cheers, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11/02/2014 15:38, Sturla Molden wrote: > Daniele Nicolodi wrote: > >> I was probably not that clear: I have two 2xN arrays, one for each data >> recording, one column for time (taken from the same clock for both >> measurements) and one with data values. Each array has some gaps. > > If you want all subarrays where both timeseries are sampled, a single loop > through the data > can fix that. First find the smallest common timestamp. This is the first > starting point. Then loop and follow the timestamps. As long as they are in > synch, do just continue. If one timeseries suddenly skips forward (i.e. > there is a gap), you have an end point. Then slice between the start and > the end point, and append the view array to a list. Follow the timeseries > that did not skip until timestamps are synchronous again, and you have the > next starting point. Then just continue like this until the the two > timeseries are exhausted. It is an O(n) strategy, so it will not be > inefficient. If you are worried about the loop and performance, both Numba > and Cython can transform this into C speed. Numba takes less effort to use. > But Python loops are actually much faster than most scientists used to > Matlab and the like would expect. Thanks Sturla. That's more or less my current approach (except that I use the fact that the data is evenly samples to use np.where(np.diff(t1) != dt) to detect the regions of continuous data, to avoid the loop. Cheers, Daniele ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
2014-02-11 14:55 GMT+01:00 Andreas Hilboll : > On 11.02.2014 14:47, Daniele Nicolodi wrote: > > On 11/02/2014 14:41, Andreas Hilboll wrote: > >> On 11.02.2014 14:22, Daniele Nicolodi wrote: > >>> On 11/02/2014 14:10, Andreas Hilboll wrote: > On 11.02.2014 14:08, Daniele Nicolodi wrote: > > Hello, > > > > I have two time series (2xN dimensional arrays) recorded on the same > > time basis, but each with it's own dead times (and start and end > > recording times). I would like to obtain two time series containing > > only the time overlapping segments of the data. > > > > Does numpy or scipy offer something that may help in this? > > > > I can imagine strategies about how to approach the problem, but none > > that would be efficient. Ideas? > > Take a look at pandas. It has built-in time series functionality. > >>> > >>> Even using Pandas (and I would like to avoid to have to depend on it) > it > >>> is not clear to me how I would achieve what I want. Am I missing > something? > >> > >> If the two time series are pandas.Series objects and are called s1 and > s2: > >> > >> new1 = s1.ix[s2.dropna().index].dropna() > >> new2 = s2.ix[s1.dropna().index].dropna() > >> new1 = new1.ix[s2.dropna().index].dropna() > >> > >> Looks hackish, so there might be a more elegant solution. For further > >> questions about how to use pandas, please look at the pydata mailing > >> list or stackoverflow. > > > > Correct me if I'm wrong, but this assumes that missing data points are > > represented with Nan. In my case missing data points are just missing. > > pandas doesn't care. > > In pandas, you could simply do something like this (assuming the time is set as the index): pd.concat([s1, s2], axis=1) and then remove the nan's (where the index was not overlapping) or use `join='inner'` Joris > Andreas. > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
Daniele Nicolodi wrote: > I was probably not that clear: I have two 2xN arrays, one for each data > recording, one column for time (taken from the same clock for both > measurements) and one with data values. Each array has some gaps. If you want all subarrays where both timeseries are sampled, a single loop through the data can fix that. First find the smallest common timestamp. This is the first starting point. Then loop and follow the timestamps. As long as they are in synch, do just continue. If one timeseries suddenly skips forward (i.e. there is a gap), you have an end point. Then slice between the start and the end point, and append the view array to a list. Follow the timeseries that did not skip until timestamps are synchronous again, and you have the next starting point. Then just continue like this until the the two timeseries are exhausted. It is an O(n) strategy, so it will not be inefficient. If you are worried about the loop and performance, both Numba and Cython can transform this into C speed. Numba takes less effort to use. But Python loops are actually much faster than most scientists used to Matlab and the like would expect. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On 2/11/2014 5:25 AM, Pauli Virtanen wrote: > the ndarray is also lacking some useful things, as > you point out. But I think the right solution would be to stuff > the required additions into ndarray, rather than retaining the > otherwise incompatible np.matrix as a crutch. Given that we now have the `dot` method, if we could get the other convenience attributes even as methods (say .I(), .H(), and .mpow()) that would greatly reduce the need for `matrix`. Just to be clear, the `matrix` object is not *really* the issue in the discussion of the scipy library, right? Since it already knows how to be seen as an ndarray, the library can always work with m.A when doing any linear algebra. From what I've read in this thread, the real issues for scipy seem to lie with the sparse matrix objects... ? Alan ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
At 06:07 AM 2/11/2014, you wrote: >On 11/02/2014 14:56, Sturla Molden wrote: > > Daniele Nicolodi wrote: > > > >> Correct me if I'm wrong, but this assumes that missing data points are > >> represented with Nan. In my case missing data points are just missing. > > > > Then your data cannot be stored in a 2 x N array as you indicated. > >I was probably not that clear: I have two 2xN arrays, one for each data >recording, one column for time (taken from the same clock for both >measurements) and one with data values. Each array has some gaps. gaps at the ends I assume... use numpy.where() with the time channel as the condition - Ray ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11/02/2014 14:56, Sturla Molden wrote: > Daniele Nicolodi wrote: > >> Correct me if I'm wrong, but this assumes that missing data points are >> represented with Nan. In my case missing data points are just missing. > > Then your data cannot be stored in a 2 x N array as you indicated. I was probably not that clear: I have two 2xN arrays, one for each data recording, one column for time (taken from the same clock for both measurements) and one with data values. Each array has some gaps. Cheers, Daniele ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
> On 11.02.2014 14:08, Daniele Nicolodi wrote: >> Hello, >> >> I have two time series (2xN dimensional arrays) recorded on the same >> time basis, but each with it's own dead times (and start and end >> recording times). I would like to obtain two time series containing >> only the time overlapping segments of the data. >> >> Does numpy or scipy offer something that may help in this? >> >> I can imagine strategies about how to approach the problem, but none >> that would be efficient. Ideas? or? st1 = 5 >>> np.concatenate((a1[:,st1:],a2[:,:a1.shape[1]-st1])) array([[ 5, 6, 7, 8, 9], [15, 16, 17, 18, 19], [ 5, 6, 7, 8, 9], [15, 16, 17, 18, 19]]) - Ray ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
Daniele Nicolodi wrote: > Correct me if I'm wrong, but this assumes that missing data points are > represented with Nan. In my case missing data points are just missing. Then your data cannot be stored in a 2 x N array as you indicated. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
> On 11.02.2014 14:08, Daniele Nicolodi wrote: >> Hello, >> >> I have two time series (2xN dimensional arrays) recorded on the same >> time basis, but each with it's own dead times (and start and end >> recording times). I would like to obtain two time series containing >> only the time overlapping segments of the data. >> >> Does numpy or scipy offer something that may help in this? >> >> I can imagine strategies about how to approach the problem, but none >> that would be efficient. Ideas? What is the gate/tach, ie pointer to start/stop? I work with both tachometers and EKGs and do similar windowing, usually just using gates as slices so as not to make copies. I also found this interesting and bookmarked it http://www.johnvinyard.com/blog/?p=268 which you might like. Just to be clear, you have 2 2D arrays and want a 4x(N-m) shape, like mixing two stereo tracks? >>> a1 = np.arange(0,20).reshape((2,-1)) >>> a2 = np.arange(5,25).reshape((2,-1)) >>> np.concatenate((a1,a2)) array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [ 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19, 20, 21, 22, 23, 24]]) >>> st1 = 2 >>> end1 = 7 >>> st2 = 1 >>> end2 = 6 >>> np.concatenate((a1[:,st1:end1],a2[:,st2:end2])) array([[ 2, 3, 4, 5, 6], [12, 13, 14, 15, 16], [ 6, 7, 8, 9, 10], [16, 17, 18, 19, 20]]) - Ray ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11.02.2014 14:47, Daniele Nicolodi wrote: > On 11/02/2014 14:41, Andreas Hilboll wrote: >> On 11.02.2014 14:22, Daniele Nicolodi wrote: >>> On 11/02/2014 14:10, Andreas Hilboll wrote: On 11.02.2014 14:08, Daniele Nicolodi wrote: > Hello, > > I have two time series (2xN dimensional arrays) recorded on the same > time basis, but each with it's own dead times (and start and end > recording times). I would like to obtain two time series containing > only the time overlapping segments of the data. > > Does numpy or scipy offer something that may help in this? > > I can imagine strategies about how to approach the problem, but none > that would be efficient. Ideas? Take a look at pandas. It has built-in time series functionality. >>> >>> Even using Pandas (and I would like to avoid to have to depend on it) it >>> is not clear to me how I would achieve what I want. Am I missing something? >> >> If the two time series are pandas.Series objects and are called s1 and s2: >> >> new1 = s1.ix[s2.dropna().index].dropna() >> new2 = s2.ix[s1.dropna().index].dropna() >> new1 = new1.ix[s2.dropna().index].dropna() >> >> Looks hackish, so there might be a more elegant solution. For further >> questions about how to use pandas, please look at the pydata mailing >> list or stackoverflow. > > Correct me if I'm wrong, but this assumes that missing data points are > represented with Nan. In my case missing data points are just missing. pandas doesn't care. Andreas. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
Daniele Nicolodi wrote: > I can imagine strategies about how to approach the problem, but none > that would be efficient. Ideas? I would just loop from the start and loop from the end and find out where to clip. Then slice in between. If Python loops take too much time, JIT compile them with Numba. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11/02/2014 14:41, Andreas Hilboll wrote: > On 11.02.2014 14:22, Daniele Nicolodi wrote: >> On 11/02/2014 14:10, Andreas Hilboll wrote: >>> On 11.02.2014 14:08, Daniele Nicolodi wrote: Hello, I have two time series (2xN dimensional arrays) recorded on the same time basis, but each with it's own dead times (and start and end recording times). I would like to obtain two time series containing only the time overlapping segments of the data. Does numpy or scipy offer something that may help in this? I can imagine strategies about how to approach the problem, but none that would be efficient. Ideas? >>> >>> Take a look at pandas. It has built-in time series functionality. >> >> Even using Pandas (and I would like to avoid to have to depend on it) it >> is not clear to me how I would achieve what I want. Am I missing something? > > If the two time series are pandas.Series objects and are called s1 and s2: > > new1 = s1.ix[s2.dropna().index].dropna() > new2 = s2.ix[s1.dropna().index].dropna() > new1 = new1.ix[s2.dropna().index].dropna() > > Looks hackish, so there might be a more elegant solution. For further > questions about how to use pandas, please look at the pydata mailing > list or stackoverflow. Correct me if I'm wrong, but this assumes that missing data points are represented with Nan. In my case missing data points are just missing. Cheers, Daniele ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11.02.2014 14:22, Daniele Nicolodi wrote: > On 11/02/2014 14:10, Andreas Hilboll wrote: >> On 11.02.2014 14:08, Daniele Nicolodi wrote: >>> Hello, >>> >>> I have two time series (2xN dimensional arrays) recorded on the same >>> time basis, but each with it's own dead times (and start and end >>> recording times). I would like to obtain two time series containing >>> only the time overlapping segments of the data. >>> >>> Does numpy or scipy offer something that may help in this? >>> >>> I can imagine strategies about how to approach the problem, but none >>> that would be efficient. Ideas? >> >> Take a look at pandas. It has built-in time series functionality. > > Even using Pandas (and I would like to avoid to have to depend on it) it > is not clear to me how I would achieve what I want. Am I missing something? If the two time series are pandas.Series objects and are called s1 and s2: new1 = s1.ix[s2.dropna().index].dropna() new2 = s2.ix[s1.dropna().index].dropna() new1 = new1.ix[s2.dropna().index].dropna() Looks hackish, so there might be a more elegant solution. For further questions about how to use pandas, please look at the pydata mailing list or stackoverflow. HTH, Andreas. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11/02/2014 14:10, Andreas Hilboll wrote: > On 11.02.2014 14:08, Daniele Nicolodi wrote: >> Hello, >> >> I have two time series (2xN dimensional arrays) recorded on the same >> time basis, but each with it's own dead times (and start and end >> recording times). I would like to obtain two time series containing >> only the time overlapping segments of the data. >> >> Does numpy or scipy offer something that may help in this? >> >> I can imagine strategies about how to approach the problem, but none >> that would be efficient. Ideas? > > Take a look at pandas. It has built-in time series functionality. Even using Pandas (and I would like to avoid to have to depend on it) it is not clear to me how I would achieve what I want. Am I missing something? Cheers, Daniele ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Overlapping time series
On 11.02.2014 14:08, Daniele Nicolodi wrote: > Hello, > > I have two time series (2xN dimensional arrays) recorded on the same > time basis, but each with it's own dead times (and start and end > recording times). I would like to obtain two time series containing > only the time overlapping segments of the data. > > Does numpy or scipy offer something that may help in this? > > I can imagine strategies about how to approach the problem, but none > that would be efficient. Ideas? Take a look at pandas. It has built-in time series functionality. Andreas. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Overlapping time series
Hello, I have two time series (2xN dimensional arrays) recorded on the same time basis, but each with it's own dead times (and start and end recording times). I would like to obtain two time series containing only the time overlapping segments of the data. Does numpy or scipy offer something that may help in this? I can imagine strategies about how to approach the problem, but none that would be efficient. Ideas? Thank you. Best, Daniele ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
Pauli Virtanen wrote: > [1] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html I have to agree with Robert Kern here. One operator that we can (ab)use for matrix multiplication would suffice. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
For our students, the matrix class is really appealing as we use a lot of linear algebra and expressions with matrices simply look better with an operator instead of a function: x=A.I*b looks much better than x = np.dot(np.linalg.inv(A),b) And this gets worse when the expression is longer: x = R.I*A*R*b becomes: x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) Actually, not being involved in earlier discussions on this topic, I was a bit surprised by this and do not see the problem of having the matrix class as nobody is obliged to use it. I tried to find the reasons, but did not find it in the thread mentioned. Maybe someone could summarize the main problem with keeping this class for newbies on this list like me? Anyway, I would say that there is a clear use for the matrix class: readability of linear algebra code and hence a lower chance of errors, so higher productivity. Just my 2cts, Jacco Hoekstra P.S. Also: A = np.mat("2 3 ; -1 0") is very handy! -Original Message- From: numpy-discussion-boun...@scipy.org [mailto:numpy-discussion-boun...@scipy.org] On Behalf Of Pauli Virtanen Sent: dinsdag 11 februari 2014 12:47 To: numpy-discussion@scipy.org Subject: Re: [Numpy-discussion] deprecate numpy.matrix Sturla Molden gmail.com> writes: > Pauli Virtanen iki.fi> wrote: > > It is not a good thing that there is no well defined "domain > > specific language" for matrix algebra in Python. > > Perhaps Python should get some new operators? It might still be possible to advocate for this in core Python, even though the ship has sailed long ago. Some previous discussion: [1] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html [2] http://www.python.org/dev/peps/pep-0225/ [3] http://www.python.org/dev/peps/pep-0211/ (My own take would be that one extra operator is enough for most purposes, and would be easier to push for.) [clip] > On the serious side, I don't think there really is a good solution to > this problem at all. This is true. However, I'd prefer to have one solution over several conflicting ones. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
Sturla Molden gmail.com> writes: > Pauli Virtanen iki.fi> wrote: > > It is not a good thing that there is no well defined > > "domain specific language" for matrix algebra in Python. > > Perhaps Python should get some new operators? It might still be possible to advocate for this in core Python, even though the ship has sailed long ago. Some previous discussion: [1] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html [2] http://www.python.org/dev/peps/pep-0225/ [3] http://www.python.org/dev/peps/pep-0211/ (My own take would be that one extra operator is enough for most purposes, and would be easier to push for.) [clip] > On the serious side, I don't think there really is a good solution to this > problem at all. This is true. However, I'd prefer to have one solution over several conflicting ones. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
Pauli Virtanen wrote: > It is not a good thing that there is no well defined > "domain specific language" for matrix algebra in Python. Perhaps Python should get some new operators? __dot__ __cross__ __kronecker__ Obviously, using them in real Python code would require unicode. ;-) On the serious side, I don't think there really is a good solution to this problem at all. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
My 2pc: I personally hardly ever use matrix, even in linear algebra dense code. It can be nice though, to use matrix semantics within a restricted scope. When I first came to numpy, the ability to choose linear algebra versus array semantics seemed like a really neat thing to me; though in practice the array semantics are so much more useful you really don't mind having to write the occasional np.dot. There seems to be some resistance form the developer side in having to maintain this architecture, which I cannot comment on, but from a user perspective, I am perfectly fine with the way things are. On Tue, Feb 11, 2014 at 11:16 AM, Todd wrote: > > On Feb 11, 2014 3:23 AM, "Alan G Isaac" wrote: > > > > On 2/10/2014 7:39 PM, Pauli Virtanen wrote: > > > The issue here is semantics for basic linear algebra operations, such > as > > > matrix multiplication, that work for different matrix objects, > including > > > ndarrays. > > > > > > I'll see if I can restate my suggestion in another way, > > because I do not feel you are responding to it. > > (I might be wrong.) > > > > What is a duck? If you ask it to quack, it quacks. > > OK, but what is it to quack? > > > > Here, quacking is behaving like an ndarray (in your view, > > as I understand it) when asked. But how do we ask? > > Your view (if I understand) is we ask via the operations > > supported by ndarrays. But maybe that is the wrong way > > for the library to ask this question. > > > > If so, then scipy libraries could ask an object > > to behave like an an ndarray by calling, e.g., > > __asarray__ on it. It becomes the responsibility > > of the object to return something appropriate > > when __asarray__ is called. Objects that know how to do > > this will provide __asarray__ and respond > > appropriately. Other types can be coerced if > > that is the documented behavior (e.g., lists). > > The libraries will then be written for a single > > type of behavior. What it means to "quack" is > > pretty easily documented, and a matrix object > > already knows how (e.g., m.A). Presumably in > > this scenario __asarray__ would return an object > > that behaves like an ndarray and a converter for > > turning the final result into the desired object > > type (e.g., into a `matrix` if necessary). > > > > Hope that clearer, even if it proves a terrible idea. > > > > Alan Isaac > > I don't currently use the matrix class, but having taken many linear > algebra classes I can see the appeal, and if I end up teaching the subject > I think I would appreciate having it available. > > On the other hand, I certainly can see the possibility for confusion, and > I don't think it is something that should be used unless someone has a > really good reason. > > So I come out somewhere in the middle here. So, although this may end up > being a terrible idea, I would like to purpose what I think is a > compromise: instead of just removing matrix, split it out into a scikit. > That way, it it's still available for those who need it, but will be less > likely to be used accidentally, and won't be interfering with the rest of > numpy and scipy development. > > Specifically, I would split matrix into a scikit, while in the same > release deprecate np.matrix. They can then exist in parallel for a few > releases to allow code to be ported away from it. > > However, I would suggest that before the split, all linear algebra > routines should be available as functions or methods in numpy proper. > > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
Alan G Isaac gmail.com> writes: [clip] > Here, quacking is behaving like an ndarray (in your view, > as I understand it) when asked. But how do we ask? > Your view (if I understand) is we ask via the operations > supported by ndarrays. But maybe that is the wrong way > for the library to ask this question. It is not a good thing that there is no well defined "domain specific language" for matrix algebra in Python. Rather, some code is written with one convention and other code with a different convention. The conventions disagree on how to express basic operations, such as matrix multiplication. Moreover, the ndarray is also lacking some useful things, as you point out. But I think the right solution would be to stuff the required additions into ndarray, rather than retaining the otherwise incompatible np.matrix as a crutch. > If so, then scipy libraries could ask an object > to behave like an an ndarray by calling, e.g., > __asarray__ on it. It becomes the responsibility > of the object to return something appropriate > when __asarray__ is called. Objects that know how to do > this will provide __asarray__ and respond > appropriately. Another way to achieve similar thing as your suggestion is to add a coercion function in the vein of scipy.sparse.aslinearoperator. It could deal with known-failure cases (np.matrix, scipy.sparse matrices) and for the rest just assume the object satisfies the ndarray API and pass them through. -- Pauli Virtanen ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Inheriting from ndarray Was: deprecate numpy.matrix
On Feb 11, 2014 5:01 AM, "Alexander Belopolsky" wrote: > > > On Mon, Feb 10, 2014 at 11:31 AM, Nathaniel Smith wrote: >> >> And in the long run, I >> think the goal is to move people away from inheriting from np.ndarray. > > > This is music to my ears, There are a lot of units (meter, foot, built, etc.) packages that use inheriting from numpy to handle their units. I wouldn't call it a shortcut, it is a common design decision. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] deprecate numpy.matrix
On Feb 11, 2014 3:23 AM, "Alan G Isaac" wrote: > > On 2/10/2014 7:39 PM, Pauli Virtanen wrote: > > The issue here is semantics for basic linear algebra operations, such as > > matrix multiplication, that work for different matrix objects, including > > ndarrays. > > > I'll see if I can restate my suggestion in another way, > because I do not feel you are responding to it. > (I might be wrong.) > > What is a duck? If you ask it to quack, it quacks. > OK, but what is it to quack? > > Here, quacking is behaving like an ndarray (in your view, > as I understand it) when asked. But how do we ask? > Your view (if I understand) is we ask via the operations > supported by ndarrays. But maybe that is the wrong way > for the library to ask this question. > > If so, then scipy libraries could ask an object > to behave like an an ndarray by calling, e.g., > __asarray__ on it. It becomes the responsibility > of the object to return something appropriate > when __asarray__ is called. Objects that know how to do > this will provide __asarray__ and respond > appropriately. Other types can be coerced if > that is the documented behavior (e.g., lists). > The libraries will then be written for a single > type of behavior. What it means to "quack" is > pretty easily documented, and a matrix object > already knows how (e.g., m.A). Presumably in > this scenario __asarray__ would return an object > that behaves like an ndarray and a converter for > turning the final result into the desired object > type (e.g., into a `matrix` if necessary). > > Hope that clearer, even if it proves a terrible idea. > > Alan Isaac I don't currently use the matrix class, but having taken many linear algebra classes I can see the appeal, and if I end up teaching the subject I think I would appreciate having it available. On the other hand, I certainly can see the possibility for confusion, and I don't think it is something that should be used unless someone has a really good reason. So I come out somewhere in the middle here. So, although this may end up being a terrible idea, I would like to purpose what I think is a compromise: instead of just removing matrix, split it out into a scikit. That way, it it's still available for those who need it, but will be less likely to be used accidentally, and won't be interfering with the rest of numpy and scipy development. Specifically, I would split matrix into a scikit, while in the same release deprecate np.matrix. They can then exist in parallel for a few releases to allow code to be ported away from it. However, I would suggest that before the split, all linear algebra routines should be available as functions or methods in numpy proper. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion