Re: [Numpy-discussion] www.numpy.org url issues
I forwarded this msg to John, in case he isn't watching this list. I recall that around that time (Y2K) John grabbed a few domains of public projects and donated them as soon as the project was ready for it. (To keep the squatters at bay I guess.)

-robert

On Oct 13, 2010, at 8:03 AM, Benjamin Root wrote:
> On Wed, Oct 13, 2010 at 12:11 AM, David Warde-Farley warde...@iro.umontreal.ca wrote:
>> I'm not sure who registered/owns numpy.org, but it looks like a frame sitting on top of numpy.scipy.org.
>
> whois says that it is a John Turner of Technical Computing Solutions in Tennessee. Looks like he has owned it since 2000.
>
> Ben Root

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Question on timeseries, for financial application
On Dec 13, 2009, at 1:31 AM, Pierre GM wrote:
> On Dec 13, 2009, at 12:11 AM, Robert Ferrell wrote:
>> Have you considered creating a TimeSeries for each data series, and then putting them all together in a dict, keyed by symbol?
>
> That's an idea
>
>> One disadvantage of one big monster numpy array for all the series is that not all series may have a full set of 1800 data points. So the array isn't really nicely rectangular.
>
> Bah, there's adjust_endpoints to take care of that.

Maybe this will work for the OP. In my work, if a series is missing data the desirable thing is to use the data I have. I don't want to truncate existing series to fit the short ones, nor pad to fit the long ones. Really depends on the analysis the OP is trying to do.
Re: [Numpy-discussion] Question on timeseries, for financial application
On Dec 13, 2009, at 7:07 AM, josef.p...@gmail.com wrote:
> On Sun, Dec 13, 2009 at 3:31 AM, Pierre GM pgmdevl...@gmail.com wrote:
>> On Dec 13, 2009, at 12:11 AM, Robert Ferrell wrote:
>>> Have you considered creating a TimeSeries for each data series, and then putting them all together in a dict, keyed by symbol?
>>
>> That's an idea
>
> As far as I understand, that's what pandas.DataFrame does. pandas.DataMatrix used 2d array to store data
>
>>> One disadvantage of one big monster numpy array for all the series is that not all series may have a full set of 1800 data points. So the array isn't really nicely rectangular.
>>
>> Bah, there's adjust_endpoints to take care of that.
>>
>>> Not sure exactly what kind of analysis you want to do, but grabbing a series from a dict is quite fast.
>>
>> Thomas, as robert F. pointed out, everything depends on the kind of analysis you want. If you want to normalize your series, having all of them in a big array is the best (plain array, not structured, so that you can apply .mean and .std directly without having to loop on the series). If you need to apply the same function over all the series, here again having a big ndarray is easiest. Give us an example of what you wanna do.
>
> Or a structured array with homogeneous type that allows fast creation of views for data analysis.

These kinds of financial series don't have that much data (speaking from the early 21st century point of view). The OP says 1000 series, 1800 observations per series. Maybe 5 data items per observation, 4 bytes each. That's well under 50MB. I've found it satisfactory to keep the data someplace that's handy to get at, and easy to use. When I want to do analysis I pull it into whatever format is best for that analysis. Depending on the needs, it may not be necessary to try to arrange the data so you can get a view for analysis - the time for a copy can be negligible if the analysis takes a while.
-r
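The "keep it in a dict, copy into an array when needed" workflow discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses plain NumPy arrays rather than the scikits.timeseries TimeSeries class, the symbols and values are made up, and every series is assumed to have the same length so that stacking works.

```python
import numpy as np

# Hypothetical data: three made-up symbols with 5 observations each.
rng = np.random.default_rng(0)
series_by_symbol = {
    "AAA": rng.normal(size=5),
    "BBB": rng.normal(size=5),
    "CCC": rng.normal(size=5),
}

# Grabbing one series by name is a plain dict lookup.
aaa = series_by_symbol["AAA"]

# For analysis across series, copy into one plain 2-D array (one row per series).
symbols = sorted(series_by_symbol)
data = np.vstack([series_by_symbol[s] for s in symbols])

# Normalize every series at once with .mean and .std, no Python loop.
normalized = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)

# Size estimate from the message: 1000 series x 1800 observations x 5 items x 4 bytes.
size_mb = 1000 * 1800 * 5 * 4 / 1e6
```

The `symbols` list keeps track of which row belongs to which name, which is the "internal structure" bookkeeping mentioned in the original question. With these figures `size_mb` comes to 36, consistent with "well under 50MB".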
Re: [Numpy-discussion] Question on timeseries, for financial application
Have you considered creating a TimeSeries for each data series, and then putting them all together in a dict, keyed by symbol?

One disadvantage of one big monster numpy array for all the series is that not all series may have a full set of 1800 data points. So the array isn't really nicely rectangular.

Not sure exactly what kind of analysis you want to do, but grabbing a series from a dict is quite fast.

-r

On Dec 12, 2009, at 6:08 PM, THOMAS BROWNE wrote:
> Hello all,
>
> Quite new to numpy / timeseries module, please forgive the elementary question.
>
> I wish to do a bunch of multivariate analysis on 1000 different financial markets series, each holding about 1800 data points (5 years of daily data). What's the best way to put this into a TimeSeries object? Should I use a structured data type (in which case I can reference each series by name), or should I put it into one big numpy array object (in which case I guess I'll have to keep track of the series name in an internal structure)? What are the advantages and disadvantages of each?
>
> Ideally I'd have liked the ease and simplicity of being able to reference each series by name, while maintaining the fast speed and clean structure of one big numpy array. Any way of getting both?
>
> Once I have a multivariate TimeSeries, how do I add another series to it?
>
> Thanks for the help.
>
> Thomas.
Re: [Numpy-discussion] Objected-oriented SIMD API for Numpy
On Oct 22, 2009, at 1:35 AM, Sturla Molden wrote:
> Robert Kern skrev:
>> No, I think you're right. Using SIMD to refer to numpy-like operations is an abuse of the term not supported by any outside community that I am aware of. Everyone else uses SIMD to describe hardware instructions, not the application of a single syntactical element of a high level language to a non-trivial data structure containing lots of atomic data elements.
>
> Then you should pick up a book on parallel computing.
>
> It is common to differentiate between four classes of computers: SISD, MISD, SIMD, and MIMD machines.
>
> A SISD system is the classical von Neumann machine. A MISD system is a pipelined von Neumann machine, for example the x86 processor.
>
> A SIMD system is one that has one CPU dedicated to control, and a large collection of subordinate ALUs for computation. Each ALU has a small amount of private memory. The IBM Cell processor is the typical SIMD machine.
>
> A special class of SIMD machines are the so-called vector machines, of which the most famous is the Cray C90. The MMX and SSE instructions in Intel Pentium processors are an example of vector instructions. Some computer scientists regard vector machines as a subtype of MISD systems, orthogonal to pipelines, because there are no subordinate ALUs with private memory.
>
> MIMD systems have multiple independent CPUs. MIMD systems come in two categories: shared-memory processors (SMP) and distributed-memory machines (also called cluster computers). The dual- and quad-core x86 processors are shared-memory MIMD machines.
>
> Many people associate the word SIMD with SSE due to Intel marketing. But to the extent that vector machines are MISD orthogonal to pipelined von Neumann machines, SSE cannot be called SIMD.
>
> NumPy is a software simulated vector machine, usually executed on MISD hardware. To the extent that vector machines (such as SSE and C90) are SIMD, we must call NumPy an object-oriented SIMD library.

This is not the terminology I am familiar with. Calling NumPy an object-oriented SIMD library is very confusing for me. I worked in the parallel computer world for a while (back in the dark ages) and this terminology would have been confusing to everyone I dealt with. I've also read many parallel computing books. In my experience SIMD refers to hardware, not software.

There is no reason that NumPy can't be written to run great (get good speed-ups) on an 8-core shared memory system. That would be a MIMD system, and there's nothing about it that doesn't fit with the NumPy abstraction. And, although SIMD can be a subset of MIMD, there are things that can be done in NumPy that can be parallelized on MIMD machines but not on SIMD machines (e.g. the NumPy vector type is flexible enough it can store a list of tasks, and the operations on that vector can be parallelized easily on a shared memory MIMD machine - task parallelism - but not on a SIMD machine).

If we say that NumPy is a software simulated vector machine or an object-oriented SIMD library we are pigeonholing NumPy in a way which is too limiting and isn't accurate. As a user it feels to me that NumPy is built around various algebra abstractions, many of which map well onto vector machine operations. That means that many of the operations are amenable to efficient implementation on SIMD hardware. But, IMO, one of the nice features of NumPy is it is built around high-level operations, and I would hate to see the project go down a path which insists that everything in NumPy be efficient on all SIMD hardware.

Of course, I would also love to see implementations which take as much advantage of available HW as possible (e.g. exploit SIMD HW if available).

That's my $0.02, worth only a couple cents less than that.

-robert
Re: [Numpy-discussion] Scientific Computing with Python, September 18, 2009
On Sep 11, 2009, at 5:07 PM, Neal Becker wrote:
> I'd love to participate in these webinars. Problem is, AFAICT, gotomeeting only supports windows.

I'm not certain that is correct. I've participated in some of these, and I'm running OS X (10.5). I think those were gotomeeting, although I don't honestly recall. Assuming nothing's changed, though, worked great on OS X.
[Numpy-discussion] Slices of structured arrays
Is there a way to get slices of a structured array and keep the field names? For instance, I've got dtype=[('x','f4'),('y','f4'), ('z','f4')] and I want to get just the x y slices into a new array with dtype=[('x','f4'),('y','f4')].

I can just make a new dtype, and extract what I need, but I'm wondering if there's some simple way to do this that I haven't found. Here's what I know works:

# Make a len 10 array with 3 fields, 'x', 'y', 'z'
In [647]: xyz = np.array(zip(*np.random.random_integers(low=10, size=(3,10))), dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])

# Get just the 'x' and 'y' fields
In [648]: xy = np.array( zip(xyz['x'], xyz['y'] ), dtype=[('x','f4'), ('y', 'f4')])

In [649]: xyz['x']
Out[649]: array([ 4., 1., 1., 5., 1., 2., 9., 8., 1., 9.], dtype=float32)

In [650]: xy['x']
Out[650]: array([ 4., 1., 1., 5., 1., 2., 9., 8., 1., 9.], dtype=float32)

That works, but just feels like there's probably an elegant solution I don't know. I couldn't find anything in the docs, but I may not have been using the right search words.

thanks,
-robert
Re: [Numpy-discussion] Slices of structured arrays
On Jun 1, 2009, at 4:41 PM, Robert Kern wrote:
> On Mon, Jun 1, 2009 at 17:32, Robert Ferrell ferr...@diablotech.com wrote:
>> Is there a way to get slices of a structured array and keep the field names? For instance, I've got dtype=[('x','f4'),('y','f4'), ('z','f4')] and I want to get just the x y slices into a new array with dtype=[('x','f4'),('y','f4')].
>>
>> I can just make a new dtype, and extract what I need, but I'm wondering if there's some simple way to do this that I haven't found. Here's what I know works:
>>
>> # Make a len 10 array with 3 fields, 'x', 'y', 'z'
>> In [647]: xyz = np.array(zip(*np.random.random_integers(low=10, size=(3,10))), dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
>>
>> # Get just the 'x' and 'y' fields
>> In [648]: xy = np.array( zip(xyz['x'], xyz['y'] ), dtype=[('x','f4'), ('y', 'f4')])
>>
>> In [649]: xyz['x']
>> Out[649]: array([ 4., 1., 1., 5., 1., 2., 9., 8., 1., 9.], dtype=float32)
>>
>> In [650]: xy['x']
>> Out[650]: array([ 4., 1., 1., 5., 1., 2., 9., 8., 1., 9.], dtype=float32)
>>
>> That works, but just feels like there's probably an elegant solution I don't know. I couldn't find anything in the docs, but I may not have been using the right search words.
>
> In numpy 1.4, there will be a function that does this, numpy.lib.recfunctions.drop_fields(). In the meantime, you can copy-and-paste it into your own code:
>
> http://svn.scipy.org/svn/numpy/trunk/numpy/lib/recfunctions.py
>
> Or use it from its original source, matplotlib.mlab.rec_drop_fields(), if you have matplotlib.

That's perfect. I've got matplotlib, so I'll use that for now.

thanks,
-robert
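For readers on a NumPy release that ships numpy.lib.recfunctions, the drop_fields() suggestion above might look roughly like this. It is a minimal sketch: the array contents are made up for illustration, and usemask=False is passed on the assumption that a plain ndarray (rather than a masked array) is wanted.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Illustrative structured array with three float32 fields (values are made up).
xyz = np.zeros(10, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4")])
xyz["x"] = np.arange(10)
xyz["y"] = np.arange(10) * 2
xyz["z"] = -1

# Drop 'z' and keep the remaining field names.
# usemask=False asks for a plain ndarray instead of a masked array.
xy = rfn.drop_fields(xyz, "z", usemask=False)

print(xy.dtype.names)
```

The result is a new array, so later changes to `xyz` are not reflected in `xy`.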
Re: [Numpy-discussion] add xirr to numpy financial functions?
On May 25, 2009, at 10:59 PM, Joe Harrington wrote:
> Let's keep this thread focussed on the original issue:
>
> just add a floating array of times to irr or a new xirr
> continuous interest
> no more
>
> Anyone can use the timeseries package to produce a floating array of times from normal dates, if those are the dates they want. If they want some specialized financial date, they may want a different conversion, however. All we should provide in NumPy would be the simplest tool. Specialized dates and date-time conversion belong elsewhere.
>
> If we're *not* skipping dates, there is no need for xirr, just use irr, which exists.
>
> scikits.financial seems like a great idea, and then knock yourselves out for date conversions and definitions of compounding. Just think big and design it first. But let's keep this thread on the simple question for NumPy.

My vote is against adding xirr to NumPy. In my experience, if you want internal rate of return, then you also want time weighted return, for instance, and all of a sudden it becomes surprising that NumPy tantalizes with some of the needed capability but not all of it.

I read in an old thread that irr was included partly because OLPC was including NumPy and it was great that kids would have a tool to help them understand the present value of money. In my opinion, cumprod() is an even better teaching tool for that.

I'm not advocating reducing functionality in NumPy, but I prefer the idea of keeping NumPy as an array core, and having higher-level capability available as add-ons (scipy, scikit, etc...)

-r
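As a rough illustration of the cumprod() remark, here is one way compounding and present value could be shown with nothing but array operations. The rates and the payment amount are made-up numbers, and annual compounding is an assumption of this sketch, not something stated in the thread.

```python
import numpy as np

# Hypothetical annual interest rates for three consecutive years.
rates = np.array([0.05, 0.05, 0.05])

# Growth of 1 unit of money after each year: (1+r1), (1+r1)(1+r2), ...
growth = np.cumprod(1.0 + rates)

# Present value of a payment of 100 received at the end of year 3.
present_value = 100.0 / growth[-1]

print(growth)
print(present_value)
```

Changing one entry of `rates` shows directly how a single year's rate affects every later value.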
Re: [Numpy-discussion] add xirr to numpy financial functions?
I haven't read all the messages in detail, and I'm a consumer not a producer, but I'll comment anyways.

I'd love to see additional financial functionality, but I'd like to see them in a scikit, not in numpy. I think to be useful they are too complicated to go into numpy. A couple of my many reasons:

1. Doing a precise, bang-up job with dates is paramount to any interesting implementation of many financial functions. I've found timeseries to be a great package - there are some things I'd like to see, but overall it is at the foundation of all of my financial analysis. Any moderately interesting extension of the current capabilities would rapidly end up trying to duplicate much of the timeseries functionality, IMO. Rather than partially re-implement the wheel in numpy, as a consumer I'd like to see financial stuff built on a common basis, and timeseries would be a great start.

2. I've read enough of this discussion to hear a requirement for both good date handling and capable solvers - just for xirr. To do a really interesting job on an interesting amount of capability requires even more dependencies, I think.

Although it might be tempting to include a few more lightweight financial functions in numpy, I doubt they will be that useful. Most of the lightweight ones are easy enough to whip up when you need them. Also, an approximation that's good today isn't the right one tomorrow - only the really robust stuff seems to survive the test of time, in my limited experience.

A start on a really solid scikits financial package would be awesome, though.

A few months ago, when the open source software for pricing CDS's was released (http://www.cdsmodel.com/information/cds-model) I took a look and noticed that it had a ton of code for dealing with dates. (I also didn't see any tests in the code. I wonder what that means. Scary for anybody that might want to modify it.) I thought if I had an extra 100 hours in every day it would be fun to re-write that code in numpy/scipy and release it.

-r
Re: [Numpy-discussion] add xirr to numpy financial functions?
On May 25, 2009, at 9:15 PM, Matt Knox wrote:
> josef.pktd at gmail.com writes:
>> So, while python won't get any industrial strength finance package, a more modest designer package would be feasible, if there were any interest in it (which I haven't seen).
>> ...
>> The even more modest question is whether we would want to match open office in its finance part. These are pretty different use cases from those use cases where you have quantlib all set up and running.
>
> As you have hinted, the scope of what will/should be covered with numpy financial functions needs to be defined better before putting more such functions into numpy. If that scope turns out to be something comparable to what excel or openoffice offers, that's fine, but I think a maturation period outside the numpy core (in the form of a scikit or otherwise) would still be a good idea to avoid getting stuck with a poorly thought out API.

+1 for a maturation period outside the numpy core.

> As for my personal feelings on how much financial functionality numpy/scipy should offer... I would agree that QuantLib-like functionality is far beyond what numpy can/should try to achieve. More basic functionality like OpenOffice or Excel probably seems about right. Although maybe it is more appropriate for scipy than numpy.

+1 for something outside numpy. Even OpenOffice or Excel financial capability might, perhaps, go into scipy, but why not have it optional?

-r
[Numpy-discussion] Masked array usage
I have a question about assigning to masked arrays. a is a len == 3 masked array, with 2 unmasked elements. b is a len == 2 array. I want to put the elements of b into the unmasked elements of a. How do I do that?

In [598]: a
Out[598]:
masked_array(data = [1 -- 3],
mask = [False True False],
fill_value=99)

In [599]: b
Out[599]: array([7, 8])

I'd like an operation that gives me:

masked_array(data = [7 -- 8],
mask = [False True False],
fill_value=99)

Seems like it shouldn't be that hard, but I can't figure it out. Any suggestions?

thanks,
-robert
Re: [Numpy-discussion] Masked array usage
Sweet. So simple. That works great.

thanks,
-robert

On Nov 27, 2008, at 8:41 AM, Angus McMorland wrote:
> 2008/11/27 Robert Ferrell [EMAIL PROTECTED]:
>> I have a question about assigning to masked arrays. a is a len == 3 masked array, with 2 unmasked elements. b is a len == 2 array. I want to put the elements of b into the unmasked elements of a. How do I do that?
>>
>> In [598]: a
>> Out[598]:
>> masked_array(data = [1 -- 3],
>> mask = [False True False],
>> fill_value=99)
>>
>> In [599]: b
>> Out[599]: array([7, 8])
>>
>> I'd like an operation that gives me:
>>
>> masked_array(data = [7 -- 8],
>> mask = [False True False],
>> fill_value=99)
>>
>> Seems like it shouldn't be that hard, but I can't figure it out. Any suggestions?
>
> How about:
>
> c = a.copy()
> c[~a.mask] = b
>
> Angus.
> --
> AJC McMorland
> Post-doctoral research fellow
> Neurobiology, University of Pittsburgh
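For reference, the suggestion above written out as a self-contained snippet. The array values are taken from the question. It assumes `a.mask` is a full boolean array, as it is here; when an array was created without an explicit mask, np.ma.getmaskarray(a) could be used in its place.

```python
import numpy as np

# The arrays from the question.
a = np.ma.masked_array([1, 2, 3], mask=[False, True, False], fill_value=99)
b = np.array([7, 8])

# Copy first so that `a` itself is left untouched.
c = a.copy()

# ~a.mask is True exactly at the unmasked positions, so b fills those slots in order.
c[~a.mask] = b

print(c)
```

The number of unmasked elements in `a` has to match the length of `b`, otherwise the assignment raises an error.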