[Numpy-discussion] Overlapping ranges

2009-03-16 Thread Peter Saffrey
I'm trying to file a set of data points, defined by genome coordinates, into bins, also based on genome coordinates. Each data point is (chromosome, start, end, point) and each bin is (chromosome, start, end). I have about 140 million points to file into around 100,000 bins. Both are (roughly)
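One way to attack this kind of binning in NumPy is with `np.searchsorted`. The sketch below rests on several assumptions the post does not confirm. Each data point is treated as a half-open interval [start, end). The bins on a single chromosome are taken to be sorted and non-overlapping, so `bin_starts` and `bin_ends` are both ascending. You would call the function once per chromosome. The function name `overlapping_bin_range` and the coordinates are hypothetical.

```python
import numpy as np

def overlapping_bin_range(starts, ends, bin_starts, bin_ends):
    """For each interval [start, end), return (first, last) indices of the
    bins it overlaps; first > last means it overlaps no bin.

    Assumes (hypothetically) sorted, non-overlapping bins on one chromosome,
    so bin_starts and bin_ends are both ascending.
    """
    # First bin whose end is strictly greater than the interval start.
    first = np.searchsorted(bin_ends, starts, side="right")
    # Last bin whose start is strictly less than the interval end.
    last = np.searchsorted(bin_starts, ends, side="left") - 1
    return first, last

# Made-up bins [0,100), [100,200), [300,400) and five made-up intervals.
bin_starts = np.array([0, 100, 300])
bin_ends = np.array([100, 200, 400])
starts = np.array([5, 150, 220, 95, 400])
ends = np.array([20, 350, 280, 105, 410])
first, last = overlapping_bin_range(starts, ends, bin_starts, bin_ends)
print(first)  # [0 1 2 0 3]
print(last)   # [0 2 1 1 2]
```

Each lookup is a binary search, so the cost is roughly O(n log m) for n points and m bins. An interval that falls wholly inside one bin gets `first == last`. If the bins themselves can overlap, this trick no longer applies, and an interval-tree style structure would be needed instead.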

[Numpy-discussion] Standard functions (z-score) on nan (again)

2008-09-25 Thread Peter Saffrey
I've bodged my way through my median problems (see previous postings). Now I need to take a z-score of an array that might contain nans. At the moment, if the array, which has 7000 elements, contains one or more nans, all the results come out as nan. My other problem is that my array is indexed from
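A z-score that skips NaNs can be built from `np.nanmean` and `np.nanstd`, both available in later NumPy releases. This is only a sketch: `nan_zscore` is a hypothetical helper, and it assumes the population standard deviation (`ddof=0`). NaN inputs remain NaN in the output, but they no longer contaminate every other result.

```python
import numpy as np

def nan_zscore(a, axis=None):
    """z-score ignoring NaNs when computing mean and std (ddof=0 assumed).

    NaN entries stay NaN in the result; all other entries are finite.
    """
    a = np.asarray(a, dtype=float)
    # keepdims=True lets the statistics broadcast back against `a`.
    mean = np.nanmean(a, axis=axis, keepdims=True)
    std = np.nanstd(a, axis=axis, keepdims=True)
    return (a - mean) / std

z = nan_zscore([1.0, 2.0, 3.0, np.nan])
print(z)
```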

Re: [Numpy-discussion] Medians that ignore values

2008-09-22 Thread Peter Saffrey
David Cournapeau david at ar.media.kyoto-u.ac.jp writes: Still, it is indeed really slow for your case; when I fixed nanmean and co, I did not know much about numpy; I just wanted them to give the right answer :) I think this can be made faster, especially for your case (where the axis along

Re: [Numpy-discussion] Medians that ignore values

2008-09-22 Thread Peter Saffrey
David Cournapeau david at ar.media.kyoto-u.ac.jp writes: Unfortunately, we can't, because we would lose generality: we need to compute median on any axis, not only the last one. The proper solution would be to have a sort/max/min/etc. which knows about nan in numpy, which is what Chuck and
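The "nan-aware sort" idea can be sketched using the fact that `np.sort` places NaNs at the end of each sorted slice. After sorting along any axis, you count the valid values in each slice and read off the middle element(s). This illustrates the approach only. It is not how NumPy or SciPy implement their median functions, and the name `nanmedian_sorted` is hypothetical.

```python
import numpy as np

def nanmedian_sorted(a, axis=-1):
    """Median along `axis` ignoring NaNs, via a full sort.

    Relies on np.sort putting NaNs at the end of each slice.
    Slices that are entirely NaN give NaN.
    """
    s = np.sort(np.asarray(a, dtype=float), axis=axis)
    s = np.moveaxis(s, axis, -1)               # work along the last axis
    n = np.sum(~np.isnan(s), axis=-1)          # valid values per slice
    # Lower and upper middle positions among the n valid (leading) values.
    lo = np.expand_dims(np.maximum((n - 1) // 2, 0), -1)
    hi = np.expand_dims(n // 2, -1)
    lo_vals = np.take_along_axis(s, lo, axis=-1)[..., 0]
    hi_vals = np.take_along_axis(s, hi, axis=-1)[..., 0]
    # Odd n: lo == hi; even n: average of the two middle values.
    return np.where(n > 0, (lo_vals + hi_vals) / 2.0, np.nan)

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, 6.0],
                 [np.nan, np.nan, 2.0]])
print(nanmedian_sorted(data, axis=1))  # [2. 5. 2.]
print(nanmedian_sorted(data, axis=0))  # [2.5 5.  3. ]
```

Because the work is done after `np.moveaxis`, the same code handles any axis. That is the generality the reply above is concerned about.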

Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
David Cournapeau david at ar.media.kyoto-u.ac.jp writes: You can use nanmean (from scipy.stats): I rejoiced when I saw this answer, because it looks like a function I can just drop in and it works. Unfortunately, nanmedian seems to be quite a bit slower than just using lists (ignoring nan
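Claims like "slower than just using lists" are easiest to check with a small benchmark. The sketch below compares a pure-Python per-row median that skips NaNs against `np.nanmedian(..., axis=1)`. In later NumPy releases `nanmedian` lives in NumPy itself rather than in `scipy.stats`. The data shape and the roughly 10% NaN rate are made up. The helper `list_median_ignoring_nan` is hypothetical and is not the original poster's code. Timings depend on the machine and library versions, so no numbers are claimed here.

```python
import math
import timeit
import numpy as np

def list_median_ignoring_nan(values):
    """Pure-Python median of one row, skipping NaNs (hypothetical baseline)."""
    vals = sorted(v for v in values if not math.isnan(v))
    n = len(vals)
    if n == 0:
        return float("nan")
    mid = n // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2.0

rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 3))          # made-up stand-in data
mask = rng.random(data.shape) < 0.1        # ~10% NaNs (made-up rate)
mask[:, 0] = False                         # keep every row non-empty
data[mask] = np.nan

rows = data.tolist()
py = [list_median_ignoring_nan(r) for r in rows]
vec = np.nanmedian(data, axis=1)
assert np.allclose(py, vec)                # both approaches agree

t_list = timeit.timeit(lambda: [list_median_ignoring_nan(r) for r in rows], number=10)
t_np = timeit.timeit(lambda: np.nanmedian(data, axis=1), number=10)
print(f"lists: {t_list:.3f}s  np.nanmedian: {t_np:.3f}s")
```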

Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
David Cournapeau david at ar.media.kyoto-u.ac.jp writes: It may be that nanmedian is slow. But I would sincerely be surprised if it were slower than Python lists, except for some pathological cases, or maybe a bug in nanmedian. What do your data look like? (size, number of nans, etc.) I've

Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
Pierre GM pgmdevlist at gmail.com writes: I think there were some changes on the C side of numpy between 1.0 and 1.1, you may have to recompile scipy and matplotlib from sources. What versions are you using for those 2 packages ? $ dpkg -l | grep scipy ii python-scipy
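`dpkg -l` reports which Debian packages are installed. It does not tell you which copies Python actually imports. A small check from inside Python can answer the version question directly. This is a sketch: `installed_versions` is a hypothetical helper, and it assumes each package exposes a `__version__` attribute, falling back to `"unknown"` if one does not.

```python
import importlib

def installed_versions(names=("numpy", "scipy", "matplotlib")):
    """Return {package: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

for name, version in installed_versions().items():
    print(name, version or "not installed")
```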

Re: [Numpy-discussion] Medians that ignore values

2008-09-19 Thread Peter Saffrey
Alan G Isaac aisaac at american.edu writes: Recently I needed to fill a 2d array with values from computations that could go wrong. I created an array of NaN and then replaced the elements where the computation produced a useful value. I then applied ``nanmax``, to get the maximum of the
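The pattern described here is to pre-fill with NaN, overwrite only the successful results, then reduce with a NaN-ignoring function. It can be sketched as follows. The function `risky` is a hypothetical stand-in for a computation that can go wrong.

```python
import numpy as np

def risky(i, j):
    """Hypothetical computation that fails for some inputs."""
    if (i + j) % 3 == 0:
        raise ValueError("computation went wrong")
    return float(i * j)

result = np.full((3, 4), np.nan)         # start with NaN everywhere
for i in range(result.shape[0]):
    for j in range(result.shape[1]):
        try:
            result[i, j] = risky(i, j)   # overwrite only on success
        except ValueError:
            pass                         # failed cells stay NaN

row_max = np.nanmax(result, axis=1)      # NaNs are ignored
print(row_max)  # [0. 3. 6.]
```

Note that a row containing only NaNs makes `np.nanmax` return NaN for that row and emit a `RuntimeWarning`.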

[Numpy-discussion] Medians that ignore values

2008-09-18 Thread Peter Saffrey
I have data from biological experiments that is represented as a list of about 5000 triples. I would like to convert this to a list of the median of each triple. I did some profiling and found that numpy was about 12 times faster for this application than using regular Python lists and a
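For a list of triples, the vectorised form converts the list to an (N, 3) array and takes `np.median` along `axis=1`. The random data below is a made-up stand-in for the experimental triples. The sum-minus-max-minus-min identity is shown only as an alternative that holds for exactly three values. It is not claimed to be what the poster used.

```python
import numpy as np

rng = np.random.default_rng(42)
triples = rng.normal(size=(5000, 3))   # made-up stand-in for ~5000 triples
# A Python list of triples would first go through np.asarray(triples).

# Vectorised median of each row (each triple).
medians = np.median(triples, axis=1)

# For exactly three values, the median is also sum - max - min.
alt = triples.sum(axis=1) - triples.max(axis=1) - triples.min(axis=1)
assert np.allclose(medians, alt)
print(medians[:5])
```

If some triples contain NaNs to be ignored, `np.nanmedian(triples, axis=1)` is the NaN-skipping counterpart in later NumPy releases.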