I'm trying to file a set of data points, defined by genome coordinates, into
bins, also based on genome coordinates. Each data point is (chromosome, start,
end, point) and each bin is (chromosome, start, end). I have about 140 million
points to file into around 100,000 bins. Both are (roughly)
I've bodged my way through my median problems (see previous postings). Now I
need to take a z-score of an array that might contain nans. At the moment, if
the array, which has 7000 elements, contains one or more nans, all the results come
out as nan.
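A nan-aware z-score can be built from nan-ignoring reductions. Here is a minimal sketch using numpy's np.nanmean and np.nanstd (modern numpy; in the numpy/scipy of this thread's era the equivalents lived in scipy.stats):

import numpy as np

def nan_zscore(a):
    a = np.asarray(a, dtype=float)
    mu = np.nanmean(a)       # mean over the non-nan elements only
    sigma = np.nanstd(a)     # std over the non-nan elements only
    return (a - mu) / sigma  # nans stay nan in the result

x = np.array([1.0, 2.0, np.nan, 4.0])
print(nan_zscore(x))         # nans preserved, the rest z-scored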
My other problem is that my array is indexed from
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:
Still, it is indeed really slow for your case; when I fixed nanmean and
co, I did not know much about numpy; I just wanted them to give the
right answer :) I think this can be made faster, especially for your case
(where the axis along
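A sketch of the kind of speed-up being hinted at: nanmean along an axis can be vectorized by zeroing the nans, summing, and dividing by the per-slice count of non-nan entries. This is illustrative only, not the actual fix that went into scipy:

import numpy as np

def fast_nanmean(a, axis=-1):
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    total = np.where(mask, 0.0, a).sum(axis=axis)
    count = (~mask).sum(axis=axis)
    # An all-nan slice gives 0/0, i.e. nan (with a RuntimeWarning).
    return total / count

a = np.array([[1.0, np.nan, 3.0], [np.nan, np.nan, np.nan]])
print(fast_nanmean(a, axis=1))  # [ 2. nan]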
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:
Unfortunately, we can't, because we would lose generality: we need to
compute median on any axis, not only the last one. The proper solution
would be to have a sort/max/min/etc... which knows about nan in numpy,
which is what Chuck and
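Pending such nan-aware primitives, one workable trick is that numpy's sort already pushes nans to the end of each slice, so a nan-aware median can sort and then index into the non-nan prefix. A sketch for the last axis only (a general version would first move the target axis to the end):

import numpy as np

def nanmedian_last_axis(a):
    a = np.sort(np.asarray(a, dtype=float), axis=-1)  # nans go last
    n = (~np.isnan(a)).sum(axis=-1)                   # valid count per slice
    out = np.empty(a.shape[:-1])
    for i in np.ndindex(out.shape):                   # pick per slice
        k = n[i]
        row = a[i]
        # Average the two middle valid entries (they coincide for odd k).
        out[i] = np.nan if k == 0 else (row[(k - 1) // 2] + row[k // 2]) / 2.0
    return out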
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:
You can use nanmean (from scipy.stats):
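Presumably along these lines (scipy.stats.nanmean was the nan-ignoring mean in the scipy of that era; it has since been removed in favour of numpy.nanmean):

import numpy as np
from scipy.stats import nanmean  # old scipy API

x = np.array([1.0, np.nan, 3.0])
print(nanmean(x))  # 2.0 -- the nan is ignored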
I rejoiced when I saw this answer, because it looks like a function I can just
drop in and it works. Unfortunately, nanmedian seems to be quite a bit slower
than just using lists (ignoring nan
David Cournapeau david at ar.media.kyoto-u.ac.jp writes:
It may be that nanmedian is slow. But I would sincerely be surprised if
it were slower than a Python list, except for some pathological cases or
maybe a bug in nanmedian. What do your data look like? (size, number of
nans, etc.)
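One quick way to answer that question is to time the two approaches on data shaped like the use case (7000-element arrays with a handful of nans). The sketch below uses modern numpy's np.nanmedian as a stand-in for the scipy.stats nanmedian under discussion:

import math
import timeit
import numpy as np

def list_median(values):
    # Plain-Python median that skips nans; assumes at least one non-nan.
    vals = sorted(v for v in values if not math.isnan(v))
    n = len(vals)
    return (vals[(n - 1) // 2] + vals[n // 2]) / 2.0

a = np.random.rand(7000)
a[np.random.randint(0, 7000, size=10)] = np.nan

print(timeit.timeit(lambda: np.nanmedian(a), number=100))
print(timeit.timeit(lambda: list_median(a.tolist()), number=100))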
I've
Pierre GM pgmdevlist at gmail.com writes:
I think there were some changes on the C side of numpy between 1.0 and 1.1,
so you may have to recompile scipy and matplotlib from source. What versions
are you using for those two packages?
$ dpkg -l | grep scipy
ii python-scipy
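The installed versions can also be checked from inside Python itself, which avoids depending on the packaging system:

import numpy, scipy, matplotlib
print(numpy.__version__, scipy.__version__, matplotlib.__version__)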
Alan G Isaac aisaac at american.edu writes:
Recently I needed to fill a 2d array with values
from computations that could go wrong.
I created an array of NaN and then replaced
the elements where the computation produced
a useful value. I then applied ``nanmax``
to get the maximum of the
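A minimal sketch of that pattern; risky_computation is a hypothetical stand-in for the computation that could go wrong (np.full is modern numpy; older code would write np.nan * np.ones(...)):

import numpy as np

def risky_computation(i, j):
    # Hypothetical computation that fails for some inputs.
    if (i + j) % 3 == 0:
        raise ValueError("computation went wrong")
    return float(i * j)

out = np.full((4, 4), np.nan)  # start from an all-nan array
for i in range(4):
    for j in range(4):
        try:
            out[i, j] = risky_computation(i, j)
        except ValueError:
            pass  # leave nan in place for the failed entry

print(np.nanmax(out))  # maximum over the useful values only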
I have data from biological experiments that is represented as a list of
about 5000 triples. I would like to convert this to a list of the median
of each triple. I did some profiling and found that numpy was about
12 times faster for this application than using regular Python lists and
a
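The numpy version of that computation is a single vectorized call over an (n, 3) array, sketched here with random stand-in data:

import numpy as np

triples = np.random.rand(5000, 3)     # stand-in for the experimental data
medians = np.median(triples, axis=1)  # one median per triple
print(medians.shape)                  # (5000,)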