On Dec 14, 2006, at 2:56 AM, Cameron Walsh wrote:
At some point I might try testing different cache sizes for different
data-set sizes and see what the effect is. For now, 65536 seems a good
number and I would be happy to see this replace the current
numpy.histogram.
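The chunked accumulation Cameron is describing might look roughly like the sketch below (illustrative only, not the code posted to the thread; the function name, signature, and fixed-bin-edges assumption are mine). The idea is that each 65536-element chunk stays cache-resident while its counts are accumulated into a running total:

```python
import numpy as np

def chunked_histogram(data, bins, chunk_size=65536):
    """Histogram a large array one cache-sized chunk at a time,
    accumulating counts against fixed bin edges.  Sketch only;
    name and signature are illustrative, not from the thread."""
    counts = np.zeros(len(bins) - 1, dtype=np.int64)
    for start in range(0, data.size, chunk_size):
        # Each chunk is histogrammed against the same edges, so the
        # partial counts can simply be summed.
        chunk_counts, _ = np.histogram(data[start:start + chunk_size],
                                       bins=bins)
        counts += chunk_counts
    return counts

data = np.random.default_rng(0).normal(size=500_000)
edges = np.linspace(-5.0, 5.0, 101)
result = chunked_histogram(data, edges)
```

Because the bin edges are fixed up front, the chunked result is identical to a single-shot `np.histogram(data, bins=edges)` on the whole array.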
This same idea could be used to parallelize the histogram computation.
Then you could really get into large (many Gb/TB/PB) data sets. I
might try to find time to do this with ipython1, but someone else
could do this as well.
Brian
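The parallel idea Brian raises follows the same merge-by-summation logic: histogram each slice of the data independently, then sum the partial counts. The thread mentions ipython1; the sketch below substitutes the standard-library `concurrent.futures` instead, and all names in it are illustrative, not from the thread:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Shared bin edges; every worker must histogram against the same edges
# or the partial counts cannot be merged.  (Illustrative values.)
EDGES = np.linspace(0.0, 1.0, 65)

def partial_hist(chunk):
    counts, _ = np.histogram(chunk, bins=EDGES)
    return counts

def parallel_histogram(data, n_workers=4):
    # Histogramming is embarrassingly parallel: split, count, sum.
    slices = np.array_split(data, n_workers)
    with ThreadPoolExecutor(n_workers) as pool:
        partials = list(pool.map(partial_hist, slices))
    return np.sum(partials, axis=0)

data = np.random.default_rng(1).random(100_000)
counts = parallel_histogram(data)
```

For genuinely many-GB/TB data sets the same split/count/sum structure applies, with the slices living on different machines rather than in different threads.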
On 12/13/06, Rick White [EMAIL PROTECTED] wrote:
On Dec 12, 2006, Rick White wrote:
Just so we don't get too smug about the speed: if I do this in IDL on
the same machine it is 10 times faster (0.28 seconds instead of 4
seconds). I'm sure the IDL version uses the much faster approach of
just sweeping through the array once, incrementing counts in the
appropriate bins.
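In NumPy terms, the single-sweep approach Rick describes (compute each sample's bin index once, then increment that bin's count in place) might be sketched as below. This is my reading of the technique, not code from the thread, and it assumes uniform bins with out-of-range values clipped to the edge bins:

```python
import numpy as np

def sweep_histogram(data, lo, hi, nbins):
    """One-pass histogram over uniform bins [lo, hi): map each value
    to a bin index, then increment counts in place.  Sketch only;
    clipping out-of-range values differs from np.histogram, which
    drops them."""
    idx = ((data - lo) * (nbins / (hi - lo))).astype(np.intp)
    idx = np.clip(idx, 0, nbins - 1)   # clamp out-of-range samples
    counts = np.zeros(nbins, dtype=np.int64)
    np.add.at(counts, idx, 1)          # unbuffered in-place increments
    return counts

data = np.random.default_rng(2).random(100_000)  # all values in [0, 1)
counts = sweep_histogram(data, 0.0, 1.0, 64)
```

`np.add.at` is the unbuffered form of fancy-index addition, so repeated indices each contribute a count; plain `counts[idx] += 1` would silently lose duplicates.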
Hi,
I spent some time a while ago on a histogram function for numpy. It uses
digitize and bincount instead of sorting the data. If I remember right, it
was significantly faster than numpy's histogram, but I don't know how it
will behave with very large data sets.
I attached the file if you want to try it.
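The attached file itself isn't in the archive, but the digitize-and-bincount combination being described can be sketched as follows (my reconstruction of the technique, not the attached code): `np.digitize` maps each value to a bin index, and `np.bincount` tallies the indices in one pass.

```python
import numpy as np

def digitize_histogram(data, edges):
    """Histogram via digitize + bincount.  Sketch of the technique
    described in the thread, not the attached implementation."""
    # digitize returns 0 for values below edges[0] and len(edges)
    # for values at or above edges[-1].
    idx = np.digitize(data, edges)
    counts = np.bincount(idx, minlength=len(edges) + 1)
    # Drop the underflow/overflow slots so len(edges) - 1 bins remain.
    return counts[1:len(edges)]

data = np.random.default_rng(3).random(100_000)  # all values in [0, 1)
edges = np.linspace(0.0, 1.0, 65)
counts = digitize_histogram(data, edges)
```

One caveat for very large data sets: `digitize` materializes a full index array the same length as the input, so for the multi-GB case this would itself want to be applied chunk by chunk.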
On Dec 12, 2006, at 10:27 PM, Cameron Walsh wrote:
I'm trying to generate histograms of extremely large datasets. I've
tried a few methods, listed below, all with their own shortcomings.
Mailing-list archives and Google searches have not revealed any
solutions.
The numpy.histogram function
Hi all,
Absolutely gorgeous: I confirm the 1.6x speed-up over the weave
version, i.e. a 25x speed-up over the existing version.
It would be good if the redefinition of the range function could be
changed in the numpy modules, before it goes into subversion, to
avoid the need for Rick's line