Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Jaime Fernández del Río
On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar mistersh...@gmail.com wrote: Yeah, I'm not arguing, I'm just curious about your reasoning. That explains why not C++. Why would you want to do this in C and not Python? Well, the algorithm has to iterate over all the inputs, updating the

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Neil Girdhar
You got it. I remember this from when I worked at Google and we would process (many many) logs. With enough bins, the approximation is still really close. It's great if you want to make an automatic plot of data. Calling numpy.partition a hundred times is probably slower than calling P^2 with
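The "P^2" Neil mentions is presumably the P² algorithm of Jain and Chlamtac (1985), which estimates a quantile in one pass with five markers and constant memory, without storing the observations. The sketch below tracks a single quantile `p`; the class name `P2Quantile` is hypothetical, and using it for histogram bin edges would need one estimator per edge (or a multi-marker extension), which is an assumption about how it would be applied here.

```python
class P2Quantile:
    """Hypothetical sketch of the P^2 streaming quantile estimator for one quantile p."""

    def __init__(self, p):
        if not 0.0 < p < 1.0:
            raise ValueError("p must be in (0, 1)")
        self.p = p
        self.q = []                                        # marker heights
        self.n = [0, 1, 2, 3, 4]                           # actual marker positions (0-based)
        self.desired = [0.0, 2 * p, 4 * p, 2 + 2 * p, 4.0] # desired marker positions
        self.dn = [0.0, p / 2, p, (1 + p) / 2, 1.0]        # desired-position increments

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                  # bootstrap: keep the first five samples sorted
            q.append(x)
            q.sort()
            return
        # 1. Find the cell k with q[k] <= x < q[k + 1], extending the extremes if needed.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = 0
            while x >= q[k + 1]:
                k += 1
        # 2. Markers above the cell move up one position; desired positions advance.
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.desired[i] += self.dn[i]
        # 3. Nudge each middle marker one step toward its desired position if it drifted.
        for i in range(1, 4):
            d = self.desired[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                qp = self._parabolic(i, s)
                if not q[i - 1] < qp < q[i + 1]:
                    # Parabolic prediction left the bracket: use linear interpolation.
                    qp = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                q[i] = qp
                n[i] += s

    def _parabolic(self, i, s):
        # Piecewise-parabolic (P^2) height prediction for moving marker i by s.
        q, n = self.q, self.n
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def value(self):
        if not self.q:
            raise ValueError("no samples yet")
        if len(self.q) < 5:             # too few samples: nearest-rank order statistic
            return self.q[round(self.p * (len(self.q) - 1))]
        return self.q[2]                # the middle marker tracks the p-quantile


if __name__ == "__main__":
    import random

    random.seed(0)
    est = P2Quantile(0.5)
    for _ in range(100_000):
        est.add(random.gauss(0.0, 1.0))
    print(est.value())                  # should be close to 0, the median of N(0, 1)
```

Each sample touches only five markers, which is why it makes a single pass over the data; as Jaime notes, a per-element Python loop like this is slow, so a practical version would live in compiled code.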

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Benjamin Root
Then you can set about convincing matplotlib and friends to use it by default. Just to note, this proposal was originally made over in the matplotlib project. We sent it over here, where its benefits would have wider reach. Matplotlib's plan is not to change the defaults, but to offload as much as
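For context on what "automatic number of bins" can mean, one widely used rule is Freedman-Diaconis, which sets the bin width to h = 2 · IQR · n^(-1/3) and derives the bin count from the data range. The helper name `fd_num_bins` is hypothetical; later NumPy releases added string estimators such as `bins='fd'` and `bins='auto'`, and this sketch mirrors the `'fd'` arithmetic so it can be checked against `np.histogram_bin_edges`.

```python
import numpy as np


def fd_num_bins(data):
    """Hypothetical helper: bin count from the Freedman-Diaconis rule."""
    x = np.asarray(data, dtype=float).ravel()
    if x.size == 0:
        raise ValueError("need at least one value")
    # Interquartile range and the rule's bin width h = 2 * IQR * n**(-1/3)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    width = 2.0 * iqr * x.size ** (-1.0 / 3.0)
    data_range = x.max() - x.min()
    if width == 0.0 or data_range == 0.0:
        return 1                        # degenerate data: a single bin
    return int(np.ceil(data_range / width))
```

The IQR makes the width robust to outliers, while the n^(-1/3) factor lets bins narrow as the sample grows.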

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Jaime Fernández del Río
On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar mistersh...@gmail.com wrote: You got it. I remember this from when I worked at Google and we would process (many many) logs. With enough bins, the approximation is still really close. It's great if you want to make an automatic plot of data.

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Jaime Fernández del Río
On Wed, Apr 15, 2015 at 9:14 AM, Eric Moore e...@redtetrahedron.org wrote: This blog post, and the links within also seem relevant. Appears to have python code available to try things out as well.

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Eric Moore
This blog post, and the links within also seem relevant. Appears to have python code available to try things out as well. https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest -Eric On Wed, Apr 15, 2015 at 11:24 AM, Benjamin Root

[Numpy-discussion] [ANN] python-blosc v1.2.5

2015-04-15 Valentin Haenel
= Announcing python-blosc 1.2.5 =

What is new?

This release contains support for Blosc v1.5.4, including changes to how the GIL is kept. This was required because Blosc was refactored in the v1.5.x line to remove global

Re: [Numpy-discussion] IDEs for numpy development?

2015-04-15 Joseph Martinot-Lagarde
On 08/04/2015 21:19, Yuxiang Wang wrote: I think Spyder supports code highlighting in C and that's all... There's no way to compile in Spyder, is there? Well, you could write a compilation script using SCons and run it from Spyder! :) But no, Spyder is very Python-oriented and there is no

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 josef.pktd
On Wed, Apr 15, 2015 at 6:08 PM, josef.p...@gmail.com wrote: On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote: Does it work for you to set outer = np.multiply.outer? It's actually faster on my machine. I assume it does because np.corrcoef uses it, and it's the

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 josef.pktd
On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith n...@pobox.com wrote: On Wed, Apr 15, 2015 at 6:08 PM, josef.p...@gmail.com wrote: On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote: Does it work for you to set outer = np.multiply.outer ? It's actually faster on

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Neil Girdhar
I don't understand. Are you at PyCon by any chance? On Wed, Apr 15, 2015 at 6:12 PM, josef.p...@gmail.com wrote: On Wed, Apr 15, 2015 at 6:08 PM, josef.p...@gmail.com wrote: On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote: Does it work for you to set outer =

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Nathaniel Smith
On Wed, Apr 15, 2015 at 6:08 PM, josef.p...@gmail.com wrote: On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote: Does it work for you to set outer = np.multiply.outer? It's actually faster on my machine. I assume it does because np.corrcoef uses it, and it's the

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Neil Girdhar
Cool, thanks for looking at this. P2 might still be better even if the whole dataset is in memory, because of cache misses. Partition, which I guess is based on quickselect, is going to run over all of the data roughly as many times as there are bins, whereas P2 only runs over it once. From a
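For comparison with the partition-based approach: `np.partition` also accepts a sequence of `kth` indices and places every requested order statistic at its sorted position in a single call, so all equal-count bin edges can be requested at once. Whether that avoids the repeated passes Neil describes depends on NumPy's selection implementation, so it is worth timing against a one-pass estimator. The helper `equal_count_edges` is hypothetical and uses nearest-rank order statistics rather than the interpolation `np.percentile` performs.

```python
import numpy as np


def equal_count_edges(data, nbins):
    """Hypothetical helper: nbins + 1 edges splitting `data` into roughly
    equal-count bins, using nearest-rank order statistics."""
    a = np.asarray(data, dtype=float).ravel()
    if a.size == 0 or nbins < 1:
        raise ValueError("need non-empty data and nbins >= 1")
    # Sorted-order positions of the edges: 0, ~n/nbins, ~2n/nbins, ..., n - 1
    kth = np.unique(np.round(np.linspace(0, a.size - 1, nbins + 1)).astype(np.intp))
    # One call puts every requested order statistic at its sorted position
    part = np.partition(a, kth)
    return part[kth]
```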

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 josef.pktd
On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote: Does it work for you to set outer = np.multiply.outer? It's actually faster on my machine. I assume it does because np.corrcoef uses it, and it's the same type of use case. However, I'm not using it very often (I

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Neil Girdhar
Does it work for you to set outer = np.multiply.outer ? It's actually faster on my machine. On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote: On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com wrote: Yes, I totally agree. If I get started on the PR to
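The shape difference behind the thread title can be seen directly: `np.outer` ravels both inputs to 1-D before multiplying, so a zero-dimensional input becomes shape `(1,)`, while the ufunc method `np.multiply.outer` keeps the input shapes and returns an array of shape `a.shape + b.shape`. A small sketch (behaviour as in current NumPy releases; worth re-checking against the version in use):

```python
import numpy as np

a = np.array(2.0)                       # zero-dimensional array
b = np.array([1.0, 2.0, 3.0])

# np.outer flattens its inputs, so the 0-d array is treated as shape (1,)
print(np.outer(a, b).shape)             # (1, 3)
# The ufunc method keeps shapes: result.shape == a.shape + b.shape
print(np.multiply.outer(a, b).shape)    # (3,)

m = np.ones((2, 2))
v = np.arange(3)
print(np.outer(m, v).shape)             # (4, 3), because m was flattened
print(np.multiply.outer(m, v).shape)    # (2, 2, 3)
```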

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 josef.pktd
On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com wrote: Yes, I totally agree. If I get started on the PR to deprecate np.outer, maybe I can do it as part of the same PR? On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Just a general

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Sebastian Berg
Just a general thing, if someone has a few minutes, I think it would make sense to add the ufunc.reduce thing to all of these functions at least in the See Also or Notes section in the documentation. These special attributes are not that well known, and I think that might be a nice way to make it
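The ufunc methods Sebastian suggests cross-referencing in the documentation produce the same results as several familiar functions. A short sketch of the correspondences (values shown for this small input):

```python
import numpy as np

x = np.arange(1, 6)                            # [1, 2, 3, 4, 5]

# ufunc.reduce folds the operation along an axis
print(np.add.reduce(x), np.sum(x))             # 15 15
print(np.multiply.reduce(x), np.prod(x))       # 120 120
print(np.maximum.reduce(x), np.max(x))         # 5 5

# ufunc.accumulate keeps the running results
print(np.add.accumulate(x), np.cumsum(x))      # [ 1  3  6 10 15] [ 1  3  6 10 15]

# ufunc.outer applies the operation to every pair of elements
print(np.add.outer(x, x).shape)                # (5, 5)

# ufunc.reduceat reduces over slices starting at the given indices
print(np.add.reduceat(x, [0, 2]))              # [ 3 12]
```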

Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Neil Girdhar
Yes, I totally agree. If I get started on the PR to deprecate np.outer, maybe I can do it as part of the same PR? On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg sebast...@sipsolutions.net wrote: Just a general thing, if someone has a few minutes, I think it would make sense to add the

Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Neil Girdhar
Yeah, I'm not arguing, I'm just curious about your reasoning. That explains why not C++. Why would you want to do this in C and not Python? On Wed, Apr 15, 2015 at 1:48 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote: On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar mistersh...@gmail.com