Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar mistersh...@gmail.com wrote:

 Yeah, I'm not arguing, I'm just curious about your reasoning.  That
 explains why not C++.  Why would you want to do this in C and not Python?


Well, the algorithm has to iterate over all the inputs, updating the
estimated percentile positions at every iteration. Because the estimated
percentiles may change at every iteration, I don't think there is an easy
way of vectorizing the calculation with numpy, so I think it would be very
slow if done in pure Python.

Looking at this in some more detail, how is this typically used? It gives
you approximate values that should split your sample into similarly filled
bins, but because the values are approximate, to compute a proper histogram
you would still need to do the binning to get the exact results, right? Even
with this drawback P^2 does have an algorithmic advantage, so for huge inputs
and many bins it should come out ahead. But for many medium-sized problems it
may be faster to simply use np.partition, which gives you the whole thing in
a single go. And it would be much simpler to implement.
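For concreteness, a partition-based version of "the whole thing in a single
go" could look roughly like this (a sketch with hypothetical names, not
existing numpy API; it relies on np.partition accepting a list of kth
indices, so all the approximately-equal-count edges come from one call):

```python
import numpy as np

def equal_count_edges(data, nbins):
    """Bin edges that split `data` into (nearly) equally filled bins,
    from a single np.partition call.  Hypothetical helper, for
    illustration only."""
    data = np.asarray(data)
    m = data.size
    # Interior edges are the order statistics at the quantile positions;
    # np.partition places all of them correctly in one call.
    kth = [(m * i) // nbins for i in range(1, nbins)]
    part = np.partition(data, kth)
    return np.concatenate(([data.min()], part[kth], [data.max()]))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
edges = equal_count_edges(x, 10)
# The exact histogram still needs one binning pass over the data:
counts, _ = np.histogram(x, bins=edges)
```

With continuous data the resulting counts are essentially equal, and unlike
a streaming estimate the edges here are exact order statistics.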

Jaime

-- 
(\__/)
( O.o)
(  ) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
You got it.  I remember this from when I worked at Google and we would
process (many many) logs.  With enough bins, the approximation is still
really close.  It's great if you want to make an automatic plot of data.
Calling numpy.partition a hundred times is probably slower than calling P^2
with n=100 bins.  I don't think it does O(n) computations per point.  I
think it's more like O(log(n)).
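For reference, here is a pure-Python sketch of the five-marker,
single-quantile form of P^2, transcribed from the Jain & Chlamtac (1985)
paper rather than from Boost, so treat it as illustrative only. The extended
multi-bin variant maintains a marker per bin boundary and updates their
counts on every observation, which is where the O(n)-vs-O(log n) per-point
question comes in:

```python
import random

def p2_quantile(stream, p):
    """Streaming estimate of the p-th quantile using the P^2 algorithm.
    Keeps only five markers in memory; assumes the stream yields at
    least five observations."""
    it = iter(stream)
    q = sorted(next(it) for _ in range(5))          # marker heights
    n = [1, 2, 3, 4, 5]                             # actual marker positions
    nd = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]    # desired positions
    dn = [0, p / 2, p, (1 + p) / 2, 1]              # desired-position increments
    for x in it:
        # 1. Locate the cell containing x, updating the extreme markers.
        if x < q[0]:
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:
            k = 0
            while x >= q[k + 1]:
                k += 1
        # 2. Shift the positions of all markers above the new observation.
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            nd[i] += dn[i]
        # 3. Adjust interior markers that drifted off their desired spot.
        for i in (1, 2, 3):
            d = nd[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                # Piecewise-parabolic (hence "P^2") height prediction.
                qp = q[i] + d / (n[i + 1] - n[i - 1]) * (
                    (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
                    + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))
                if q[i - 1] < qp < q[i + 1]:
                    q[i] = qp
                else:  # parabola left the bracket: fall back to linear
                    q[i] += d * (q[i + d] - q[i]) / (n[i + d] - n[i])
                n[i] += d
    return q[2]  # the middle marker tracks the p-th quantile

random.seed(0)
data = [random.random() for _ in range(10_000)]
est = p2_quantile(data, 0.5)   # should land close to the true median
```

Note how every step mutates the marker state, which is why the loop resists
numpy vectorization.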

Best,

Neil

On Wed, Apr 15, 2015 at 10:02 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar mistersh...@gmail.com
 wrote:

 Yeah, I'm not arguing, I'm just curious about your reasoning.  That
 explains why not C++.  Why would you want to do this in C and not Python?


 Well, the algorithm has to iterate over all the inputs, updating the
 estimated percentile positions at every iteration. Because the estimated
 percentiles may change in every iteration, I don't think there is an easy
 way of vectorizing the calculation with numpy. So I think it would be very
 slow if done in Python.

 Looking at this in some more details, how is this typically used? Because
 it gives you approximate values that should split your sample into
 similarly filled bins, but because the values are approximate, to compute a
 proper histogram you would still need to do the binning to get the exact
 results, right? Even with this drawback P-2 does have an algorithmic
 advantage, so for huge inputs and many bins it should come ahead. But for
 many medium sized problems it may be faster to simply use np.partition,
 which gives you the whole thing in a single go. And it would be much
 simpler to implement.

 Jaime

 --
 (\__/)
 ( O.o)
 (  ) This is Bunny. Copy Bunny into your signature and help him with his
 plans for world domination.



Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Benjamin Root
Then you can set about convincing matplotlib and friends to
use it by default

Just to note, this proposal was originally made over in the matplotlib
project. We sent it over here where its benefits would have wider reach.
Matplotlib's plan is not to change the defaults, but to offload as much as
possible to numpy so that it can support these new features if they are
available. We might need to do some input validation so that users running
older versions of numpy get a sensible error message.

Cheers!
Ben Root


On Tue, Apr 14, 2015 at 7:12 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Can I suggest that we instead add the P-square algorithm for the dynamic
  calculation of histograms?
  (
 http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
 )
 
  This is already implemented in C++'s boost library
  (
 http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
 )
 
  I implemented it in Boost Python as a module, which I'm happy to share.
  This is much better than fixed-width histograms in practice.  Rather than
  adjusting the number of bins, it adjusts what you really want, which is
 the
  resolution of the bins throughout the domain.

 This definitely sounds like a useful thing to have in numpy or scipy
 (though if it's possible to do without using Boost/C++ that would be
 nice). But yeah, we should leave the existing histogram alone (in this
 regard) and add a new name for this like adaptive_histogram or
 something. Then you can set about convincing matplotlib and friends to
 use it by default :-)

 -n

 --
 Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar mistersh...@gmail.com wrote:

 You got it.  I remember this from when I worked at Google and we would
 process (many many) logs.  With enough bins, the approximation is still
 really close.  It's great if you want to make an automatic plot of data.
 Calling numpy.partition a hundred times is probably slower than calling P^2
 with n=100 bins.  I don't think it does O(n) computations per point.  I
 think it's more like O(log(n)).


Looking at it again, it probably is O(n) after all: it does a binary
search, which is O(log n), but it then goes on to update all n bin
counters and estimates, so O(n) I'm afraid. So there is no algorithmic
advantage over partition/percentile: if there are m samples and n bins, P^2
does that O(n) work m times, while partition does O(m) work n times, so both
end up being O(m n). It seems to me that the big selling point of P^2 is not
having to hold the full dataset in memory. Online statistics (is that the
name for this?), even if they only give estimates, are a cool thing, but I am
not sure numpy is the place for them. That's not to say that we couldn't
eventually have P^2 implemented for histogram, but I would start off with a
partition-based one.

Would SciPy have a place for online statistics? Perhaps there's room for
yet another scikit?
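For readers unfamiliar with the term: "online statistics" here means
one-pass, constant-memory updates. Welford's classic mean/variance
recurrence is the standard small example of the idea (a generic sketch, not
an existing scipy or scikit API):

```python
def welford(stream):
    """One-pass (online) mean and sample variance via Welford's
    recurrence: like P^2, it never holds the full dataset in memory."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)   # note: uses the already-updated mean
    variance = m2 / (count - 1) if count > 1 else float("nan")
    return mean, variance

# Works on any iterable, including a generator that never materializes
# the data:
mean, var = welford(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

Unlike the naive two-pass formula, this stays numerically stable and needs
only three running scalars.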

Jaime

-- 
(\__/)
( O.o)
(  ) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Jaime Fernández del Río
On Wed, Apr 15, 2015 at 9:14 AM, Eric Moore e...@redtetrahedron.org wrote:

 This blog post, and the links within also seem relevant.  Appears to have
 python code available to try things out as well.


 https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest


Very cool indeed... The original work is licensed under an Apache 2.0
license (https://github.com/tdunning/t-digest/blob/master/LICENSE). I am
not fluent in legalese, so I'm not sure whether that means we can use it or
not; it seems rather more involved than the licenses we normally use.

Jaime

-- 
(\__/)
( O.o)
(  ) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Eric Moore
This blog post, and the links within, also seem relevant.  It appears to
have Python code available to try things out as well.

https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest

-Eric

On Wed, Apr 15, 2015 at 11:24 AM, Benjamin Root ben.r...@ou.edu wrote:

 Then you can set about convincing matplotlib and friends to
 use it by default

 Just to note, this proposal was originally made over in the matplotlib
 project. We sent it over here where its benefits would have wider reach.
 Matplotlib's plan is not to change the defaults, but to offload as much as
 possible to numpy so that it can support these new features if they are
 available. We might need to do some input validation so that users running
 older version of numpy can get a sensible error message.

 Cheers!
 Ben Root


 On Tue, Apr 14, 2015 at 7:12 PM, Nathaniel Smith n...@pobox.com wrote:

 On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Can I suggest that we instead add the P-square algorithm for the dynamic
  calculation of histograms?
  (
 http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf
 )
 
  This is already implemented in C++'s boost library
  (
 http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp
 )
 
  I implemented it in Boost Python as a module, which I'm happy to share.
  This is much better than fixed-width histograms in practice.  Rather
 than
  adjusting the number of bins, it adjusts what you really want, which is
 the
  resolution of the bins throughout the domain.

 This definitely sounds like a useful thing to have in numpy or scipy
 (though if it's possible to do without using Boost/C++ that would be
 nice). But yeah, we should leave the existing histogram alone (in this
 regard) and add a new name for this like adaptive_histogram or
 something. Then you can set about convincing matplotlib and friends to
 use it by default :-)

 -n

 --
 Nathaniel J. Smith -- http://vorpus.org


[Numpy-discussion] [ANN] python-blosc v1.2.5

2015-04-15 Thread Valentin Haenel
=============================
Announcing python-blosc 1.2.5
=============================

What is new?
============

This release contains support for Blosc v1.5.4 including changes to how
the GIL is kept. This was required because Blosc was refactored in the
v1.5.x line to remove global variables and to use context objects
instead. As such, it became necessary to keep the GIL while calling
Blosc from Python code that uses the multiprocessing module.

In addition, it is now possible to change the blocksize used by Blosc via
``set_blocksize``. When using this, however, bear in mind that the default
blocksize has been finely tuned, and that changing it blindly may have
unforeseen and unpredictable consequences for Blosc's performance.

Additionally, python-blosc can now be compiled on POSIX architectures;
thanks again to Andreas Schwab for that one.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site:

http://python-blosc.blosc.org


What is it?
===========

Blosc (http://www.blosc.org) is a high-performance compressor
optimized for binary data.  It has been designed to transmit data to
the processor cache faster than the traditional, non-compressed,
direct memory fetch approach via a memcpy() call.

Blosc is the first compressor that is meant not only to reduce the size
of large datasets on-disk or in-memory, but also to accelerate object
manipulations that are memory-bound
(http://www.blosc.org/docs/StarvingCPUs.pdf).  See
http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on
how much speed it can achieve in some datasets.

Blosc works well for compressing numerical arrays that contain data
with relatively low entropy, like sparse data, time series, grids with
regularly spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for
the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack
(https://github.com/Blosc/bloscpack). It features a command line
interface that allows you to compress large binary datafiles on-disk.
It also comes with a Python API that has built-in support for
serializing and deserializing Numpy arrays both on-disk and in-memory at
speeds that are competitive with regular Pickle/cPickle machinery.


Installing
==========

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix


Download sources
================

The sources are managed through github services at:

http://github.com/Blosc/python-blosc


Documentation
=============

There is a Sphinx-based documentation site at:

http://python-blosc.blosc.org/


Mailing list
============

There is an official mailing list for Blosc at:

bl...@googlegroups.com
http://groups.google.es/group/blosc


Licenses
========

Both Blosc and its Python wrapper are distributed using the MIT license.
See:

https://github.com/Blosc/python-blosc/blob/master/LICENSES

for more details.



  **Enjoy data!**


Re: [Numpy-discussion] IDE's for numpy development?

2015-04-15 Thread Joseph Martinot-Lagarde
Le 08/04/2015 21:19, Yuxiang Wang a écrit :
 I think spyder supports code highlighting in C and that's all...
 There's no way to compile in Spyder, is there?

Well, you could write a compilation script using Scons and run it from
spyder! :)

But no, spyder is very python-oriented and there is no way to compile C
in spyder.
For what it's worth, the next version should have better support for
plugins, so it could be done as a third-party extension.

Joseph




Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Does it work for you to set

 outer = np.multiply.outer

 ?

 It's actually faster on my machine.

 I assume it does because np.corrcoeff uses it, and it's the same type
 of use cases.
 However, I'm not using it very often (I prefer broadcasting), but I've
 seen it often enough when reviewing code.

 This is mainly to point out that it could be a popular function (that
 maybe shouldn't be deprecated)

 https://github.com/search?utf8=%E2%9C%93q=np.outer
 416914

After thinking another minute:

I think it should not be deprecated; it's like toeplitz. We can also use it
to normalize 2d arrays where columns and rows are different, i.e. not
symmetric as in the corrcoef case.
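For instance, a quick sketch of the non-symmetric normalization meant here
(the array and scale names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4))

row_scale = a.sum(axis=1)   # one scale per row
col_scale = a.sum(axis=0)   # one scale per column

# Non-symmetric analogue of the corrcoef normalization: divide by an
# outer product of two *different* vectors.
normed = a / np.outer(row_scale, col_scale)

# Equivalent broadcasting spelling:
normed2 = a / row_scale[:, None] / col_scale
assert np.allclose(normed, normed2)
```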

Josef



 Josef



 On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
  maybe I can do it as part of the same PR?
 
  On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg
  sebast...@sipsolutions.net
  wrote:
 
  Just a general thing, if someone has a few minutes, I think it would
  make sense to add the ufunc.reduce thing to all of these functions at
  least in the See Also or Notes section in the documentation.
 
  These special attributes are not that well known, and I think that
  might
  be a nice way to make it easier to find.
 
  - Sebastian
 
  On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
   I am, yes.
  
   On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
   wrote:
   Ok, I didn't know that.  Are you at pycon by any chance?
  
   On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
   n...@pobox.com wrote:
   On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
   mistersh...@gmail.com wrote:
Yes, I totally agree with you regarding np.sum and
   np.product, which is why
I didn't suggest np.add.reduce, np.multiply.reduce.
   I wasn't sure whether
cumsum and cumprod might be on the line in your
   judgment.
  
   Ah, I see. I think we should treat them the same for
   now -- all the
   comments I made apply to a lesser or greater extent
   (in particular,
   cumsum and cumprod both do the thing where they
   dispatch to .cumsum()
   .cumprod() method).
  
   -n
  
   --
   Nathaniel J. Smith -- http://vorpus.org
 


 I'm just looking at this thread.

 I see outer used quite often

 corrcoef = cov / np.outer(std, std)

 (even I use it sometimes instead of
 cov / std[:,None] / std

 Josef


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith n...@pobox.com wrote:
 On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Does it work for you to set

 outer = np.multiply.outer

 ?

 It's actually faster on my machine.

 I assume it does because np.corrcoeff uses it, and it's the same type
 of use cases.
 However, I'm not using it very often (I prefer broadcasting), but I've
 seen it often enough when reviewing code.

 This is mainly to point out that it could be a popular function (that
 maybe shouldn't be deprecated)

 https://github.com/search?utf8=%E2%9C%93q=np.outer
 416914

 For future reference, that's not the number -- you have to click
 through to Code and then look at a single-language result to get
 anything remotely meaningful. In this case b/c they're different by an
 order of magnitude, and in general because sometimes the top line
 number is completely made up (like it has no relation to the
 per-language numbers on the left and then changes around randomly if
 you simply reload the page).

 (So 29,397 is what you want in this case.)

 Also that count then tends to have tons of duplicates (e.g. b/c there
 are hundreds of copies of numpy itself on github), so you need a big
 grain of salt when looking at the absolute number, but it can be
 useful, esp. for relative comparisons.

My mistake, rushing too much.
github shows only 25 code references in numpy itself.

Searching in quotes, Python only (namespace-conscious packages on github);
I think github counts modules, not instances:

np.cumsum 11,022
np.cumprod 1,290
np.outer 6,838

statsmodels
np.cumsum 21
np.cumprod  2
np.outer 15

Josef


 -n


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Neil Girdhar
I don't understand.  Are you at pycon by any chance?

On Wed, Apr 15, 2015 at 6:12 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
  On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Does it work for you to set
 
  outer = np.multiply.outer
 
  ?
 
  It's actually faster on my machine.
 
  I assume it does because np.corrcoeff uses it, and it's the same type
  of use cases.
  However, I'm not using it very often (I prefer broadcasting), but I've
  seen it often enough when reviewing code.
 
  This is mainly to point out that it could be a popular function (that
  maybe shouldn't be deprecated)
 
  https://github.com/search?utf8=%E2%9C%93q=np.outer
  416914

 After thinking another minute:

 I think it should not be deprecated, it's like toepliz. We can use it
 also to normalize 2d arrays where columns and rows are different not
 symmetric as in the corrcoef case.

 Josef


 
  Josef
 
 
 
  On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:
 
  On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
  wrote:
   Yes, I totally agree.  If I get started on the PR to deprecate
 np.outer,
   maybe I can do it as part of the same PR?
  
   On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg
   sebast...@sipsolutions.net
   wrote:
  
   Just a general thing, if someone has a few minutes, I think it would
   make sense to add the ufunc.reduce thing to all of these functions
 at
   least in the See Also or Notes section in the documentation.
  
   These special attributes are not that well known, and I think that
   might
   be a nice way to make it easier to find.
  
   - Sebastian
  
   On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
I am, yes.
   
On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
wrote:
Ok, I didn't know that.  Are you at pycon by any chance?
   
On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
n...@pobox.com wrote:
On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
mistersh...@gmail.com wrote:
 Yes, I totally agree with you regarding np.sum
 and
np.product, which is why
 I didn't suggest np.add.reduce,
 np.multiply.reduce.
I wasn't sure whether
 cumsum and cumprod might be on the line in your
judgment.
   
Ah, I see. I think we should treat them the same
 for
now -- all the
comments I made apply to a lesser or greater
 extent
(in particular,
cumsum and cumprod both do the thing where they
dispatch to .cumsum()
.cumprod() method).
   
-n
   
--
Nathaniel J. Smith -- http://vorpus.org
  
 
 
  I'm just looking at this thread.
 
  I see outer used quite often
 
  corrcoef = cov / np.outer(std, std)
 
  (even I use it sometimes instead of
  cov / std[:,None] / std
 
  Josef


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Nathaniel Smith
On Wed, Apr 15, 2015 at 6:08 PM,  josef.p...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Does it work for you to set

 outer = np.multiply.outer

 ?

 It's actually faster on my machine.

 I assume it does because np.corrcoeff uses it, and it's the same type
 of use cases.
 However, I'm not using it very often (I prefer broadcasting), but I've
 seen it often enough when reviewing code.

 This is mainly to point out that it could be a popular function (that
 maybe shouldn't be deprecated)

 https://github.com/search?utf8=%E2%9C%93q=np.outer
 416914

For future reference, that's not the number -- you have to click
through to Code and then look at a single-language result to get
anything remotely meaningful. In this case b/c they're different by an
order of magnitude, and in general because sometimes the top line
number is completely made up (like it has no relation to the
per-language numbers on the left and then changes around randomly if
you simply reload the page).

(So 29,397 is what you want in this case.)

Also that count then tends to have tons of duplicates (e.g. b/c there
are hundreds of copies of numpy itself on github), so you need a big
grain of salt when looking at the absolute number, but it can be
useful, esp. for relative comparisons.

-n


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
Cool, thanks for looking at this.  P^2 might still be better even if the
whole dataset is in memory, because of cache misses.  Partition, which I
guess is based on quickselect, is going to run over all of the data roughly
as many times as there are bins, whereas P^2 only runs over it once.  From a
cache-miss standpoint, I think P^2 is better?  Anyway, it might be worth
coding it up to verify any performance advantage?  Not sure if it should
be in numpy or not, since it really should accept an iterable rather than a
numpy vector, right?
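One small caveat to the "a hundred separate calls" framing (a quick check;
the array names are hypothetical): both selection APIs are vectorized over
their targets, so the per-bin passes are handled inside a single call.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(100_000)

# np.percentile takes a sequence of percentiles (and np.partition a
# sequence of kth indices), so 100 bins need one call, not a hundred.
edges = np.percentile(x, np.linspace(0, 100, 101))
```

Whether the resulting single-call selection beats a streaming estimate on
cache behavior is exactly the kind of thing a benchmark would settle.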

Best,

Neil

On Wed, Apr 15, 2015 at 12:40 PM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar mistersh...@gmail.com
 wrote:

 You got it.  I remember this from when I worked at Google and we would
 process (many many) logs.  With enough bins, the approximation is still
 really close.  It's great if you want to make an automatic plot of data.
 Calling numpy.partition a hundred times is probably slower than calling P^2
 with n=100 bins.  I don't think it does O(n) computations per point.  I
 think it's more like O(log(n)).


 Looking at it again, it probably is O(n) after all: it does a binary
 search, which is O(log n), but it then goes on to update all the n bin
 counters and estimations, so O(n) I'm afraid. So there is no algorithmic
 advantage over partition/percentile: if there are m samples and n bins, P^2
 does that O(n) work m times, while partition does O(m) work n times, so both end up being
 O(m n). It seems to me that the big thing of P^2 is not having to hold the
 full dataset in memory. Online statistics (is that the name for this?),
 even if only estimations, is a cool thing, but I am not sure numpy is the
 place for them. That's not to say that we couldn't eventually have P^2
 implemented for histogram, but I would start off with a partition based one.

 Would SciPy have a place for online statistics? Perhaps there's room for
 yet another scikit?

 Jaime

 --
 (\__/)
 ( O.o)
  (  ) This is Bunny. Copy Bunny into your signature and help him with his
  plans for world domination.



Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Does it work for you to set

 outer = np.multiply.outer

 ?

 It's actually faster on my machine.

I assume it does because np.corrcoef uses it, and it's the same type
of use cases.
However, I'm not using it very often (I prefer broadcasting), but I've
seen it often enough when reviewing code.

This is mainly to point out that it could be a popular function (that
maybe shouldn't be deprecated)

https://github.com/search?utf8=%E2%9C%93q=np.outer
416914

Josef



 On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
  maybe I can do it as part of the same PR?
 
  On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg
  sebast...@sipsolutions.net
  wrote:
 
  Just a general thing, if someone has a few minutes, I think it would
  make sense to add the ufunc.reduce thing to all of these functions at
  least in the See Also or Notes section in the documentation.
 
  These special attributes are not that well known, and I think that
  might
  be a nice way to make it easier to find.
 
  - Sebastian
 
  On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
   I am, yes.
  
   On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
   wrote:
   Ok, I didn't know that.  Are you at pycon by any chance?
  
   On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
   n...@pobox.com wrote:
   On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
   mistersh...@gmail.com wrote:
Yes, I totally agree with you regarding np.sum and
   np.product, which is why
I didn't suggest np.add.reduce, np.multiply.reduce.
   I wasn't sure whether
cumsum and cumprod might be on the line in your
   judgment.
  
   Ah, I see. I think we should treat them the same for
   now -- all the
   comments I made apply to a lesser or greater extent
   (in particular,
   cumsum and cumprod both do the thing where they
   dispatch to .cumsum()
   .cumprod() method).
  
   -n
  
   --
   Nathaniel J. Smith -- http://vorpus.org
 


 I'm just looking at this thread.

 I see outer used quite often

 corrcoef = cov / np.outer(std, std)

 (even I use it sometimes instead of
 cov / std[:,None] / std

 Josef


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Neil Girdhar
Does it work for you to set

outer = np.multiply.outer

?

It's actually faster on my machine.
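A minimal sketch of the substitution being suggested: for 1-D inputs the two spellings agree exactly, and the ufunc method additionally preserves higher-dimensional structure where np.outer ravels its inputs first.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])

# For 1-D inputs the two spellings agree exactly.
assert np.array_equal(np.outer(a, b), np.multiply.outer(a, b))

# Unlike np.outer, the ufunc method does not flatten its inputs,
# so higher-dimensional arrays keep their structure.
m = np.arange(6.0).reshape(2, 3)
print(np.outer(m, b).shape)           # (6, 2): inputs are raveled first
print(np.multiply.outer(m, b).shape)  # (2, 3, 2): broadcast outer product
```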

On Wed, Apr 15, 2015 at 5:29 PM, josef.p...@gmail.com wrote:

 On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com
 wrote:
  Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
  maybe I can do it as part of the same PR?
 
  On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg 
 sebast...@sipsolutions.net
  wrote:
 
  Just a general thing, if someone has a few minutes, I think it would
  make sense to add the ufunc.reduce thing to all of these functions at
  least in the See Also or Notes section in the documentation.
 
  These special attributes are not that well known, and I think that might
  be a nice way to make it easier to find.
 
  - Sebastian
 
  On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
   I am, yes.
  
   On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com
 wrote:
   Ok, I didn't know that.  Are you at pycon by any chance?
  
   On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
   n...@pobox.com wrote:
   On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
   mistersh...@gmail.com wrote:
Yes, I totally agree with you regarding np.sum and
   np.product, which is why
I didn't suggest np.add.reduce, np.multiply.reduce.
   I wasn't sure whether
cumsum and cumprod might be on the line in your
   judgment.
  
   Ah, I see. I think we should treat them the same for
   now -- all the
   comments I made apply to a lesser or greater extent
   (in particular,
   cumsum and cumprod both do the thing where they
   dispatch to .cumsum()
   .cumprod() method).
  
   -n
  
   --
   Nathaniel J. Smith -- http://vorpus.org
  
  
  
  
 
 
 
 
 
 


 I'm just looking at this thread.

 I see outer used quite often

 corrcoef = cov / np.outer(std, std)

 (even I use it sometimes instead of
 cov / std[:,None] / std)

 Josef

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread josef.pktd
On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar mistersh...@gmail.com wrote:
 Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
 maybe I can do it as part of the same PR?

 On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg sebast...@sipsolutions.net
 wrote:

 Just a general thing, if someone has a few minutes, I think it would
 make sense to add the ufunc.reduce thing to all of these functions at
 least in the See Also or Notes section in the documentation.

 These special attributes are not that well known, and I think that might
 be a nice way to make it easier to find.

 - Sebastian

 On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
  I am, yes.
 
  On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com wrote:
  Ok, I didn't know that.  Are you at pycon by any chance?
 
  On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
  n...@pobox.com wrote:
  On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
  mistersh...@gmail.com wrote:
   Yes, I totally agree with you regarding np.sum and
  np.product, which is why
   I didn't suggest np.add.reduce, np.multiply.reduce.
  I wasn't sure whether
   cumsum and cumprod might be on the line in your
  judgment.
 
  Ah, I see. I think we should treat them the same for
  now -- all the
  comments I made apply to a lesser or greater extent
  (in particular,
  cumsum and cumprod both do the thing where they
  dispatch to .cumsum()
  .cumprod() method).
 
  -n
 
  --
  Nathaniel J. Smith -- http://vorpus.org
 
 
 
 








I'm just looking at this thread.

I see outer used quite often

corrcoef = cov / np.outer(std, std)

(even I use it sometimes instead of
cov / std[:,None] / std)

Josef
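A small sketch checking that the two normalizations above agree, and that both reproduce np.corrcoef; the data here is made up purely for illustration.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(100, 3)  # illustrative data

cov = np.cov(x, rowvar=False)
std = np.sqrt(np.diag(cov))

corr_outer = cov / np.outer(std, std)  # the np.outer spelling
corr_bcast = cov / std[:, None] / std  # the broadcasting spelling

assert np.allclose(corr_outer, corr_bcast)
assert np.allclose(corr_outer, np.corrcoef(x, rowvar=False))
```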
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Sebastian Berg
Just a general thing, if someone has a few minutes, I think it would
make sense to add the ufunc.reduce thing to all of these functions at
least in the See Also or Notes section in the documentation.

These special attributes are not that well known, and I think that might
be a nice way to make it easier to find.

- Sebastian
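As a concrete starting point for such a docs note, these are the pairings in question, sketched; all of them hold for NumPy as it stands.

```python
import numpy as np

a = np.arange(1, 5)  # [1, 2, 3, 4]

# Familiar free functions and their lesser-known ufunc-method spellings:
assert np.sum(a) == np.add.reduce(a)                             # 10
assert np.prod(a) == np.multiply.reduce(a)                       # 24
assert np.array_equal(np.cumsum(a), np.add.accumulate(a))        # [1, 3, 6, 10]
assert np.array_equal(np.cumprod(a), np.multiply.accumulate(a))  # [1, 2, 6, 24]
assert np.array_equal(np.outer(a, a), np.multiply.outer(a, a))
```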

On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
 I am, yes.
 
 On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com wrote:
 Ok, I didn't know that.  Are you at pycon by any chance?
 
 On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
 n...@pobox.com wrote:
 On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
 mistersh...@gmail.com wrote:
  Yes, I totally agree with you regarding np.sum and
 np.product, which is why
  I didn't suggest np.add.reduce, np.multiply.reduce.
 I wasn't sure whether
  cumsum and cumprod might be on the line in your
 judgment.
 
 Ah, I see. I think we should treat them the same for
 now -- all the
 comments I made apply to a lesser or greater extent
 (in particular,
 cumsum and cumprod both do the thing where they
 dispatch to .cumsum()
 .cumprod() method).
 
 -n
 
 --
 Nathaniel J. Smith -- http://vorpus.org
 
 
 
 



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors

2015-04-15 Thread Neil Girdhar
Yes, I totally agree.  If I get started on the PR to deprecate np.outer,
maybe I can do it as part of the same PR?

On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 Just a general thing, if someone has a few minutes, I think it would
 make sense to add the ufunc.reduce thing to all of these functions at
 least in the See Also or Notes section in the documentation.

 These special attributes are not that well known, and I think that might
 be a nice way to make it easier to find.

 - Sebastian

 On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote:
  I am, yes.
 
  On Apr 14, 2015 9:17 PM, Neil Girdhar mistersh...@gmail.com wrote:
  Ok, I didn't know that.  Are you at pycon by any chance?
 
  On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith
  n...@pobox.com wrote:
  On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar
  mistersh...@gmail.com wrote:
   Yes, I totally agree with you regarding np.sum and
  np.product, which is why
   I didn't suggest np.add.reduce, np.multiply.reduce.
  I wasn't sure whether
   cumsum and cumprod might be on the line in your
  judgment.
 
  Ah, I see. I think we should treat them the same for
  now -- all the
  comments I made apply to a lesser or greater extent
  (in particular,
  cumsum and cumprod both do the thing where they
  dispatch to .cumsum()
  .cumprod() method).
 
  -n
 
  --
  Nathaniel J. Smith -- http://vorpus.org
 
 
 
 




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-15 Thread Neil Girdhar
Yeah, I'm not arguing, I'm just curious about your reasoning.  That
explains why not C++.  Why would you want to do this in C and not Python?
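For context, the exact (non-streaming) alternative raised in this thread can be sketched in a few lines of Python; np.percentile stands in here for the partition-based selection, and the data is illustrative only.

```python
import numpy as np

np.random.seed(42)
x = np.random.randn(10000)
n_bins = 100

# Exact equal-count bin edges in one vectorized pass, as opposed to
# the streaming P^2 approximation under discussion.  np.percentile is
# used for clarity; a partition-based selection does the same job.
edges = np.percentile(x, np.linspace(0, 100, n_bins + 1))

counts, _ = np.histogram(x, bins=edges)
print(counts.min(), counts.max())  # every bin holds ~100 of the 10000 samples
```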

On Wed, Apr 15, 2015 at 1:48 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar mistersh...@gmail.com
 wrote:

 If you're going to C, is there a reason not to go to C++ and include the
 already-written Boost code?  Otherwise, why not use Python?


 I think we have an explicit rule against C++, although I may be wrong. Not
 sure how much of boost we would have to make part of numpy to use that, the
 whole accumulators lib I'm guessing? Seems like an awful lot given what we
 are after.

 Jaime




___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion