Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Ralf Gommers
On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:


 http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

 Long story short, histogram visualisations that depend on numpy (such as
 matplotlib, or nearly all of them) have poor default behaviour: I constantly
 have to play around with the number of bins to get a good idea of what I'm
 looking at. The default bins=10 works OK for up to about 1000 points or for
 very normal data, but performs poorly for anything else and doesn't account
 for variability either. I don't have an easily available method to scale the
 number of bins to the data.

 R doesn't suffer from these problems and provides methods for use with its
 hist method. I would like to provide similar functionality for matplotlib, to
 at least provide some kind of good starting point, as histograms are very
 useful for initial data discovery.

 The notebook above explains the problem and proposes some alternatives. Use
 different datasets (type and size) to see how the suggestions perform. All of
 the proposed methods exist in R and in the literature.

 I've put together an implementation to add this new functionality, but am
 hesitant to make a pull request as I would like some feedback from a
 maintainer before doing so.


 +1 on the PR.


+1 as well.

Unfortunately we can't change the default of 10, but a number of string
methods, with bins='auto' or some such name prominently recommended in
the docstring, would be very good to have.

Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Jaime Fernández del Río
On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:


 http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

 Long story short, histogram visualisations that depend on numpy (such as
 matplotlib, or nearly all of them) have poor default behaviour: I constantly
 have to play around with the number of bins to get a good idea of what I'm
 looking at. The default bins=10 works OK for up to about 1000 points or for
 very normal data, but performs poorly for anything else and doesn't account
 for variability either. I don't have an easily available method to scale the
 number of bins to the data.

 R doesn't suffer from these problems and provides methods for use with its
 hist method. I would like to provide similar functionality for matplotlib, to
 at least provide some kind of good starting point, as histograms are very
 useful for initial data discovery.

 The notebook above explains the problem and proposes some alternatives. Use
 different datasets (type and size) to see how the suggestions perform. All of
 the proposed methods exist in R and in the literature.

 I've put together an implementation to add this new functionality, but am
 hesitant to make a pull request as I would like some feedback from a
 maintainer before doing so.


+1 on the PR.

Jaime

-- 
(\__/)
( O.o)
(  ) This is Conejo. Copy Conejo into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Varun
Using a URL shortener for the notebook to get around the 80-character width
limit:

http://goo.gl/JmfTRJ



[Numpy-discussion] Automatic number of bins for numpy histograms

2015-04-12 Thread Varun
http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

Long story short, histogram visualisations that depend on numpy (such as
matplotlib, or nearly all of them) have poor default behaviour: I constantly
have to play around with the number of bins to get a good idea of what I'm
looking at. The default bins=10 works OK for up to about 1000 points or for
very normal data, but performs poorly for anything else and doesn't account
for variability either. I don't have an easily available method to scale the
number of bins to the data.

R doesn't suffer from these problems and provides methods for use with its
hist method. I would like to provide similar functionality for matplotlib, to
at least provide some kind of good starting point, as histograms are very
useful for initial data discovery.

The notebook above explains the problem and proposes some alternatives. Use
different datasets (type and size) to see how the suggestions perform. All of
the proposed methods exist in R and in the literature.
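
As a rough illustration of what scaling the bin count to the data can look
like, here is a minimal sketch of two rules that R's hist also offers
(Sturges and Freedman-Diaconis). The helper names sturges_bins and fd_bins
are made up for this example and are not the functions in the proposed
patch; whether these two rules match the notebook's exact proposals is an
assumption. It sticks to the standard library so it runs without numpy.

```python
import math
import random
import statistics


def sturges_bins(data):
    """Sturges' rule: ceil(log2(n) + 1) bins, depends only on sample size."""
    return int(math.ceil(math.log2(len(data)) + 1))


def fd_bins(data):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    n = len(data)
    # Quartiles from the standard library; numpy.percentile may interpolate
    # slightly differently, so counts can differ by a bin or so.
    q1, _, q3 = statistics.quantiles(data, n=4)
    width = 2.0 * (q3 - q1) / n ** (1.0 / 3.0)
    if width == 0:
        return 1  # degenerate data (zero spread): fall back to a single bin
    return int(math.ceil((max(data) - min(data)) / width))


# Hypothetical test data: normal samples of two different sizes.
random.seed(0)
small = [random.gauss(0.0, 1.0) for _ in range(100)]
large = [random.gauss(0.0, 1.0) for _ in range(10000)]

for name, sample in (("small", small), ("large", large)):
    print(name, "sturges:", sturges_bins(sample), "fd:", fd_bins(sample))
```

The resulting integer could then be passed as the bins argument to
numpy.histogram or to matplotlib's hist. Unlike a fixed bins=10, both rules
grow the bin count with the sample size, and Freedman-Diaconis also reacts to
the spread of the data.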

I've put together an implementation to add this new functionality, but am
hesitant to make a pull request as I would like some feedback from a
maintainer before doing so.

https://github.com/numpy/numpy/compare/master...nayyarv:master

I've provided them as functions for easy refactoring, as it can be argued that
this should be in its own function/file/class, or alternatively it can be
turned into simple if/elif statements. I believe this belongs in numpy, as
that is where the histogram functionality that most libraries build on lives,
and it would be useful for those libraries not to require scipy, for example.
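
To make the "functions versus if/elif" trade-off concrete, here is a minimal
sketch of dispatching on a string name through a dictionary of functions. The
names resolve_bins, _ESTIMATORS, "sturges" and "sqrt" are hypothetical and
chosen only for illustration; they are not taken from the patch linked above.

```python
import math


def _sturges(data):
    """Sturges' rule: ceil(log2(n) + 1)."""
    return int(math.ceil(math.log2(len(data)) + 1))


def _sqrt(data):
    """Square-root rule: ceil(sqrt(n))."""
    return int(math.ceil(math.sqrt(len(data))))


# Hypothetical registry: adding an estimator means adding one entry here,
# instead of another elif branch inside the histogram function.
_ESTIMATORS = {"sturges": _sturges, "sqrt": _sqrt}


def resolve_bins(data, bins=10):
    """Return an integer bin count from an int or an estimator name."""
    if isinstance(bins, str):
        try:
            estimator = _ESTIMATORS[bins]
        except KeyError:
            raise ValueError("unknown bin estimator: %r" % (bins,))
        return estimator(data)
    return int(bins)  # integer input keeps its existing meaning


data = list(range(1000))
print(resolve_bins(data))             # integer default is left untouched
print(resolve_bins(data, "sturges"))  # estimator looked up by name
```

An if/elif chain would behave the same way; the dictionary simply keeps each
rule in its own small, separately testable function.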

I will update the documentation accordingly before making a pull request, and
add more tests to show its functionality. I can adapt my IPython notebook
into a quick tutorial/help file if need be.

I had already attempted to add this to matplotlib before being redirected
here: https://github.com/matplotlib/matplotlib/issues/4316



[Numpy-discussion] Numpy compilation error

2015-04-12 Thread Peter Kerpedjiev

Dear all,

Upon trying to install numpy using 'pip install numpy' in a virtualenv, 
I get the following error messages:


creating build/temp.linux-x86_64-2.7/numpy/random/mtrand
compile options: '-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1
-D_LARGEFILE64_SOURCE=1 -Inumpy/core/include
-Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private
-Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath
-Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort
-Inumpy/core/include -I/usr/include/python2.7
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/private
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/private
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/private
-Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -c'
gcc: numpy/random/mtrand/distributions.c
numpy/random/mtrand/distributions.c: In function ‘loggam’:
numpy/random/mtrand/distributions.c:892:1: internal compiler error: Illegal instruction
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See http://bugzilla.redhat.com/bugzilla for instructions.
Preprocessed source stored into /tmp/ccjkBSd2.out file, please attach this to your bugreport.

This leads to the compilation process failing with this error:


Cleaning up...
Command /home/mescalin/pkerp/.virtualenvs/notebooks/bin/python -c "import setuptools;__file__='/home/mescalin/pkerp/.virtualenvs/notebooks/build/numpy/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-c_Cd7B-record/install-record.txt --single-version-externally-managed --install-headers /home/mescalin/pkerp/.virtualenvs/notebooks/include/site/python2.7 failed with error code 1 in /home/mescalin/pkerp/.virtualenvs/notebooks/build/numpy
Traceback (most recent call last):
  File "/home/mescalin/pkerp/.virtualenvs/notebooks/bin/pip", line 9, in <module>
    load_entry_point('pip==1.4.1', 'console_scripts', 'pip')()
  File "/home/mescalin/pkerp/.virtualenvs/notebooks/lib/python2.7/site-packages/pip/__init__.py", line 148, in main
    return command.main(args[1:], options)
  File "/home/mescalin/pkerp/.virtualenvs/notebooks/lib/python2.7/site-packages/pip/basecommand.py", line 169, in main
    text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 72: ordinal not in range(128)


Have any of you encountered a similar problem before?

Thanks in advance,

-Peter


The gcc version is:

[pkerp@fluidspace ~]$ gcc --version
gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


PS:
ccjkBSd2.out
Description: chemical/gulp


Re: [Numpy-discussion] Numpy compilation error

2015-04-12 Thread Pauli Virtanen
12.04.2015, 17:15, Peter Kerpedjiev wrote:
[clip]
 numpy/random/mtrand/distributions.c:892:1: internal compiler error:
 Illegal instruction

An internal compiler error means your compiler (in this case, gcc) is
broken. The easiest solution is to use a newer version of the compiler,
assuming the compiler bug in question has been fixed. Here, it probably
has, since I have not seen similar error reports before from this code.
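
A side note on the UnicodeDecodeError at the end of the original report: it
looks like a secondary failure rather than the real problem. A likely
explanation (an assumption, not verified against that pip version) is that
gcc printed typographic quotes around the function name, as in ‘loggam’, and
pip 1.4.1 on Python 2.7 then tried to treat the UTF-8 encoded log as ASCII.
The byte 0xe2 from the traceback is the first byte of such a quote in UTF-8.
A small Python 3 sketch reproducing the same kind of error:

```python
# The gcc diagnostic with typographic quotes (U+2018 / U+2019), as in the log.
log_line = "numpy/random/mtrand/distributions.c: In function \u2018loggam\u2019:"

# What ends up in a byte-oriented log on a UTF-8 locale.
raw = log_line.encode("utf-8")

# First byte outside the ASCII range; for U+2018 the UTF-8 form is E2 80 98.
first_non_ascii = next(b for b in raw if b > 0x7F)
print(hex(first_non_ascii))

# Decoding those bytes as ASCII fails in the same way as in the traceback.
try:
    raw.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
print(decode_error)
```

Either way, this only affects how the failure is reported; the build still
stops at the internal compiler error.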
