Re: [Numpy-discussion] Automatic number of bins for numpy histograms
On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:

http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour, as I have to constantly play around with the number of bins to get a good idea of what I'm looking at. The bins=10 default works ok for up to 1000 points or very normal data, but performs poorly for anything else, and doesn't account for variability either. I don't have a method easily available to scale the number of bins given the data.

R doesn't suffer from these problems and provides methods for use with its hist method. I would like to provide similar functionality for matplotlib, to at least provide some kind of good starting point, as histograms are very useful for initial data discovery.

The notebook above provides an explanation of the problem as well as some proposed alternatives. Use different datasets (type and size) to see the performance of the suggestions. All of the methods proposed exist in R and the literature.

I've put together an implementation to add this new functionality, but am hesitant to make a pull request as I would like some feedback from a maintainer before doing so.

+1 on the PR.

+1 as well.

Unfortunately we can't change the default of 10, but a number of string methods, with a bins=auto or some such name prominently recommended in the docstring, would be very good to have.

Ralf

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
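A string-valued bins argument of the kind suggested above could look roughly like the sketch below. It is only an illustration: the wrapper name histogram_with_rule, the rule names 'sturges' and 'sqrt', and the lookup table are assumptions made for this example, not the API of the proposed patch. The integer default of 10 is left untouched, as the reply requires.

```python
import numpy as np

# Hypothetical rule table: the names and the choice of rules are assumptions.
_BIN_RULES = {
    # Sturges' rule: ceil(log2(n)) + 1 bins
    "sturges": lambda x: int(np.ceil(np.log2(x.size))) + 1,
    # Square-root rule: ceil(sqrt(n)) bins
    "sqrt": lambda x: int(np.ceil(np.sqrt(x.size))),
}


def histogram_with_rule(data, bins=10):
    """Call np.histogram, resolving a string `bins` to a bin count first."""
    data = np.asarray(data).ravel()
    if isinstance(bins, str):
        try:
            bins = _BIN_RULES[bins](data)
        except KeyError:
            raise ValueError("unknown bin rule: %r" % (bins,))
    return np.histogram(data, bins=bins)


rng = np.random.RandomState(0)
sample = rng.normal(size=5000)
counts_default, _ = histogram_with_rule(sample)             # integer default
counts_sturges, _ = histogram_with_rule(sample, "sturges")  # rule chosen by name
print(len(counts_default), len(counts_sturges))
```

Keeping the rules in a table means a new estimator only needs a new entry, and existing callers that pass an integer see no change in behaviour.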
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
On Sun, Apr 12, 2015 at 12:19 AM, Varun nayy...@gmail.com wrote:

http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour, as I have to constantly play around with the number of bins to get a good idea of what I'm looking at. The bins=10 default works ok for up to 1000 points or very normal data, but performs poorly for anything else, and doesn't account for variability either. I don't have a method easily available to scale the number of bins given the data.

R doesn't suffer from these problems and provides methods for use with its hist method. I would like to provide similar functionality for matplotlib, to at least provide some kind of good starting point, as histograms are very useful for initial data discovery.

The notebook above provides an explanation of the problem as well as some proposed alternatives. Use different datasets (type and size) to see the performance of the suggestions. All of the methods proposed exist in R and the literature.

I've put together an implementation to add this new functionality, but am hesitant to make a pull request as I would like some feedback from a maintainer before doing so.

+1 on the PR.

Jaime

--
(\__/)
( O.o)
( )
This is Conejo (Bunny). Copy Conejo into your signature and help him with his plans for world domination.
Re: [Numpy-discussion] Automatic number of bins for numpy histograms
Using a URL shortener for the notebook to get around the 80 char width limit http://goo.gl/JmfTRJ
[Numpy-discussion] Automatic number of bins for numpy histograms
http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb

Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour, as I have to constantly play around with the number of bins to get a good idea of what I'm looking at. The bins=10 default works ok for up to 1000 points or very normal data, but performs poorly for anything else, and doesn't account for variability either. I don't have a method easily available to scale the number of bins given the data.

R doesn't suffer from these problems and provides methods for use with its hist method. I would like to provide similar functionality for matplotlib, to at least provide some kind of good starting point, as histograms are very useful for initial data discovery.

The notebook above provides an explanation of the problem as well as some proposed alternatives. Use different datasets (type and size) to see the performance of the suggestions. All of the methods proposed exist in R and the literature.

I've put together an implementation to add this new functionality, but am hesitant to make a pull request as I would like some feedback from a maintainer before doing so.

https://github.com/numpy/numpy/compare/master...nayyarv:master

I've provided the estimators as functions for easy refactoring, as it can be argued that this should be in its own function/file/class, or alternatively it can be turned into simple if/elif statements. I believe this belongs in numpy, as that is where the histogram functionality that most libraries build on lives, and it would be useful for them not to require scipy, for example.

I will update the documentation accordingly before making a pull request, and add more tests to show its functionality. I can adapt my ipython notebook into a quick tutorial/help file if need be.

I've already attempted to add this to matplotlib before being redirected here: https://github.com/matplotlib/matplotlib/issues/4316
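To make the dependence on sample size and variability concrete, here is a minimal sketch of two data-driven bin-width rules that R also provides: Scott's rule and the Freedman-Diaconis rule. The message does not list which estimators the patch implements, so treating these two as representative is an assumption, and the function names are made up for the example.

```python
import numpy as np


def scott_width(data):
    """Scott's rule: h = (24 * sqrt(pi) / n)**(1/3) * std."""
    return (24.0 * np.sqrt(np.pi) / data.size) ** (1.0 / 3.0) * np.std(data)


def fd_width(data):
    """Freedman-Diaconis rule: h = 2 * IQR * n**(-1/3)."""
    q25, q75 = np.percentile(data, [25, 75])
    return 2.0 * (q75 - q25) / data.size ** (1.0 / 3.0)


def n_bins(data, width):
    """Convert a bin width to a bin count over the data range (at least 1)."""
    if width <= 0:
        return 1
    return max(1, int(np.ceil((data.max() - data.min()) / width)))


rng = np.random.RandomState(42)
for n in (100, 1000, 100000):
    x = rng.normal(size=n)
    # Both counts grow with n, unlike a fixed bins=10.
    print(n, n_bins(x, scott_width(x)), n_bins(x, fd_width(x)))
```

Both widths shrink like n**(-1/3) and scale with the spread of the data (standard deviation or interquartile range), which is the behaviour a fixed bin count cannot provide.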
[Numpy-discussion] Numpy compilation error
Dear all,

Upon trying to install numpy using 'pip install numpy' in a virtualenv, I get the following error messages:

    creating build/temp.linux-x86_64-2.7/numpy/random/mtrand
    compile options: '-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -c'
    gcc: numpy/random/mtrand/distributions.c
    numpy/random/mtrand/distributions.c: In function ‘loggam’:
    numpy/random/mtrand/distributions.c:892:1: internal compiler error: Illegal instruction
     }
     ^
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See http://bugzilla.redhat.com/bugzilla for instructions.
    Preprocessed source stored into /tmp/ccjkBSd2.out file, please attach this to your bugreport.

This leads to the compilation process failing with this error:

    Cleaning up...
    Command /home/mescalin/pkerp/.virtualenvs/notebooks/bin/python -c "import setuptools;__file__='/home/mescalin/pkerp/.virtualenvs/notebooks/build/numpy/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-c_Cd7B-record/install-record.txt --single-version-externally-managed --install-headers /home/mescalin/pkerp/.virtualenvs/notebooks/include/site/python2.7 failed with error code 1 in /home/mescalin/pkerp/.virtualenvs/notebooks/build/numpy
    Traceback (most recent call last):
      File "/home/mescalin/pkerp/.virtualenvs/notebooks/bin/pip", line 9, in <module>
        load_entry_point('pip==1.4.1', 'console_scripts', 'pip')()
      File "/home/mescalin/pkerp/.virtualenvs/notebooks/lib/python2.7/site-packages/pip/__init__.py", line 148, in main
        return command.main(args[1:], options)
      File "/home/mescalin/pkerp/.virtualenvs/notebooks/lib/python2.7/site-packages/pip/basecommand.py", line 169, in main
        text = '\n'.join(complete_log)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 72: ordinal not in range(128)

Have any of you encountered a similar problem before?

Thanks in advance,

-Peter

The gcc version is:

    [pkerp@fluidspace ~]$ gcc --version
    gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
    Copyright (C) 2013 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

PS: ccjkBSd2.out
Description: chemical/gulp
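The UnicodeDecodeError at the end of the traceback looks like a secondary failure rather than the cause of the build problem. Byte 0xe2 is the first byte of the UTF-8 encoding of the curly quote that gcc prints around loggam, and an ASCII decoder cannot handle it, so that quote is the probable trigger. The snippet below reproduces the effect in isolation. It assumes Python 3 and is not pip's actual code; the log_line value is copied from the gcc output above.

```python
# The gcc diagnostic line from the log, including the curly quotes around loggam.
log_line = "numpy/random/mtrand/distributions.c: In function \u2018loggam\u2019:"
raw = log_line.encode("utf-8")

# The left curly quote U+2018 is encoded in UTF-8 as the bytes e2 80 98.
first_quote = "\u2018".encode("utf-8")
print(first_quote.hex())

# Decoding the raw bytes as ASCII fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Decoding as UTF-8, or replacing undecodable bytes, avoids the failure.
decoded_utf8 = raw.decode("utf-8")
decoded_lossy = raw.decode("ascii", errors="replace")
```

Running the build under an ASCII locale (for example with LC_ALL=C set) generally makes gcc emit plain ASCII quotes, which should let the real compiler error surface without the extra traceback. This does not address the internal compiler error itself.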
Re: [Numpy-discussion] Numpy compilation error
12.04.2015, 17:15, Peter Kerpedjiev wrote:

[clip]
numpy/random/mtrand/distributions.c:892:1: internal compiler error: Illegal instruction

An internal compiler error means your compiler (in this case, gcc) is broken.

The easiest solution is to use a newer version of the compiler, assuming the compiler bug in question has been fixed. Here, it probably has, since I have not seen similar error reports from this code before.