[Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)
Hi all,

I just wanted to check if this would be considered a bug.

numpy.histogram does not appear to preserve subclasses of ndarray (e.g. masked arrays). This leads to considerable problems when working with masked arrays. (As per this Stack Overflow question: http://stackoverflow.com/questions/3610040/how-to-create-the-histogram-of-an-array-with-masked-values-in-numpy )

E.g.

    import numpy as np
    x = np.arange(100)
    x = np.ma.masked_where(x > 30, x)
    counts, bin_edges = np.histogram(x)

yields:

    counts -- array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
    bin_edges -- array([ 0. , 9.9, 19.8, 29.7, 39.6, 49.5, 59.4, 69.3, 79.2, 89.1, 99. ])

I would have expected histogram to ignore the masked portion of the data. Is this a bug, or expected behavior? I'll open a bug report, if it's not expected behavior...

This would appear to be easily fixed by using asanyarray rather than asarray within histogram. E.g. this diff for numpy/lib/function_base.py:

    Index: function_base.py
    ===
    --- function_base.py (revision 8604)
    +++ function_base.py (working copy)
    @@ -132,9 +132,9 @@
    -    a = asarray(a)
    +    a = asanyarray(a)
         if weights is not None:
    -        weights = asarray(weights)
    +        weights = asanyarray(weights)
             if np.any(weights.shape != a.shape):
                 raise ValueError(
                         'weights should have the same shape as a.')
    @@ -156,7 +156,7 @@
                 mx += 0.5
             bins = linspace(mn, mx, bins+1, endpoint=True)
         else:
    -        bins = asarray(bins)
    +        bins = asanyarray(bins)
             if (np.diff(bins) < 0).any():
                 raise AttributeError(
                         'bins must increase monotonically.')

Thanks!
-Joe

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
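The difference between the two conversion functions named in the diff can be seen without touching histogram at all. The following is a minimal sketch, not part of the original message; the variable names `plain` and `kept` are illustrative only. It shows that np.asarray hands back a base-class ndarray (so the mask is no longer visible), while np.asanyarray passes the masked array through.

```python
import numpy as np

x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

# asarray converts to the base ndarray class, so the mask is dropped.
plain = np.asarray(x)
# asanyarray lets ndarray subclasses (here MaskedArray) pass through.
kept = np.asanyarray(x)

print(type(plain).__name__)   # ndarray
print(type(kept).__name__)    # MaskedArray
print(plain.max(), kept.max())  # the masked values still count in `plain`
```

This only illustrates why the masked entries are counted; whether histogram itself would compute sensible results on a subclass is a separate question.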
Re: [Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)
On 09/02/2010 02:50 PM, Joe Kington wrote:
> I would have expected histogram to ignore the masked portion of the data. Is this a bug, or expected behavior? I'll open a bug report, if it's not expected behavior...
>
> This would appear to be easily fixed by using asanyarray rather than asarray within histogram.
> [...]

I would not call it a bug, as this is a known 'feature' of functions that use np.asarray(). You are welcome to file an enhancement request, but there are some issues that need to be addressed. Typical questions that come to mind are:

1) Should a user be warned that the input is a masked array?
2) Should histogram count the number of masked values?
3) What is the expected output when normed=True?
4) What type of array should the weights and bins arguments be?
5) What are the dimensions of the weights and bins arguments, since they only need to have the number of bins?
6) If the input array is masked, should the weights and bins arguments also be masked arrays when applicable? If so, what happens if the masks are in different locations between arrays?

Regards
Bruce
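To make question 6 concrete, here is a small sketch of one possible convention when the data and the weights carry masks in different locations: drop an observation if either its value or its weight is masked. This convention is an assumption chosen for illustration, not something histogram does or something the thread agreed on; the names `data`, `weights` and `keep` are hypothetical.

```python
import numpy as np

data = np.ma.masked_array([1.0, 2.0, 3.0, 4.0],
                          mask=[False, True, False, False])
weights = np.ma.masked_array([0.5, 1.0, 2.0, 4.0],
                             mask=[False, False, True, False])

# Assumed convention: keep an observation only if both the value
# and its weight are unmasked.
keep = ~(np.ma.getmaskarray(data) | np.ma.getmaskarray(weights))

# Pass plain ndarrays to histogram so no subclass handling is involved.
counts, edges = np.histogram(data.data[keep], bins=2, range=(1.0, 4.0),
                             weights=weights.data[keep])
print(counts)
print(edges)
```

Other conventions are just as defensible (for example, treating a masked weight as zero), which is the point of the question.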
Re: [Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)
On Thu, Sep 2, 2010 at 3:50 PM, Joe Kington jking...@wisc.edu wrote:
> I would have expected histogram to ignore the masked portion of the data. Is this a bug, or expected behavior? I'll open a bug report, if it's not expected behavior...

If you want to ignore masked data, it's just one extra function call:

    histogram(m_arr.compressed())

I don't think the fact that this makes an extra copy will be relevant, because I guess full masked array handling inside histogram will be a lot more expensive.

Using asanyarray would also allow matrices in, along with other subtypes that might not be handled correctly by the histogram calculations.

For anything else besides dropping masked observations, it would be necessary to figure out what the masked array definition of a histogram is, as Bruce pointed out.

(Another interesting question would be whether histogram handles nans correctly; searchsorted ???)

Josef
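Applied to the example from the original post, the compressed() suggestion can be sketched as follows. The second half is an addition prompted by the nan question: NaN values are not masked entries, and how histogram treats them has varied across NumPy versions, so the sketch filters them out explicitly rather than relying on any particular behavior. The array `y` is made up for illustration.

```python
import numpy as np

x = np.arange(100)
m_arr = np.ma.masked_where(x > 30, x)

# compressed() returns the unmasked values as a plain 1-D ndarray (a copy).
counts, bin_edges = np.histogram(m_arr.compressed())
print(counts.sum())         # number of unmasked values (0 through 30)
print(bin_edges[[0, -1]])   # edges span the unmasked data only

# NaN is not a mask; drop it explicitly before binning.
y = np.array([0.0, 1.0, np.nan, 2.0, 3.0])
finite_counts, finite_edges = np.histogram(y[~np.isnan(y)], bins=3)
print(finite_counts.sum())
```

An alternative for the NaN case is np.ma.masked_invalid(y).compressed(), which turns the NaN entries into masked ones first and then drops them the same way.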
Re: [Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)
On Thu, Sep 2, 2010 at 5:31 PM, josef.p...@gmail.com wrote:
> [...]
> Using asanyarray would also allow matrices in and other subtypes that might not be handled correctly by the histogram calculations.
>
> For anything else besides dropping masked observations, it would be necessary to figure out what the masked array definition of a histogram is, as Bruce pointed out.

Good points all around. I'll skip the enhancement request. Sorry for the noise!

Thanks!
-Joe