Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> Why make multiple packages? It seems like all these functions are somewhat
> related: practical tools for real-world data analysis (where observations
> are often missing). I suspect having everything under one roof would create
> more interest than chopping things up. [...]

A package focused on NaN-aware functions sounds like a good idea. I think a good plan would be to start by making faster, drop-in replacements for the NaN functions that are already in NumPy and SciPy. That is already a lot of work. After that, one possibility is to add functions like nancumsum, nanprod, etc. After that, moving window statistics?
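The nancumsum mentioned here did not exist in NumPy at the time; as a rough pure-NumPy sketch of one possible semantic (NaNs contribute zero to the running sum -- an assumption, not an agreed-on spec)::

    import numpy as np

    def nancumsum(arr, axis=None):
        # One plausible convention: NaNs add nothing to the running total.
        a = np.array(arr, dtype=np.float64, copy=True)
        a[np.isnan(a)] = 0.0
        return np.cumsum(a, axis=axis)

    print(nancumsum(np.array([1.0, np.nan, 2.0])))  # [1. 1. 3.]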
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sun, Nov 21, 2010 at 2:48 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> A package focused on NaN-aware functions sounds like a good idea. [...]
> After that, one possibility is to add functions like nancumsum, nanprod,
> etc. After that, moving window statistics?

And maybe group functions after that?

If there is a lot of repetition, you could use templating. Even simple string substitution, if it is only replacing the dtype, works pretty well. It would at least reduce some copy-paste.

Josef
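A minimal sketch of the string-substitution idea Josef describes -- one Cython loop body written once and expanded per dtype. The template text and file name are illustrative, not Nanny's actual build step::

    from string import Template

    template = Template('''
    def nansum_1d_${dtype}(np.ndarray[np.${dtype}_t, ndim=1] a):
        "Sum of a 1d ${dtype} array, skipping NaNs."
        cdef Py_ssize_t i
        cdef np.float64_t asum = 0, ai
        for i in range(a.shape[0]):
            ai = a[i]
            if ai == ai:  # False only for NaN
                asum += ai
        return asum
    ''')

    # Write one specialized function per dtype into the .pyx source file.
    with open('nansum.pyx', 'w') as f:
        f.write('import numpy as np\ncimport numpy as np\n')
        for dtype in ['int32', 'int64', 'float64']:
            f.write(template.substitute(dtype=dtype))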
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> I've started working on moving window statistics Cython functions. I plan
> to make them into a package called Roly (for rolling). [...] Would the
> function signatures in Nanny (exact duplicates of the corresponding
> functions in NumPy and SciPy) work for pandas?

Why make multiple packages? It seems like all these functions are somewhat related: practical tools for real-world data analysis (where observations are often missing). I suspect having everything under one roof would create more interest than chopping things up; it would be very useful to folks in many different disciplines (finance, economics, statistics, etc.). In R, for example, NA-handling is just a part of everyday life. Of course, R has a special NA value which is distinct from NaN, and many folks object to the use of NaN for missing values. The alternative is masked arrays, but in my case I wasn't willing to sacrifice so much performance for purity's sake.

I could certainly use the nan* functions to replace code in pandas where I've handled things in a somewhat ad hoc way.
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sun, Nov 21, 2010 at 12:30 PM, <josef.p...@gmail.com> wrote:
> and maybe group functions after that?

Yes, group functions are on my list.

> If there is a lot of repetition, you could use templating. Even simple
> string substitution, if it is only replacing the dtype, works pretty well.
> It would at least reduce some copy-paste.

Unit test coverage should be good enough to let me experiment with templating. What's a good way to go? Write my own script that creates the .pyx file and call it from the makefile? Or are there packages for doing the templating?

I added nanmean (the first SciPy function to enter Nanny) and nanmin.
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sun, Nov 21, 2010 at 5:09 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> What's a good way to go? Write my own script that creates the .pyx file
> and call it from the makefile? Or are there packages for doing the
> templating?

Depends on the scale. I tried once with simple string templates:
http://codespeak.net/pipermail/cython-dev/2009-August/006614.html

Here is a pastebin of another version by (?), http://pastebin.com/f1a49143d, discussed on the cython-dev mailing list. The Cython list has this discussion every once in a while, but I haven't seen any conclusion yet.

For heavier-duty templating, a proper templating package (Jinja?) might be better. I'm not an expert.

Josef
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sun, Nov 21, 2010 at 6:02 PM, <josef.p...@gmail.com> wrote:
> Depends on the scale. I tried once with simple string templates [...] For
> heavier-duty templating, a proper templating package (Jinja?) might be
> better. I'm not an expert.

What would you say to a single package that contains:

- NaN-aware NumPy and SciPy functions (nanmean, nanmin, etc.)
- moving window functions (moving_{count, sum, mean, var, std, etc.})
- core subroutines for labeled data
- group-by functions
- other things to add to this list?

In other words, basic computational building tools for making libraries like larry, pandas, etc. and doing time series / statistical / other manipulations on real-world (messy) data sets. The focus isn't so much NaN-awareness per se but more practical data wrangling. I would be happy to work on such a package and to move all the Cython code I've written into it. There's a little bit of datarray overlap potentially, but I think that's OK.
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sun, Nov 21, 2010 at 6:37 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Sun, Nov 21, 2010 at 3:16 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>> What would you say to a single package that contains:
>> - NaN-aware NumPy and SciPy functions (nanmean, nanmin, etc.)
>
> I'd say yes.
>
>> - moving window functions (moving_{count, sum, mean, var, std, etc.})
>
> Yes. BTW, we both do arr = arr.astype(float), I think, before doing the
> moving statistics. So I sped things up by running the moving window
> backwards and writing the result in place.
>
>> - core subroutines for labeled data
>
> Not sure what this would be.

Let's discuss. Basically I want to produce an indexing vector based on rules -- something to pass to ndarray.take later on. And maybe your generic binary-op function from a while back?

>> - group-by functions
>
> Yes. I have some ideas on function signatures.
>
>> - other things to add to this list?
>
> A no-op function with a really long doc string!
>
>> In other words, basic computational building tools for making libraries
>> like larry, pandas, etc. [...]
>
> Maybe we should make a list of function signatures along with brief doc
> strings to get a feel for what we (and hopefully others) have in mind?

I've personally never been much for writing specs, but it could be useful.

> We probably aren't going to get it all right on the first try, so we'll
> just do our best and refactor the code later if necessary. We might be
> well served by collecting exemplary data sets and making a list of things
> we would like to be able to do easily with that data.

But writing signatures like::

    moving_{funcname}(ndarray data, int window, int axis=0, int min_periods=window) -> ndarray
    group_aggregate(ndarray data, ndarray labels, int axis=0, function agg_function) -> ndarray
    group_transform(...)
    ... etc.

makes sense.

> Where should we continue the discussion? The pystatsmodels mailing list?
> By now the numpy list probably thinks of NaN as Not ANother email from
> this guy.

Maybe let's have the next thread on SciPy-user -- I think what we're talking about is general enough to be discussed there.
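A rough pure-NumPy sketch of the group_aggregate semantics proposed above; the real version would be Cython, and returning the unique labels alongside the aggregates is an assumption (the signature above promises only an ndarray)::

    import numpy as np

    def group_aggregate(data, labels, axis=0, agg_function=np.sum):
        labels = np.asarray(labels)
        keys = np.unique(labels)
        out = []
        for key in keys:
            idx = np.flatnonzero(labels == key)
            # np.take builds the per-group selection, as suggested above
            # for the labeled-data subroutines.
            out.append(agg_function(np.take(data, idx, axis=axis), axis=axis))
        return keys, np.asarray(out)

    data = np.arange(6.0).reshape(3, 2)
    keys, agg = group_aggregate(data, labels=['a', 'b', 'a'], axis=0)
    print(keys)  # ['a' 'b']
    print(agg)   # [[4. 6.] [2. 3.]]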
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 7:42 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> I should make a benchmark suite.

    >>> ny.benchit(verbose=False)
    Nanny performance benchmark
        Nanny 0.0.1dev
        Numpy 1.4.1
        Speed is numpy time divided by nanny time
        NaN means all NaNs
       Speed   Test                Shape      dtype    NaN?
       6.6770  nansum(a, axis=-1)  (500,500)  int64
       4.6612  nansum(a, axis=-1)  (1,)       float64
       9.0351  nansum(a, axis=-1)  (500,500)  int32
       3.0746  nansum(a, axis=-1)  (500,500)  float64
      11.5740  nansum(a, axis=-1)  (1,)       int32
       6.4484  nansum(a, axis=-1)  (1,)       int64
      51.3917  nansum(a, axis=-1)  (500,500)  float64  NaN
      13.8692  nansum(a, axis=-1)  (1,)       float64  NaN
       6.5327  nanmax(a, axis=-1)  (500,500)  int64
       8.8222  nanmax(a, axis=-1)  (1,)       float64
       0.2059  nanmax(a, axis=-1)  (500,500)  int32
       6.9262  nanmax(a, axis=-1)  (500,500)  float64
       5.0688  nanmax(a, axis=-1)  (1,)       int32
       6.5605  nanmax(a, axis=-1)  (1,)       int64
      48.4850  nanmax(a, axis=-1)  (500,500)  float64  NaN
      14.6289  nanmax(a, axis=-1)  (1,)       float64  NaN

You can also use the makefile to run the benchmark::

    make bench
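The reported speed is NumPy time divided by Nanny time; a bare-bones harness in that spirit (a hypothetical sketch built on timeit, not ny.benchit's actual implementation)::

    import timeit

    def speed_ratio(numpy_stmt, nanny_stmt, setup, repeat=3, number=100):
        # Ratio > 1 means the second statement is faster.
        t_np = min(timeit.repeat(numpy_stmt, setup, repeat=repeat, number=number))
        t_ny = min(timeit.repeat(nanny_stmt, setup, repeat=repeat, number=number))
        return t_np / t_ny

    setup = "import numpy as np; arr = np.random.rand(500, 500)"
    # Sanity check: comparing np.nansum against itself gives a ratio near 1.
    print(speed_ratio("np.nansum(arr, axis=-1)", "np.nansum(arr, axis=-1)", setup))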
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sat, Nov 20, 2010 at 6:39 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> [Nanny performance benchmark snipped]

Keith (and others),

What would you think about creating a library of mostly Cython-based, domain-specific functions? So stuff like rolling statistical moments, nan* functions like you have here, and all that -- NumPy-array-only functions that don't necessarily belong in NumPy or SciPy (but could be included on down the road). You were already talking about this on the statsmodels mailing list for larry. I spent a lot of time writing a bunch of these for pandas over the last couple of years, and I would have relatively few qualms about moving them outside of pandas and introducing a dependency. You could do the same for larry -- then we'd all be relying on the same well-vetted and tested codebase.

- Wes
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sat, Nov 20, 2010 at 6:54 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> What would you think about creating a library of mostly Cython-based,
> domain-specific functions? [...]

By the way, I wouldn't mind pushing all of my datetime-related code (date range generation, date offsets, etc.) into this new library, too.
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> What would you think about creating a library of mostly Cython-based,
> domain-specific functions? So stuff like rolling statistical moments, nan*
> functions like you have here, and all that [...]

I've started working on moving window statistics Cython functions. I plan to make them into a package called Roly (for rolling). The signatures are mov_sum(arr, window, axis=-1), mov_nansum(arr, window, axis=-1), etc.

I think of Nanny and Roly as two separate packages. A narrow focus is good for a new package. But maybe each package could be a subpackage in a super package?

Would the function signatures in Nanny (exact duplicates of the corresponding functions in NumPy and SciPy) work for pandas? I plan to use Nanny in larry. I'll try to get the structure of the Nanny package in place. But if it doesn't attract any interest after that, I may fold it into larry.
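For the mov_nansum being described, a 1d pure-NumPy sketch of one way to do it: fill NaNs with zero, then a cumulative sum turns each window into a single subtraction. Roly's versions are Cython and take an axis argument; this only shows the underlying trick::

    import numpy as np

    def mov_nansum_1d(arr, window):
        a = np.where(np.isnan(arr), 0.0, np.asarray(arr, dtype=np.float64))
        csum = np.concatenate(([0.0], np.cumsum(a)))
        # Window sum ending at each position, one subtraction per output.
        return csum[window:] - csum[:-window]

    a = np.array([1.0, np.nan, 2.0, 3.0])
    print(mov_nansum_1d(a, window=2))  # [1. 2. 5.]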
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Sat, Nov 20, 2010 at 4:39 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> [Nanny performance benchmark snipped]

Here's what I get using (my current) np.fmax.reduce in place of nanmax.

       Speed   Test                Shape      dtype    NaN?
       3.3717  nansum(a, axis=-1)  (500,500)  int64
       5.1639  nansum(a, axis=-1)  (1,)       float64
       3.8308  nansum(a, axis=-1)  (500,500)  int32
       6.0854  nansum(a, axis=-1)  (500,500)  float64
       8.7821  nansum(a, axis=-1)  (1,)       int32
       1.1716  nansum(a, axis=-1)  (1,)       int64
       5.5777  nansum(a, axis=-1)  (500,500)  float64  NaN
       5.8718  nansum(a, axis=-1)  (1,)       float64  NaN
       0.5419  nanmax(a, axis=-1)  (500,500)  int64
       2.8732  nanmax(a, axis=-1)  (1,)       float64
       0.0301  nanmax(a, axis=-1)  (500,500)  int32
       2.7437  nanmax(a, axis=-1)  (500,500)  float64
       0.7868  nanmax(a, axis=-1)  (1,)       int32
       0.5535  nanmax(a, axis=-1)  (1,)       int64
       2.8715  nanmax(a, axis=-1)  (500,500)  float64  NaN
       2.5937  nanmax(a, axis=-1)  (1,)       float64  NaN

I think the really small int32 ratio is due to timing granularity. For random ints in the range 0..99, the results are not quite as good for fmax, which I find puzzling.

       Speed   Test                Shape      dtype    NaN?
       3.4021  nansum(a, axis=-1)  (500,500)  int64
       5.5913  nansum(a, axis=-1)  (1,)       float64
       4.4569  nansum(a, axis=-1)  (500,500)  int32
       6.6202  nansum(a, axis=-1)  (500,500)  float64
       7.1847  nansum(a, axis=-1)  (1,)       int32
       2.0448  nansum(a, axis=-1)  (1,)       int64
       6.0257  nansum(a, axis=-1)  (500,500)  float64  NaN
       6.3172  nansum(a, axis=-1)  (1,)       float64  NaN
       0.9598  nanmax(a, axis=-1)  (500,500)  int64
       3.2407  nanmax(a, axis=-1)  (1,)       float64
       0.0520  nanmax(a, axis=-1)  (500,500)  int32
       3.1954  nanmax(a, axis=-1)  (500,500)  float64
       1.5538  nanmax(a, axis=-1)  (1,)       int32
       0.3716  nanmax(a, axis=-1)  (1,)       int64
       3.2372  nanmax(a, axis=-1)  (500,500)  float64  NaN
       2.5633  nanmax(a, axis=-1)  (1,)       float64  NaN

Chuck
[Numpy-discussion] [ANN] Nanny, faster NaN functions
= Nanny =

Nanny uses the magic of Cython to give you a faster, drop-in replacement for the NaN functions in NumPy and SciPy. For example::

    >>> import nanny as ny
    >>> import numpy as np
    >>> arr = np.random.rand(100, 100)

    >>> timeit np.nansum(arr)
    1 loops, best of 3: 67.5 us per loop
    >>> timeit ny.nansum(arr)
    10 loops, best of 3: 18.2 us per loop

Let's not forget to add some NaNs::

    >>> arr[arr > 0.5] = np.nan
    >>> timeit np.nansum(arr)
    1000 loops, best of 3: 411 us per loop
    >>> timeit ny.nansum(arr)
    1 loops, best of 3: 65 us per loop

Nanny uses a separate Cython function for each combination of ndim, dtype, and axis. You can get rid of a lot of overhead (useful in an inner loop, e.g.) by directly importing the function that matches your problem::

    >>> arr = np.random.rand(10, 10)
    >>> from nansum import nansum_2d_float64_axis1

    >>> timeit np.nansum(arr, axis=1)
    1 loops, best of 3: 25.5 us per loop
    >>> timeit ny.nansum(arr, axis=1)
    10 loops, best of 3: 5.15 us per loop
    >>> timeit nansum_2d_float64_axis1(arr)
    100 loops, best of 3: 1.75 us per loop

I put together Nanny as a way to learn Cython. It currently only supports:

- functions: nansum
- operating systems: 64-bit (the accumulator for int32 is hard-coded to int64)
- dtype: int32, int64, float64
- ndim: 1, 2, and 3

If there is interest in the project, I could continue adding the remaining NaN functions from NumPy and SciPy: nanmin, nanmax, nanmean, nanmedian (using a partial sort), nanstd. But why stop there? How about nancumsum or nanprod? Or anynan, which could short-circuit once a NaN is found?

Feedback on the code or the direction of the project is welcome. So is coding help -- without that I doubt the package will ever be completed. Once nansum is complete, many of the remaining functions will be copy, paste, touch-up operations.

Remember, Nanny quickly protects your precious data from the corrupting influence of Mr. Nan.

License
=======

Nanny is distributed under a Simplified BSD license. Parts of NumPy, which has a BSD license, are included in Nanny. See the LICENSE file, which is distributed with Nanny, for details.

Installation
============

You can grab Nanny at http://github.com/kwgoodman/nanny. nansum of ints is only supported by 64-bit operating systems at the moment.

**GNU/Linux, Mac OS X, et al.**

To install Nanny::

    $ python setup.py build
    $ sudo python setup.py install

Or, if you wish to specify where Nanny is installed, for example inside ``/usr/local``::

    $ python setup.py build
    $ sudo python setup.py install --prefix=/usr/local

**Windows**

In order to compile the C code in Nanny you need a Windows version of the gcc compiler. MinGW (Minimalist GNU for Windows) contains gcc and has been used to successfully compile Nanny on Windows. Install MinGW and add it to your system path. Then install Nanny with the commands::

    python setup.py build --compiler=mingw32
    python setup.py install

**Post install**

After you have installed Nanny, run the suite of unit tests::

    >>> import nanny
    >>> nanny.test()
    <snip>
    Ran 1 tests in 0.008s
    OK
    <nose.result.TextTestResult run=1 errors=0 failures=0>
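The anynan floated above is just an early-exit scan; in pure Python the logic looks like this (Nanny would run the same loop in Cython over a typed buffer -- this sketch only shows the short-circuit idea)::

    import numpy as np

    def anynan(arr):
        # Stop at the first NaN instead of reducing the whole array.
        for ai in np.asarray(arr).flat:
            if ai != ai:  # only NaN compares unequal to itself
                return True
        return False

    a = np.random.rand(1000)
    a[3] = np.nan
    print(anynan(a))  # True, after inspecting only four elements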
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 10:33 AM, Keith Goodman <kwgood...@gmail.com> wrote:
> Nanny uses the magic of Cython to give you a faster, drop-in replacement
> for the NaN functions in NumPy and SciPy.

Neat! Why not make this a patch to numpy/scipy instead?

> Nanny uses a separate Cython function for each combination of ndim, dtype,
> and axis. You can get rid of a lot of overhead (useful in an inner loop,
> e.g.) by directly importing the function that matches your problem::
>
>     >>> arr = np.random.rand(10, 10)
>     >>> from nansum import nansum_2d_float64_axis1

If this is really useful, then better to provide a function that finds the correct function for you?

    best_nansum = ny.get_best_nansum(ary[0, :, :], axis=1)
    for i in xrange(ary.shape[0]):
        best_nansum(ary[i, :, :], axis=1)

> - functions: nansum
> - operating systems: 64-bit (the accumulator for int32 is hard-coded to int64)
> - dtype: int32, int64, float64
> - ndim: 1, 2, and 3

What does it even mean to do NaN operations on integers? (I'd sometimes find it *really convenient* if there were a NaN value for standard computer integers... but there isn't?)

-- Nathaniel
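A sketch of the resolver Nathaniel suggests, assuming the naming scheme from the announcement (nansum_{ndim}d_{dtype}_axis{axis}) and falling back to np.nansum when no specialized kernel is found; get_best_nansum is his hypothetical name, not an existing Nanny function::

    import numpy as np

    def get_best_nansum(arr, axis=None):
        name = 'nansum_%dd_%s_axis%s' % (arr.ndim, arr.dtype.name, axis)
        try:
            import nansum  # Nanny's module of specialized kernels
            # Specialized kernels take just the array; axis is baked in.
            return getattr(nansum, name)
        except (ImportError, AttributeError):
            return lambda a: np.nansum(a, axis=axis)  # slow but universal

    ary = np.random.rand(3, 4, 5)
    best_nansum = get_best_nansum(ary[0, :, :], axis=1)
    for i in range(ary.shape[0]):
        best_nansum(ary[i, :, :])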
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 12:55 PM, Nathaniel Smith <n...@pobox.com> wrote:
> What does it even mean to do NaN operations on integers? (I'd sometimes
> find it *really convenient* if there were a NaN value for standard
> computer integers... but there isn't?)

That's why I use masked arrays. They are dtype agnostic. I am curious whether there are any lessons learned in making Nanny that could be applied to the masked array functions?

Ben Root
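For contrast, the masked-array route marks missing entries with a mask instead of a NaN sentinel, so it works for integer dtypes too::

    import numpy as np

    # Mask the missing values instead of overwriting them with NaN; this
    # works for int arrays, which have no NaN representation.
    x = np.ma.array([1, 2, 3, 4], mask=[0, 1, 1, 0])
    print(x.sum())  # 5 -- masked entries are skipped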
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 10:55 AM, Nathaniel Smith <n...@pobox.com> wrote:
> Neat! Why not make this a patch to numpy/scipy instead?

My guess is that having separate underlying functions for each dtype, ndim, and axis would be a nightmare for a large project like NumPy. But it is manageable for a focused project like Nanny.

> If this is really useful, then better to provide a function that finds the
> correct function for you?
>
>     best_nansum = ny.get_best_nansum(ary[0, :, :], axis=1)
>     for i in xrange(ary.shape[0]):
>         best_nansum(ary[i, :, :], axis=1)

That would be useful. It is what nanny.nansum does, except that it returns the sum instead of the function.

> What does it even mean to do NaN operations on integers?

Well, sometimes you write functions without knowing the dtype of the input.
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 2:35 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Fri, Nov 19, 2010 at 11:12 AM, Benjamin Root <ben.r...@ou.edu> wrote:
>> That's why I use masked arrays. [...] I am curious whether there are any
>> lessons learned in making Nanny that could be applied to the masked array
>> functions?
>
> I suppose you could write a Cython function that operates on masked
> arrays. But other than that, I can't think of any lessons. All I can think
> about is speed:
>
>     >>> x = np.ma.array([[1, 2], [3, 4]], mask=[[0, 1], [1, 0]])
>     >>> timeit np.sum(x)
>     1 loops, best of 3: 25.1 us per loop
>
>     >>> a = np.array([[1, np.nan], [np.nan, 4]])
>     >>> timeit ny.nansum(a)
>     10 loops, best of 3: 3.11 us per loop
>
>     >>> from nansum import nansum_2d_float64_axisNone
>     >>> timeit nansum_2d_float64_axisNone(a)
>     100 loops, best of 3: 395 ns per loop

What's the speed advantage of nanny compared to np.nansum if the arrays are larger, say (1000,10) or (1,100), with axis=0?

Josef
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 12:10 PM, <josef.p...@gmail.com> wrote:
> What's the speed advantage of nanny compared to np.nansum if the arrays
> are larger, say (1000,10) or (1,100), with axis=0?

Good point. In the small examples I showed so far, maybe the speedup was all in overhead. Fortunately, that's not the case:

    >>> arr = np.random.rand(1000, 1000)
    >>> timeit np.nansum(arr)
    100 loops, best of 3: 4.79 ms per loop
    >>> timeit ny.nansum(arr)
    1000 loops, best of 3: 1.53 ms per loop

    >>> arr[arr > 0.5] = np.nan
    >>> timeit np.nansum(arr)
    10 loops, best of 3: 44.5 ms per loop
    >>> timeit ny.nansum(arr)
    100 loops, best of 3: 6.18 ms per loop
    >>> timeit np.nansum(arr, axis=0)
    10 loops, best of 3: 52.3 ms per loop
    >>> timeit ny.nansum(arr, axis=0)
    100 loops, best of 3: 12.2 ms per loop

np.nansum makes a copy of the input array and makes a mask (another copy) and then uses the mask to set the NaNs to zero in the copy. So not only is Nanny faster, but it uses less memory.
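In outline, the copying strategy described here (NumPy 1.4-era np.nansum) amounts to the following steps, which is why it costs two array-sized temporaries::

    import numpy as np

    def nansum_by_copying(arr, axis=None):
        a = np.array(arr, dtype=np.float64, copy=True)  # first copy
        mask = np.isnan(a)                              # second temporary
        a[mask] = 0.0                                   # zero the NaNs in the copy
        return a.sum(axis=axis)

    arr = np.array([[1.0, np.nan], [np.nan, 4.0]])
    print(nansum_by_copying(arr))          # 5.0
    print(nansum_by_copying(arr, axis=0))  # [1. 4.]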
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen <p...@iki.fi> wrote:
> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote:
> [clip]
>> My guess is that having separate underlying functions for each dtype,
>> ndim, and axis would be a nightmare for a large project like NumPy. But
>> it is manageable for a focused project like Nanny.
>
> Might be easier to migrate the nan* functions to using ufuncs. Unless I'm
> missing something,
>
>     np.nanmax -> np.fmax.reduce
>     np.nanmin -> np.fmin.reduce
>
> For `nansum`, we'd need to add a ufunc `nanadd`, and for `nanargmax/min`,
> we'd need `argfmin/fmax`.

How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd, please.

    >>> arr = np.random.rand(1000, 1000)
    >>> arr[arr > 0.5] = np.nan
    >>> np.nanmax(arr)
    0.4625409581072
    >>> np.fmax.reduce(arr, axis=None)
    <snip>
    TypeError: an integer is required
    >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
    0.4625409581072

    >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
    100 loops, best of 3: 12.7 ms per loop
    >>> timeit np.nanmax(arr)
    10 loops, best of 3: 39.6 ms per loop
    >>> timeit np.nanmax(arr, axis=0)
    10 loops, best of 3: 46.5 ms per loop
    >>> timeit np.fmax.reduce(arr, axis=0)
    100 loops, best of 3: 12.7 ms per loop
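Pauli's mapping plus the axis=None workaround can be wrapped up as below; flattening first sidesteps the TypeError shown above. One caveat: unlike np.nanmax, an all-NaN input silently yields NaN here::

    import numpy as np

    def nanmax_via_fmax(arr, axis=None):
        if axis is None:
            # fmax.reduce wants a concrete axis, so flatten first.
            return np.fmax.reduce(arr.ravel())
        return np.fmax.reduce(arr, axis=axis)

    arr = np.random.rand(1000, 1000)
    arr[arr > 0.5] = np.nan
    print(np.allclose(nanmax_via_fmax(arr), np.nanmax(arr)))  # True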
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd,
> please. [timings snipped]

Cython is faster than np.fmax.reduce. I wrote a Cython version of np.nanmax, called nanmax below. (It only handles the 2d, float64, axis=None case, but since the array is large I don't think that explains the time difference.) Note that fmax.reduce is slower than np.nanmax when there are no NaNs:

    >>> arr = np.random.rand(1000, 1000)
    >>> timeit np.nanmax(arr)
    100 loops, best of 3: 5.82 ms per loop
    >>> timeit np.fmax.reduce(np.fmax.reduce(arr))
    100 loops, best of 3: 9.14 ms per loop
    >>> timeit nanmax(arr)
    1000 loops, best of 3: 1.17 ms per loop

    >>> arr[arr > 0.5] = np.nan
    >>> timeit np.nanmax(arr)
    10 loops, best of 3: 45.5 ms per loop
    >>> timeit np.fmax.reduce(np.fmax.reduce(arr))
    100 loops, best of 3: 12.7 ms per loop
    >>> timeit nanmax(arr)
    1000 loops, best of 3: 1.17 ms per loop
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On 11/19/10 11:19 AM, Keith Goodman wrote:
> On Fri, Nov 19, 2010 at 10:55 AM, Nathaniel Smith <n...@pobox.com> wrote:
>> Why not make this a patch to numpy/scipy instead?
>
> My guess is that having separate underlying functions for each dtype,
> ndim, and axis would be a nightmare for a large project like NumPy.

True, but:

1) Having special cases for the most common cases is not such a bad idea.

2) Could one use some sort of templating approach to get all the dtypes and such that you want?

3) As for the number of dimensions, I don't think it would be too hard to generalize that -- at least for contiguous arrays.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/ORR        (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception
chris.bar...@noaa.gov
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 3:18 PM, Christopher Barker <chris.bar...@noaa.gov> wrote:
> 1) Having special cases for the most common cases is not such a bad idea.
> [...]
> 3) As for the number of dimensions, I don't think it would be too hard to
> generalize that -- at least for contiguous arrays.

Note that the fmax/fmin versions can be sped up in the same way as sum.reduce was. Also, you should pass the flattened array to the routine for the axis=None case.

Chuck
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> Cython is faster than np.fmax.reduce. I wrote a Cython version of
> np.nanmax [...]. Note that fmax.reduce is slower than np.nanmax when there
> are no NaNs. [timings snipped]

There seem to be some odd hardware/compiler dependencies. I get quite a different pattern of times:

    In [1]: arr = np.random.rand(1000, 1000)

    In [2]: timeit np.nanmax(arr)
    100 loops, best of 3: 10.4 ms per loop

    In [3]: timeit np.fmax.reduce(arr.flat)
    100 loops, best of 3: 2.09 ms per loop

    In [4]: arr[arr > 0.5] = np.nan

    In [5]: timeit np.nanmax(arr)
    100 loops, best of 3: 12.9 ms per loop

    In [6]: timeit np.fmax.reduce(arr.flat)
    100 loops, best of 3: 7.09 ms per loop

I've tweaked fmax with the reduce loop option, but the nanmax times don't look like yours at all. I'm also a bit surprised that you don't see any difference in times when the array contains a lot of NaNs. I'm running on an AMD Phenom, gcc 4.4.5.

Chuck
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 8:19 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> There seem to be some odd hardware/compiler dependencies. I get quite a
> different pattern of times. [...] I'm running on an AMD Phenom, gcc 4.4.5.

However, I noticed that the build wants to be -O1 by default. I have my own CFLAGS that make it -O2, but it looks like Ubuntu's Python might be built with -O1. Hmm. That could certainly cause some odd timings.

Chuck
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 7:19 PM, Charles R Harris <charlesr.har...@gmail.com> wrote:
> There seem to be some odd hardware/compiler dependencies. I get quite a
> different pattern of times. [...] I'm also a bit surprised that you don't
> see any difference in times when the array contains a lot of NaNs. I'm
> running on an AMD Phenom, gcc 4.4.5.

Ubuntu 10.04, 64 bit, numpy 1.4.1.

Difference in which times? nanny.nanmax with and without NaNs? The code doesn't explicitly check for NaNs (it does check for all NaNs). It basically loops through the data and does::

    allnan = 1
    ai = a[i, k]
    if ai > amax:
        amax = ai
        allnan = 0

I should make a benchmark suite.
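Fleshed out, that inner loop becomes the following pure-Python rendering. The -inf starting value is an assumption (it is exactly what Josef asks about next); since every comparison against NaN is False, NaNs can never update amax::

    import numpy as np

    def nanmax_2d(a):
        amax = -np.inf  # assumed starting value; see the question below
        allnan = 1
        for i in range(a.shape[0]):
            for k in range(a.shape[1]):
                ai = a[i, k]
                if ai > amax:  # False whenever ai is NaN
                    amax = ai
                    allnan = 0
        return np.nan if allnan else amax

    a = np.array([[1.0, np.nan], [np.nan, 4.0]])
    print(nanmax_2d(a))  # 4.0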
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 10:42 PM, Keith Goodman kwgood...@gmail.com wrote:

<snip> The code doesn't explicitly check for NaNs (it does check for all NaNs). It basically loops through the data and does:

    allnan = 1
    ai = a[i,k]
    if ai > amax:
        amax = ai
        allnan = 0

does this give you the correct answer?

    1 > np.nan
    False

What's the starting value for amax? -inf?

Josef

I should make a benchmark suite.
___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 7:51 PM, josef.p...@gmail.com wrote:

On Fri, Nov 19, 2010 at 10:42 PM, Keith Goodman kwgood...@gmail.com wrote: It basically loops through the data and does:

    allnan = 1
    ai = a[i,k]
    if ai > amax:
        amax = ai
        allnan = 0

does this give you the correct answer?

    1 > np.nan
    False

Yes -- notice he does the comparison the other way, and

    1 < np.nan
    False

(All comparisons involving NaN return False, including, famously, NaN == NaN, which is why we need np.isnan.)

-- Nathaniel

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
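A quick interpreter session makes the point concrete (plain IEEE 754 behavior, nothing nanny-specific):

    import numpy as np

    print(1 > np.nan)        # False
    print(1 < np.nan)        # False
    print(np.nan > np.nan)   # False
    print(np.nan == np.nan)  # False -- NaN is not even equal to itself
    print(np.isnan(np.nan))  # True -- the reliable way to detect NaN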
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 7:51 PM, josef.p...@gmail.com wrote:

does this give you the correct answer?

    1 > np.nan
    False

What's the starting value for amax? -inf?

Because 1 > np.nan is False, the current running max does not get updated, which is what we want.

    import nanny as ny
    np.nanmax([1, np.nan])
    1.0
    np.nanmax([np.nan, 1])
    1.0
    np.nanmax([np.nan, 1, np.nan])
    1.0

Starting value is -np.inf for floats and stuff like this for ints:

    cdef np.int32_t MININT32 = np.iinfo(np.int32).min
    cdef np.int64_t MININT64 = np.iinfo(np.int64).min

Numpy does this:

    np.nanmax([])
    <snip>
    ValueError: zero-size array to ufunc.reduce without identity

Nanny does this:

    ny.nanmax([])
    nan

So I haven't taken care of that corner case yet. I'll commit nanmax to github in case anyone wants to give it a try.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
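That start-value scheme can be summarized in a few lines of plain Python (an illustrative helper, not part of nanny's API):

    import numpy as np

    def start_value(dtype):
        # Sentinel that every real value beats: -inf for floats,
        # the type's minimum for ints. Integer arrays cannot hold
        # NaN, so they have no all-NaN case to report.
        dtype = np.dtype(dtype)
        if dtype.kind == 'f':
            return -np.inf
        return np.iinfo(dtype).min

    print(start_value(np.float64))  # -inf
    print(start_value(np.int32))    # -2147483648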
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 8:42 PM, Keith Goodman kwgood...@gmail.com wrote:

<snip: same quoted benchmark thread as above>

Difference in which times? nanny.nanmax with and without NaNs? The code doesn't explicitly check for NaNs (it does check for all NaNs). It basically loops through the data and does:

    allnan = 1
    ai = a[i,k]
    if ai > amax:
        amax = ai
        allnan = 0

I should make a benchmark suite.

This doesn't look right:

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def nanmax_2d_float64_axisNone(np.ndarray[np.float64_t, ndim=2] a):
        "nanmax of 2d numpy array with dtype=np.float64 along axis=None."
        cdef Py_ssize_t i, j
        cdef int arow = a.shape[0], acol = a.shape[1], allnan = 1
        cdef np.float64_t amax = 0, aij
        for i in range(arow):
            for j in range(acol):
                aij = a[i,j]
                if aij == aij:
                    amax += aij
                    allnan = 0
        if allnan == 0:
            return np.float64(amax)
        else:
            return NAN

It's doing a sum, not a comparison.

Chuck

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
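For comparison, a comparison-based kernel would look roughly like this (a sketch only, folding in the `>=` fix that comes up at the end of the thread; it is not the commit Keith pushed):

    import numpy as np
    cimport numpy as np
    cimport cython

    cdef np.float64_t NAN = <np.float64_t> np.nan

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def nanmax_2d_float64_axisNone(np.ndarray[np.float64_t, ndim=2] a):
        "nanmax of 2d numpy array with dtype=np.float64 along axis=None."
        cdef Py_ssize_t i, j
        cdef int arow = a.shape[0], acol = a.shape[1], allnan = 1
        cdef np.float64_t amax = -np.inf, aij
        for i in range(arow):
            for j in range(acol):
                aij = a[i, j]
                # Track the max instead of accumulating a sum. NaN fails
                # every comparison, so it can never win; `>=` (not `>`)
                # lets -inf count as a seen value.
                if aij >= amax:
                    amax = aij
                    allnan = 0
        if allnan == 0:
            return np.float64(amax)
        else:
            return NAN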
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 8:05 PM, Charles R Harris charlesr.har...@gmail.com wrote:

This doesn't look right: <snip: the nanmax_2d_float64_axisNone code quoted above> It's doing a sum, not a comparison.

That was a placeholder. Look at the latest commit. Sorry for the confusion.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 10:59 PM, Keith Goodman kwgood...@gmail.com wrote:

<snip> Because 1 > np.nan is False, the current running max does not get updated, which is what we want. <snip> Starting value is -np.inf for floats and stuff like this for ints:

    cdef np.int32_t MININT32 = np.iinfo(np.int32).min
    cdef np.int64_t MININT64 = np.iinfo(np.int64).min

That's what I thought halfway through typing the question.

    -np.inf > -np.inf
    False

If the only value is -np.inf, you will return nan, I guess.

    np.nanmax([-np.inf, np.nan])
    -inf

Josef (being picky)

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [ANN] Nanny, faster NaN functions
On Fri, Nov 19, 2010 at 8:33 PM, josef.p...@gmail.com wrote:

    -np.inf > -np.inf
    False

If the only value is -np.inf, you will return nan, I guess.

    np.nanmax([-np.inf, np.nan])
    -inf

That's a great corner case. Thanks, Josef. This looks like it would fix it: change

    if ai > amax:
        amax = ai

to

    if ai >= amax:
        amax = ai

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
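A tiny pure-Python check confirms the fix (illustrative helper names, not nanny code):

    import numpy as np

    def running_max(values, update):
        # `update(ai, amax)` is the comparison under test.
        amax, allnan = -np.inf, True
        for ai in values:
            if update(ai, amax):
                amax, allnan = ai, False
        return np.nan if allnan else amax

    data = [-np.inf, np.nan]
    # With `>`, -inf never replaces the -inf start, so allnan stays True:
    print(running_max(data, lambda ai, amax: ai > amax))   # nan (wrong)
    # With `>=`, -inf registers as a real value:
    print(running_max(data, lambda ai, amax: ai >= amax))  # -inf (correct)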