[Numpy-discussion] un-silencing Numpy's deprecation warnings
So starting in Python 2.7 and 3.2, the Python developers have made
DeprecationWarnings invisible by default:

  http://docs.python.org/whatsnew/2.7.html#the-future-for-python-2-x
  http://mail.python.org/pipermail/stdlib-sig/2009-November/000789.html
  http://bugs.python.org/issue7319

The only way to see them is to explicitly request them by running Python
with -Wd.

The logic seems to be that between the end-of-development for 2.7 and the
moratorium on 3.2 changes, there were a *lot* of added deprecations that
were annoying people, and deprecations in the Python stdlib mean "this
code is probably sub-optimal but it will still continue to work
indefinitely". So they consider that deprecation warnings are like a lint
tool for conscientious developers who remember to test their code with
-Wd, but not something to bother users with.

In Numpy, the majority of our users are actually (relatively
unsophisticated) developers, and we don't plan to support deprecated
features indefinitely. Our deprecations seem to better match what Python
calls a FutureWarning: "warnings about constructs that will change
semantically in the future."

  http://docs.python.org/library/warnings.html#warning-categories

FutureWarning is displayed by default, and available in all versions of
Python.

So maybe we should change all our DeprecationWarnings into FutureWarnings
(or at least the ones that we actually plan to follow through on).
Thoughts?

- N
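[A quick way to see the default-visibility behavior described above -- a
minimal sketch, not from the original post; save it as a script and run it
with and without -Wd under Python 2.7 or 3.2:]

  import warnings

  warnings.warn("this is deprecated", DeprecationWarning)  # hidden by default
  warnings.warn("this will change", FutureWarning)         # shown by default

  # python script.py      -> only the FutureWarning is printed
  # python -Wd script.py  -> both warnings are printed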
Re: [Numpy-discussion] Separating out the maskna code
On Tue, May 22, 2012 at 5:34 AM, Travis Oliphant tra...@continuum.io wrote:
> Just to be clear. Are we waiting for the conclusion of the
> PyArray_Diagonal PR before proceeding with this one?

We can talk about this one, and everyone's welcome to look at the patch,
of course. (In fact it'd be useful if anyone catches any issues now, so I
can roll them into the final rebase.) But I'll rebase it again after the
PyArray_diagonal thing has been sorted, to resolve conflicts and also fix
some docs that I missed, so I don't want to create an actual PR yet.

-- Nathaniel

> On May 20, 2012, at 1:06 PM, Nathaniel Smith wrote:
>> On Sun, May 20, 2012 at 6:59 PM, Nathaniel Smith n...@pobox.com wrote:
>>> I have not reviewed it in detail, but in general I would be very
>>> supportive of your plan to commit this to master, make a 1.7 release
>>> (without the ReduceWrapper function) and then work on the masked
>>> array / ndarray separation plan for 1.8. Of course, first I would
>>> want to hear from Mark, to hear his comments about what was removed.
>>
>> Definitely. I'm pretty sure I didn't accidentally sweep up anything
>> else in my net besides what it says in the commit messages (simply
>> because it's hard to do that when all you're doing is grepping for
>> HASMASKNA and friends), but he knows this code better than I do.
>>
>> Also on that note, if someone can merge the PyArray_Diagonal PR then
>> I can sort out the conflicts and then make a PR for this, to make
>> review easier...
>>
>> - N
Re: [Numpy-discussion] un-silencing Numpy's deprecation warnings
On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith n...@pobox.com wrote:
> So starting in Python 2.7 and 3.2, the Python developers have made
> DeprecationWarnings invisible by default: [...]
> So maybe we should change all our DeprecationWarnings into
> FutureWarnings (or at least the ones that we actually plan to follow
> through on). Thoughts?

We had the same discussion for Biopython two years ago, and introduced
our own warning class to avoid our deprecations being silent (and thus
almost pointless). It is just a subclass of Warning (originally we used
a subclass of UserWarning).
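[A minimal sketch of the pattern Peter describes -- the class name follows
Biopython's convention but is written from memory here, so treat it as
illustrative. Subclassing plain Warning keeps the message visible under
the default filters:]

  import warnings

  class BiopythonDeprecationWarning(Warning):
      """Deprecation warning that stays visible by default.

      Subclassing Warning rather than DeprecationWarning means the
      default warning filters do not silence it.
      """

  warnings.warn("old_function() is deprecated; use new_function() instead",
                BiopythonDeprecationWarning)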
Re: [Numpy-discussion] un-silencing Numpy's deprecation warnings
On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith n...@pobox.com wrote:
> So starting in Python 2.7 and 3.2, the Python developers have made
> DeprecationWarnings invisible by default: [...]
> The logic seems to be that between the end-of-development for 2.7 and
> the moratorium on 3.2 changes, there were a *lot* of added deprecations
> that were annoying people, and deprecations in the Python stdlib mean
> this code is probably sub-optimal but it will still continue to work
> indefinitely.

That's not quite it, I think, since this change was also made in Python
3.2 and will remain for all future versions of Python. DeprecationWarning
*is* used for things that will definitely be going away, not just things
that are no longer recommended but will continue to live. Note that the
3.2 moratorium was for changes to the language proper. The point was to
encourage stdlib development, including the removal of deprecated code.
It was not a moratorium on removing deprecated things. The silencing
discussion just came up first in a discussion on the moratorium.

The main problem they were running into was that the people who saw these
warnings the most were not people directly using the deprecated features;
they were users of packages written by third parties that used the
deprecated features. Those people can't do anything to fix the problem,
and many of them think that something is broken when they see the warning
(I don't know why people do this, but they do). This problem is
exacerbated by the standard library's position as a standard library.
It's at the base of everyone's stack, so these indirect effects are quite
frequent, quite possibly the majority case. Users would use a newer
version of the Python library than the third-party developer tested on
and see these errors instead of the developer.

I think this concern is fairly general and applies to numpy nearly as
much as it does the standard library. It is at the bottom of many
people's stacks. Someone calling matplotlib.pyplot.plot() should not see
a DeprecationWarning from numpy.

> So they consider that deprecation warnings are like a lint tool for
> conscientious developers who remember to test their code with -Wd, but
> not something to bother users with. In Numpy, the majority of our users
> are actually (relatively unsophisticated) developers,

Whether they sometimes wear a developer hat or not isn't the relevant
distinction. The question to ask is, "Are they the ones writing the code
that directly uses the deprecated features?"

> and we don't plan to support deprecated features indefinitely.

Again, this is not relevant. The silencing of DeprecationWarnings was not
driven by this.

> Our deprecations seem to better match what Python calls a
> FutureWarning: warnings about constructs that will change semantically
> in the future.
> http://docs.python.org/library/warnings.html#warning-categories
> FutureWarning is displayed by default, and available in all versions of
> Python. So maybe we should change all our DeprecationWarnings into
> FutureWarnings (or at least the ones that we actually plan to follow
> through on). Thoughts?

Using FutureWarning for deprecated functions (i.e. functions that will
disappear in future releases) is an abuse of the semantics. FutureWarning
is for things like the numpy.histogram() changes from a few years ago:
changes in default arguments that will change the semantics of a given
function call. Some of our DeprecationWarnings possibly should be
FutureWarnings, but I don't think most should.

I can see a case being made for using a custom non-silenced warning for
some cases that really probably show up mostly in true end-user
scenarios, e.g. genfromtxt(). But there are many other cases where we
should continue to use DeprecationWarning, e.g. _array2string(). On the
whole, though, I would just leave the DeprecationWarnings as they are.

-- Robert Kern
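[To illustrate the distinction Robert draws, here is a sketch -- not numpy
code; the function and argument names are made up. A FutureWarning marks a
call whose meaning will change; a DeprecationWarning marks a function that
will be removed:]

  import warnings

  def histogram_like(data, new=None):
      # FutureWarning: the same call keeps working, but its *meaning*
      # will change in a future release
      if new is None:
          warnings.warn("default semantics will change; pass new=True "
                        "for the future behavior", FutureWarning,
                        stacklevel=2)
      return data

  def old_helper(x):
      # DeprecationWarning: the function itself will disappear, so code
      # that calls it will fail loudly once it is removed
      warnings.warn("old_helper() is deprecated and will be removed",
                    DeprecationWarning, stacklevel=2)
      return x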
Re: [Numpy-discussion] un-silencing Numpy's deprecation warnings
On 05/22/2012 12:06 PM, Robert Kern wrote:
> On Tue, May 22, 2012 at 9:27 AM, Nathaniel Smith n...@pobox.com wrote:
>> So maybe we should change all our DeprecationWarnings into
>> FutureWarnings (or at least the ones that we actually plan to follow
>> through on). Thoughts?
>
> Using FutureWarning for deprecated functions (i.e. functions that will
> disappear in future releases) is an abuse of the semantics.
> FutureWarning is for things like the numpy.histogram() changes from a
> few years ago: changes in default arguments that will change the
> semantics of a given function call. Some of our DeprecationWarnings
> possibly should be FutureWarnings, but I don't think most should.

I guess the diagonal() change would at least be a FutureWarning then?
(When you write to the result?)

Dag
Re: [Numpy-discussion] un-silencing Numpy's deprecation warnings
On Tue, May 22, 2012 at 11:14 AM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
> I guess the diagonal() change would at least be a FutureWarning then?
> (When you write to the result?)

Sure.

-- Robert Kern
[Numpy-discussion] assign a float number to a member of integer array always return integer
Dear all,

Just in case someone didn't know this: assigning a float number to an
integer array element will always return an integer.

  In [4]: a=np.arange(2,11,2)

  In [5]: a
  Out[5]: array([ 2,  4,  6,  8, 10])

  In [6]: a[1]=4.5

  In [7]: a
  Out[7]: array([ 2,  4,  6,  8, 10])

So I would always do this if I expected a transfer from integer to float?

  In [18]: b=a.astype(float)

  In [19]: b
  Out[19]: array([  2.,   4.,   6.,   8.,  10.])

  In [20]: b[1]=4.5

  In [21]: b
  Out[21]: array([  2. ,   4.5,   6. ,   8. ,  10. ])

thanks et cheers,

Chao

--
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
***
[Numpy-discussion] Problem with str.format() and np.recarray
I came across this problem which appears to be new in numpy 1.6.2 (vs.
1.6.1):

  In [17]: a = np.array([(1, )], dtype=[('a', 'i4')])

  In [18]: ra = a.view(np.recarray)

  In [19]: '{}'.format(ra[0])
  ---------------------------------------------------------------------------
  RuntimeError                     Traceback (most recent call last)
  /data/baffin/tom/git/eng_archive/<ipython-input-19-cbdd26e3ea78> in <module>()
  ----> 1 '{}'.format(ra[0])

  RuntimeError: maximum recursion depth exceeded while calling a Python object

  In [20]: str(ra[0])
  Out[20]: '(1,)'

  In [21]: ra[0]
  Out[21]: (1,)

There are obvious workarounds but it seems something is not right. I'm
running Python 2.7 on linux x86_64.

Cheers,
Tom
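[One of the "obvious workarounds" Tom alludes to is presumably to format
the str() of the record rather than the record itself -- a guess at his
meaning, using the names from his session:]

  # sidesteps the recursion by converting the record to str first
  '{0}'.format(str(ra[0]))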
Re: [Numpy-discussion] un-silencing Numpy's deprecation warnings
On Tue, May 22, 2012 at 11:06 AM, Robert Kern robert.k...@gmail.com wrote:
> [...]
> I think this concern is fairly general and applies to numpy nearly as
> much as it does the standard library. It is at the bottom of many
> people's stacks. Someone calling matplotlib.pyplot.plot() should not
> see a DeprecationWarning from numpy.

Yes, good points -- though I think there is also a real cost/benefit
trade-off that depends on the details of how often these warnings are
issued, the specific user base, etc. Compared to the stdlib, a *much*
higher proportion of numpy-using code consists of scripts whose only
users are their authors, who didn't think very carefully about error
handling, and who will continue to use these scripts for long periods of
time (i.e. over multiple releases). So I feel like we should have a
higher threshold for making warnings silent by default.

OTOH, the distinction you suggest does make sense. I would summarize it
as:

- If a function or similar will just disappear in a future release,
  causing obvious failures in any code that depends on it, then
  DeprecationWarning is fine. People's code will unexpectedly break from
  time to time, but in safe ways, and anyway downgrading is easy.
- Otherwise FutureWarning is preferred.

Does that sound like a reasonable rule of thumb?

-- Nathaniel
Re: [Numpy-discussion] un-silencing Numpy's deprecation warnings
On Tue, May 22, 2012 at 2:45 PM, Nathaniel Smith n...@pobox.com wrote:
> [...]
> - If a function or similar will just disappear in a future release,
>   causing obvious failures in any code that depends on it, then
>   DeprecationWarning is fine. People's code will unexpectedly break
>   from time to time, but in safe ways, and anyway downgrading is easy.
> - Otherwise FutureWarning is preferred.
>
> Does that sound like a reasonable rule of thumb?

Sure.

-- Robert Kern
[Numpy-discussion] Building error with ATLAS
Hi all,

I'm now trying to build NumPy with ATLAS on CentOS 6.2. I'm going to use
them with SciPy. My CentOS is installed as "Software Development
Workstation" on my virtual machine (VMware Fusion 4, Mac OS 10.7.4). I
already installed Python 2.7.3 on /usr/local/python-2.7.3 from sources,
and no other tools are installed yet.

I tried but failed to build ATLAS, so I got ATLAS by using the yum
command:

  yum install blas-devel lapack-devel atlas-devel

Then I tried to build NumPy. But I got these errors:

  building 'numpy.core._sort' extension
  compiling C sources
  C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g
    -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

  compile options: '-Inumpy/core/include
    -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy
    -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
    -Inumpy/core/src/npymath -Inumpy/core/src/multiarray
    -Inumpy/core/src/umath -Inumpy/core/include
    -I/usr/local/python-2.7.3/include/python2.7
    -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray
    -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c'
  gcc: build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.c
  gcc -pthread -shared
    build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.o
    -L. -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7
    -o build/lib.linux-x86_64-2.7/numpy/core/_sort.so
  /usr/bin/ld: cannot find -lpython2.7
  collect2: ld returned 1 exit status
  /usr/bin/ld: cannot find -lpython2.7
  collect2: ld returned 1 exit status
  error: Command "gcc -pthread -shared
    build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/_sortmodule.o
    -L. -Lbuild/temp.linux-x86_64-2.7 -lnpymath -lm -lpython2.7
    -o build/lib.linux-x86_64-2.7/numpy/core/_sort.so" failed with
    exit status 1

Before I built NumPy, I uncommented and modified site.cfg as below:

  [DEFAULT]
  library_dirs = /usr/local/python-2.7.3/lib:/usr/lib64:/usr/lib64/atlas
  include_dirs = /usr/local/python-2.7.3/include:/usr/include:/usr/include/atlas

  [blas_opt]
  libraries = f77blas, cblas, atlas

  [lapack_opt]
  libraries = lapack, f77blas, cblas, atlas

And "setup.py config" dumped these messages:

  Running from numpy source directory.
  F2PY Version 2
  blas_opt_info:
  blas_mkl_info:
    libraries mkl,vml,guide not found in /usr/local/python-2.7.3/lib
    libraries mkl,vml,guide not found in /usr/lib64
    libraries mkl,vml,guide not found in /usr/lib64/atlas
    NOT AVAILABLE
  atlas_blas_threads_info:
  Setting PTATLAS=ATLAS
    libraries ptf77blas,ptcblas,atlas not found in /usr/local/python-2.7.3/lib
  Setting PTATLAS=ATLAS
  customize GnuFCompiler
  Could not locate executable g77
  Could not locate executable f77
  customize IntelFCompiler
  Could not locate executable ifort
  Could not locate executable ifc
  customize LaheyFCompiler
  Could not locate executable lf95
  customize PGroupFCompiler
  Could not locate executable pgf90
  Could not locate executable pgf77
  customize AbsoftFCompiler
  Could not locate executable f90
  customize NAGFCompiler
  Found executable /usr/bin/f95
  customize VastFCompiler
  customize CompaqFCompiler
  Could not locate executable fort
  customize IntelItaniumFCompiler
  Could not locate executable efort
  Could not locate executable efc
  customize IntelEM64TFCompiler
  customize Gnu95FCompiler
  Found executable /usr/bin/gfortran
  customize Gnu95FCompiler
  customize Gnu95FCompiler using config
  compiling '_configtest.c':
  /* This file is generated from numpy/distutils/system_info.py */
  void ATL_buildinfo(void);
  int main(void) {
    ATL_buildinfo();
    return 0;
  }
  C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g
    -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

  compile options: '-c'
  gcc: _configtest.c
  gcc -pthread _configtest.o -L/usr/lib64/atlas -lptf77blas -lptcblas
    -latlas -o _configtest
  ATLAS version 3.8.4 built by mockbuild on Wed Dec 7 18:04:21 GMT 2011:
    UNAME     : Linux c6b5.bsys.dev.centos.org 2.6.32-44.2.el6.x86_64 #1
      SMP Wed Jul 21 12:48:32 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
    INSTFLG   : -1 0 -a 1
    ARCHDEFS  : -DATL_OS_Linux -DATL_ARCH_PII -DATL_CPUMHZ=2261
      -DATL_SSE2 -DATL_SSE1 -DATL_USE64BITS -DATL_GAS_x8664
    F2CDEFS   : -DAdd_ -DF77_INTEGER=int -DStringSunStyle
    CACHEEDGE : 8388608
    F77       : gfortran, version GNU Fortran (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
    F77FLAGS  : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -g
      -Wa,--noexecstack -fPIC -m64
    SMC       : gcc, version gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
    SMCFLAGS  : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -g
      -Wa,--noexecstack -fPIC -m64
    SKC       : gcc, version gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
    SKCFLAGS  : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -g
      -Wa,--noexecstack -fPIC -m64
  success!
  removing: _configtest.c _configtest.o _configtest
  Setting PTATLAS=ATLAS
    FOUND:
      libraries = ['ptf77blas', 'ptcblas', 'atlas']
      library_dirs = ['/usr/lib64/atlas']
[Numpy-discussion] question about in-place operations
hello everybody,

first of all thanks to the developers of numpy, which is very useful. I
am building a piece of software that uses numpy+pyopencl for lattice QCD
computations. One problem that I am facing is that I need to perform most
operations on arrays in place, and I must avoid creating temporary arrays
(because my arrays are many gigabytes large).

One typical operation is this:

  a[i] += const * b[i]

What is the efficient way to do this when a and b are arbitrary arrays?
const is usually a complex number. a and b have the same shape but are
not necessarily uni-dimensional.

Massimo
Re: [Numpy-discussion] question about in-place operations
On 05/22/2012 04:25 PM, Massimo DiPierro wrote:
> One problem that I am facing is that I need to perform most operations
> on arrays in place, and I must avoid creating temporary arrays (because
> my arrays are many gigabytes large). One typical operation is this:
>
>   a[i] += const * b[i]
>
> What is the efficient way to do this when a and b are arbitrary arrays?

I don't think NumPy supports this; if you can't modify b[i] in-place, I
think your only option will be one of numexpr/Theano/Cython.

Dag
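[For reference, a minimal sketch of the numexpr route Dag mentions --
this assumes a reasonably recent numexpr whose evaluate() accepts an out=
array, so the result is written back without a full-size temporary:]

  import numpy
  import numexpr as ne

  a = numpy.zeros(2000000)
  b = numpy.ones(2000000)
  c = 3.0

  # evaluates a + c*b block by block and stores the result back into a;
  # a, b and c are picked up from the local namespace by name
  ne.evaluate("a + c * b", out=a)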
Re: [Numpy-discussion] un-silencing Numpy's deprecation warnings
On 05/22/2012 03:50 AM, Peter wrote:
> We had the same discussion for Biopython two years ago, and introduced
> our own warning class to avoid our deprecations being silent (and thus
> almost pointless). It is just a subclass of Warning (originally we used
> a subclass of UserWarning).

For SpacePy we took a similar but slightly different approach; this is in
the top-level __init__:

  if config['enable_deprecation_warning']:
      warnings.filterwarnings('default', '', DeprecationWarning,
                              '^spacepy', 0, False)

enable_deprecation_warning is True by default, but can be changed in the
user's config file. This keeps everything as DeprecationWarning but only
fiddles with the filter for spacepy (and it's set to 'default', not
'always').

--
Jonathan Niehof
ISR-3 Space Data Systems
Los Alamos National Laboratory
MS-D466
Los Alamos, NM 87545
Phone: 505-667-9595
email: jnie...@lanl.gov

Correspondence / Technical data or Software Publicly Available
Re: [Numpy-discussion] question about in-place operations
Thank you. I will look into numexpr.

Anyway, I do not need arbitrary expressions. If there were something like

  numpy.add_scaled(a, scale, b)

with support for scale in int, float, complex, this would be sufficient
for me.

Massimo

On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote:
> I don't think NumPy supports this; if you can't modify b[i] in-place, I
> think your only option will be one of numexpr/Theano/Cython.
Re: [Numpy-discussion] Internationalization of numpy/scipy docstrings...
> Docstrings are not stored in .rst files but in the numpy sources, so
> there are some non-trivial technical and workflow details missing here.
> But besides that, I think translating everything (even into a single
> language) is a massive amount of work, and it's not at all clear if
> there's enough people willing to help out with this. So I'd think it
> would be better to start with just the high-level docs (numpy user
> guide, scipy tutorial) to see how it goes.

I understand that this is non-trivial, for me anyway, because I can't
figure out how to make my way around the numpydoc and documentation
editor code (not quite true, as Pauli accepted a couple of my pull
requests, but I definitely can't make it dance). This is why I asked for
interest and help on the mailing list. I think for the people that worked
on the documentation editor, or know Django, or are cleverer than I, the
required changes to the documentation editor might be semi-trivial. That
is my hope anyway.

We would probably have the high-level docs separate from the docstring
processing anyway, since the high-level docs are already in a sphinx
source directory. So I agree that the high-level docs would be the best
place to start, and in fact that is what I was working with when I found
the problem with the sphinx gettext builder mentioned in the original
post.

I do want to defend and clarify the docstring processing though.
Docstrings, in the code, will always be English. The documentation editor
is the fulcrum. The documentation editor will work with the in-the-code
docstrings *exactly* as it does now. The documentation editor would be
changed so that when it writes the ReST-formatted docstring back into the
code, it *also* writes a *.rst file to a separate sphinx source
directory. These *.rst files would not be part of the numpy source code
directory, but an interim file for the documentation editor and sphinx.
Sphinx extracts strings from the *.rst files to make *.pot templates;
pootle + hordes of translators :-) gives *.po files; then
*.po -> *.mo -> *.rst (translated). The English *.rst, *.po, *.pot and
*.mo files are all interim products behind the scenes. The translated
*.rst files would NOT be part of the numpy source code, but packaged
separately.

I must admit that I did hope that there would be more interest. Maybe I
should have figured out how to put 'maskna' or '1.7' in the subject? In
defense of there not being much interest: the people who would possibly
benefit aren't reading English mailing lists.

Kindest regards,
Tim
Re: [Numpy-discussion] question about in-place operations
For now I will be doing this:

  import numpy
  import time

  a = numpy.zeros(200)
  b = numpy.zeros(200)
  c = 1.0

  # naive solution
  t0 = time.time()
  for i in xrange(len(a)):
      a[i] += c * b[i]
  print time.time() - t0

  # possible solution
  n = 10
  t0 = time.time()
  for i in xrange(0, len(a), n):
      a[i:i+n] += c * b[i:i+n]
  print time.time() - t0

The second possible solution appears 1000x faster than the former in my
tests and uses little extra memory. It is only 2x slower than b *= c.
Any reason not to do it?

On May 22, 2012, at 9:32 AM, Dag Sverre Seljebotn wrote:
> I don't think NumPy supports this; if you can't modify b[i] in-place, I
> think your only option will be one of numexpr/Theano/Cython.
Re: [Numpy-discussion] question about in-place operations
On Tue, May 22, 2012 at 3:47 PM, Massimo DiPierro
massimo.dipie...@gmail.com wrote:
> Thank you. I will look into numexpr.
>
> Anyway, I do not need arbitrary expressions. If there were something
> like numpy.add_scaled(a, scale, b) with support for scale in int,
> float, complex, this would be sufficient for me.

BLAS has the xAXPY functions, which will do this for float and complex.

  import numpy as np
  from scipy.linalg import fblas

  def add_scaled_inplace(a, scale, b):
      # xAXPY computes y <- a*x + y, updating y
      if np.issubdtype(a.dtype, complex):
          fblas.zaxpy(b, a, a=scale)
      else:
          fblas.daxpy(b, a, a=scale)

-- Robert Kern
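[A quick usage sketch -- not from the original message; the sizes are
arbitrary, and this assumes contiguous float64 arrays so the f77 BLAS
wrapper updates the second argument in place, as Robert's function
intends:]

  import numpy as np

  a = np.zeros(1000000)
  b = np.ones(1000000)
  add_scaled_inplace(a, 2.5, b)   # a += 2.5 * b, with no full temporary
  print a[0]                      # 2.5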
Re: [Numpy-discussion] question about in-place operations
Thank you, this does it.

On May 22, 2012, at 9:59 AM, Robert Kern wrote:
> BLAS has the xAXPY functions, which will do this for float and complex.
> [code snipped]
[Numpy-discussion] how to avoid re-shaping
One more question (since this list is very useful ;-)

If I have a numpy array of arbitrary shape, is there a way to
sequentially loop over its elements without reshaping it into a 1D array?
I am trying to simplify this:

  n = product(data.shape)
  oldshape = data.shape
  newshape = (n,)
  data.reshape(newshape)
  for i in xrange(n):
      do_something_with(data[i])
  data.reshape(oldshape)

Massimo
Re: [Numpy-discussion] how to avoid re-shaping
On Tue, May 22, 2012 at 4:09 PM, Massimo DiPierro
massimo.dipie...@gmail.com wrote:
> One more question (since this list is very useful ;-)
>
> If I have a numpy array of arbitrary shape, is there a way to
> sequentially loop over its elements without reshaping it into a 1D
> array? I am trying to simplify this:
>
>   n = product(data.shape)
>   oldshape = data.shape
>   newshape = (n,)
>   data.reshape(newshape)
>   for i in xrange(n):
>       do_something_with(data[i])
>   data.reshape(oldshape)

Note that the .reshape() method does not work in-place. It just returns a
new ndarray object viewing the same data using the different shape. That
said, just iterate over data.flat, if you must iterate manually.

-- Robert Kern
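[A minimal sketch of the data.flat idiom Robert suggests;
do_something_with is the placeholder from the original post, stubbed out
here so the snippet runs:]

  import numpy as np

  def do_something_with(x):      # placeholder from the original post
      print x

  data = np.arange(24).reshape(2, 3, 4)
  for x in data.flat:            # visits every element in C order
      do_something_with(x)

  # data.flat also supports flat indexing and assignment:
  data.flat[5] = -1              # same element as data[0, 1, 1]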
Re: [Numpy-discussion] assign a float number to a member of integer array always return integer
On Tue, May 22, 2012 at 6:33 AM, Chao YUE chaoyue...@gmail.com wrote:
> Just in case someone didn't know this: assigning a float number to an
> integer array element will always return an integer.

right -- numpy arrays are typed -- that's one of the points of them --
you wouldn't want the entire array up-cast with a single assignment --
particularly since there are only python literals for a subset of the
numpy types.

> so I would always do this if I expected a transfer from integer to
> float?
>
> In [18]: b=a.astype(float)

yes -- but that's an odd way of thinking about it -- what you want to do
is think about what type you need your array to be before you create it,
then create it the way you need it:

  In [87]: np.arange(5, dtype=np.float)
  Out[87]: array([ 0.,  1.,  2.,  3.,  4.])

or better:

  In [91]: np.linspace(0,5,6)
  Out[91]: array([ 0.,  1.,  2.,  3.,  4.,  5.])

note that most (all?) numpy array constructors take a dtype argument.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

chris.bar...@noaa.gov
Re: [Numpy-discussion] subclassing ndarray subtleties??
On Mon, May 21, 2012 at 6:47 PM, Tom Aldcroft
aldcr...@head.cfa.harvard.edu wrote:
> Over on the scipy-user mailing list there was a question about
> subclassing ndarray and I was interested to see two responses that
> seemed to imply that subclassing should be avoided. From Dag and
> Nathaniel, respectively:
>
> "Subclassing ndarray is a very tricky business -- I did it once and
> regretted having done it for years, because there's so much you can't
> do etc.. You're almost certainly better off with embedding an array as
> an attribute, and then forward properties etc. to it."
>
> "Yes, it's almost always the wrong thing..."
>
> So my question is whether there are subtleties or issues that are not
> covered in the standard NumPy documents on subclassing ndarray. What
> are the things you can't do etc? I'm working on a project that relies
> heavily on an ndarray subclass which just adds a few attributes and
> slightly tweaks __getitem__. It seems fine and I really like that the
> class is an ndarray with all the built-in methods already there. Am I
> missing anything? From the scipy thread I did already learn that one
> should also override __getslice__ in addition to __getitem__ to be
> safe.

I don't know of anything that the docs are lacking in particular. It's
just that subclassing in general is basically a special form of
monkey-patching: you have this ecosystem of cooperating methods, and then
you're inserting some arbitrary changes in the middle of it. Making it
all work in general requires that you carefully think through how all the
different pieces of the ndarray API interact, and the ndarray API is very
large and complicated.

The __getslice__ thing is one example of this. For another: does your
__getitem__ properly handle *all* the cases that regular
ndarray.__getitem__ handles? (I'm not sure anyone actually knows what
this complete list is; there are a lot of potential corner cases.) What
happens if one of your objects is passed to third-party code that uses
__getitem__? What happens if your array is accidentally stripped of its
magic properties by passing through np.asarray() at the top of some
function? Have you thought about how your special attributes are affected
by, say, swapaxes? Have you applied your tweaks to item() and setitem()?
I'm just guessing randomly here of course, since I have no idea what
you've done.

And I've subclassed ndarray myself at least three times, for reasons that
seemed good enough at the time, so I'm not saying it's never doable. It's
just that there are tons of these tiny little details, any one of which
can trip you up, and that means that people tend to dive in and then
discover the pitfalls later.

- N
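[For readers following along, a minimal sketch of the pattern the numpy
subclassing docs recommend -- essentially the docs' InfoArray example
written from memory, with illustrative class and attribute names. Even
this small example has to handle view casting and new-from-template via
__array_finalize__, which hints at the pitfalls Nathaniel lists:]

  import numpy as np

  class InfoArray(np.ndarray):
      """ndarray subclass carrying a single extra attribute."""

      def __new__(cls, input_array, info=None):
          # view casting: reuse the input's buffer, just change the type
          obj = np.asarray(input_array).view(cls)
          obj.info = info
          return obj

      def __array_finalize__(self, obj):
          # called on explicit construction (obj is None), view casting,
          # and new-from-template (e.g. slicing), so the attribute must
          # be propagated here, not only in __new__
          if obj is None:
              return
          self.info = getattr(obj, 'info', None)

  a = InfoArray(np.arange(5), info="metadata")
  b = a[1:3]    # new-from-template: b.info is also "metadata"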
Re: [Numpy-discussion] Internationalization of numpy/scipy docstrings...
On Tue, May 22, 2012 at 10:51 AM, Tim Cera t...@cerazone.net wrote:
> [...]
> I must admit that I did hope that there would be more interest. Maybe I
> should have figured out how to put 'maskna' or '1.7' in the subject? In
> defense of there not being much interest: the people who would possibly
> benefit aren't reading English mailing lists.

One advantage of getting this done would be that other packages could
follow the same approach. Just as numpy.testing and numpy's doc standard
have spread to related packages, being able to generate translations
might be even more interesting to downstream packages. There the fraction
of end users that are not used to working in English anyway might be
larger than for numpy itself.

The numpy mailing list may be too narrow to catch the attention of
developers with enough interest and expertise in the area.

Josef
Re: [Numpy-discussion] question about in-place operations
On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
> For now I will be doing this:
> [code snipped]
> The second possible solution appears 1000x faster than the former in
> my tests and uses little extra memory. It is only 2x slower than
> b *= c. Any reason not to do it?

No, this is perfectly fine, you just manually did what numexpr does.

On 05/22/2012 04:47 PM, Massimo DiPierro wrote:
> Thank you. I will look into numexpr.
>
> Anyway, I do not need arbitrary expressions. If there were something
> like numpy.add_scaled(a, scale, b) with support for scale in int,
> float, complex, this would be sufficient for me.

But of course, few need *arbitrary* expressions -- it's just that the
ones they want are not already compiled. It's the last 5% of
functionality that's different for everybody...

(But the example you mention could make a nice ufunc; so an alternative
for you would be to look at the C implementation of np.add and try to
submit a pull request for numpy.add_scaled)

Dag
Re: [Numpy-discussion] question about in-place operations
On 5/22/12 8:47 PM, Dag Sverre Seljebotn wrote:
> On 05/22/2012 04:54 PM, Massimo DiPierro wrote:
>> [code snipped]
>> The second possible solution appears 1000x faster than the former in
>> my tests and uses little extra memory. It is only 2x slower than
>> b *= c. Any reason not to do it?
>
> No, this is perfectly fine, you just manually did what numexpr does.

Yeah. You basically re-discovered the blocking technique. For a more
general example of how to apply the blocking technique with NumPy, see
the section "CPU vs Memory Benchmark" in:

https://python.g-node.org/python-autumnschool-2010/materials/starving_cpus

Of course numexpr has less overhead (and can use multiple cores) than
using plain NumPy.

-- Francesc Alted
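[A generic version of the blocking idiom discussed in this thread might
look like the sketch below -- illustrative, not from the thread. It
assumes the arrays are contiguous, so that reshape(-1) returns a view
rather than a copy; otherwise the updates would not land in a:]

  import numpy as np

  def add_scaled_blocked(a, scale, b, blocksize=4096):
      """a += scale * b, using only block-sized temporaries."""
      aflat = a.reshape(-1)   # a view when `a` is contiguous
      bflat = b.reshape(-1)
      for i in range(0, aflat.shape[0], blocksize):
          aflat[i:i + blocksize] += scale * bflat[i:i + blocksize]

  a = np.zeros((1000, 1000))
  b = np.ones((1000, 1000))
  add_scaled_blocked(a, 2.5, b)   # a is updated in place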
Re: [Numpy-discussion] question about in-place operations
Thank you Dag, I will look into it.

Is there any documentation about ufuncs? Is this the file:

  core/src/umath/ufunc_object.c

Massimo

On May 22, 2012, at 1:47 PM, Dag Sverre Seljebotn wrote:
> (But the example you mention could make a nice ufunc; so an alternative
> for you would be to look at the C implementation of np.add and try to
> submit a pull request for numpy.add_scaled)
Re: [Numpy-discussion] question about in-place operations
This problem is linear, so it is probably RAM IO bound. I do not think I
would benefit much from multiple cores. But I will give it a try. In the
short term this is good enough for me.

On May 22, 2012, at 1:57 PM, Francesc Alted wrote:
> Of course numexpr has less overhead (and can use multiple cores) than
> using plain NumPy.
Re: [Numpy-discussion] pre-PEP for making creative forking of NumPy less destructive
Hi,

Regarding the example with numpy arrays for small arrays: the speed
problem is probably because NumPy has not been optimized for low
overhead. For example, each C function should check first whether the
input is a NumPy array and, only if not, jump to a function to make one.
Currently, in the C function that gets called by dot()
(PyArray_Multiply?), a C function call is made to check if the array is a
NumPy array. This is extra overhead for the expected most frequent
behavior, namely that the input is a NumPy array. I'm pretty sure this
happens in many places. In this particular function, there are many other
function calls before calling BLAS, even for the simple case of a
vector x vector, vector x matrix or matrix x matrix dot product. But this
is probably for another thread if people want to discuss it more. Also, I
didn't verify how frequently we could lower the overhead where we don't
need it. So it could be just a few functions that need this type of
optimization.

For the comparison with the multiple types of array on the GPU, I think
the first reason is that people worked in isolation and only implemented
the subset of the numpy ndarray they needed. As different projects/groups
needed different parts, reusing other people's work was not trivial.
Otherwise, I see the problem, but I don't know what to say about it as I
didn't experience it.

Fred
Re: [Numpy-discussion] subclassing ndarray subtleties??
On Tue, May 22, 2012 at 4:07 PM, Dan Goodman dg.gm...@thesamovar.net wrote:
> On 22/05/2012 18:20, Nathaniel Smith wrote:
>> [...]
>
> I've also used subclasses of ndarray, and have stumbled across most
> (but not all) of the problems you mentioned above. In my case, my code
> has gradually evolved over a few years as I've become aware of each of
> these problems. I think it would be useful to have an example of a
> completely 'correctly' subclassed ndarray that handles all of these
> issues that people could use as a template when they want to subclass
> ndarray. I appreciate, though, that there's no-one who particularly
> wants to do this! :) I'd offer my code as an example, but Nathaniel's
> comment above shows that there's many things mine doesn't handle
> properly.
>
> Dan

The text Nathaniel wrote is already very useful (certainly to me). It
seems like this text could be put almost verbatim (maybe with some list
items) in a subsection at the end of [1] titled "Caution" or "Other
considerations".

Thanks,
Tom

[1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html
Re: [Numpy-discussion] assign a float number to a member of integer array always return integer
Thanks, Chris, for the informative post. cheers, Chao 2012/5/22 Chris Barker chris.bar...@noaa.gov On Tue, May 22, 2012 at 6:33 AM, Chao YUE chaoyue...@gmail.com wrote: Just in case someone didn't know this: assigning a float number to an integer array element will always yield an integer. right -- numpy arrays are typed -- that's one of the points of them -- you wouldn't want the entire array up-cast with a single assignment -- particularly since there are only python literals for a subset of the numpy types. so should I always do this if I expect a conversion from integer to float? In [18]: b=a.astype(float) yes -- but that's an odd way of thinking about it -- what you want to do is think about what type you need your array to be before you create it, then create it the way you need it: In [87]: np.arange(5, dtype=np.float) Out[87]: array([ 0., 1., 2., 3., 4.]) or better: In [91]: np.linspace(0,5,6) Out[91]: array([ 0., 1., 2., 3., 4., 5.]) note that most (all?) numpy array constructors take a dtype argument. -Chris In [19]: b Out[19]: array([ 2., 4., 6., 8., 10.]) In [20]: b[1]=4.5 In [21]: b Out[21]: array([ 2. , 4.5, 6. , 8. , 10. ]) thanks and cheers, Chao -- *** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- *** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
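A minimal sketch of the behavior under discussion (outputs shown as comments; exact formatting may differ across NumPy versions):

    import numpy as np

    a = np.arange(2, 11, 2)   # integer dtype: array([ 2,  4,  6,  8, 10])
    a[1] = 4.5                # value is truncated to fit the int dtype
    print(a[1])               # 4, not 4.5 -- no error, no warning

    b = np.arange(2, 11, 2, dtype=float)  # choose the dtype at creation
    b[1] = 4.5
    print(b[1])               # 4.5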
Re: [Numpy-discussion] subclassing ndarray subtleties??
On Tue, May 22, 2012 at 1:07 PM, Dan Goodman dg.gm...@thesamovar.net wrote: I think it would be useful to have an example of a completely 'correctly' subclassed ndarray that handles all of these issues that people could use as a template when they want to subclass ndarray. I think this is by definition impossible -- if you are subclassing it, you are changing its behavior in *some* way, and which way will determine how you want it to behave under all the various conditions that an array may encounter. So there is no subclass that handles all these issues, nor is there any pre-defined definition of 'correct'. My personal use for subclassing has been to plug a new object into code that was already using a regular old numpy array -- in that case, all it needed to handle were the use-cases it was already being used in -- so running my test code was all I needed. But if I were starting from scratch, I'd probably use the 'has a' rather than the 'is a' OO model. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
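A rough sketch of the 'has a' (composition) approach Chris mentions -- the class and attribute names here are hypothetical, not from any NumPy API. The wrapper delegates explicitly and implements __array__ so NumPy functions can still consume the object:

    import numpy as np

    class Labeled:
        """Hypothetical wrapper: composition instead of subclassing."""
        def __init__(self, data, label):
            self._data = np.asarray(data)
            self.label = label

        def __array__(self, dtype=None):
            # Lets np.asarray() and most NumPy functions extract the data.
            return self._data if dtype is None else self._data.astype(dtype)

        def __getitem__(self, idx):
            # Propagation is explicit, so there are no surprise
            # corner cases.
            return Labeled(self._data[idx], self.label)

    x = Labeled([1.0, 2.0, 3.0], label='velocity')
    print(np.sum(x))     # 6.0 -- via __array__
    print(x[1:].label)   # 'velocity'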
Re: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)
I just realized that the pull request doesn't do what I thought it did, which is just add the flag to warn users who are writing to an array that is a view when it used to be a copy. It's more cautious and also copies the data for 1.7. Is this really a necessary step? I guess it depends on how many use-cases there are where people are relying on .diagonal() being a copy. Given that it is such an easy thing for people who encounter the warning to fix their code, it seems overly cautious to *also* make a copy (especially for a rare code-path like this --- although I admit that I don't have any reproducible data to support the assertion that it's a rare code-path). I think we have a mixed record of being cautious (not cautious enough in some changes), but this seems like swinging in the other direction of being overly cautious on a minor point. I wonder if I'm the only one who feels that way about this PR. This is not a major issue, so I am fine with the current strategy, but the drawback of being this cautious on this point is 1) it is not really reflective of other changes and 2) it does mean that someone who wants to fix their code for the future will end up with two copies for 1.7. -Travis On May 16, 2012, at 3:51 PM, Travis Oliphant wrote: This Pull Request looks like a good idea to me as well. -Travis On May 16, 2012, at 3:10 PM, Ralf Gommers wrote: On Wed, May 16, 2012 at 3:55 PM, Nathaniel Smith n...@pobox.com wrote: On Tue, May 15, 2012 at 2:49 PM, Frédéric Bastien no...@nouiz.org wrote: Hi, In fact, I would argue for never changing the current behavior, but adding the flag for people who want to use it. Why? 1) There are probably 10k scripts that use it and that would need to be checked for correctness. There won't be an easy-to-see crash or error that lets the user notice the change. My suggestion is that we follow this scheme, which I think gives ample opportunity for people to notice problems: 1.7: works like 1.6, except that a DeprecationWarning is produced if (and only if) someone writes to an array returned by np.diagonal (or friends). This gives a pleasant heads-up for those who pay attention to DeprecationWarnings. 1.8: return a view, but mark this view read-only. This causes crashes for anyone who ignored the DeprecationWarnings, guaranteeing that they'll notice the issue. 1.9: return a writeable view, transition complete. I've written a pull request implementing the first part of this; I hope everyone interested will take a look: https://github.com/numpy/numpy/pull/280 Thanks for doing that. Seems like a good way forward. When the PR gets merged, can you please also open a ticket for this with Milestone 1.8? Then we won't forget to make the required changes for that release. Ralf 2) The speed-up from this change is globally insignificant. Due to 1), I think it is not worth it. Why is this not a significant speed-up? First, the user has already created and used the original tensor. Suppose a matrix of size n x n. If it doesn't fit in the cache, creating it costs n * n, but copying the diagonal costs cst * n, where cst is the price of loading a full cache line. If you return a view instead, you pay this cst price later, when you do the computation. In all cases, this is cheap compared to the cost of creating the matrix. Also, you will do work on the matrix, and that work will be much more costly than the price of the copy. If the matrix fits in the cache, the price of the copy is even lower. 
So in conclusion, optimizing the diagonal won't give a speed-up in typical user scripts, but will break many of them. I agree that the speed difference is small. I'm more worried about the cost to users of having to remember odd inconsistencies like this, and to think about whether there actually is a speed difference or not, etc. (If we do add a copy=False option, then I guarantee many people will use it religiously just in case the speed difference is enough to matter! And that would suck for them.) Returning a view makes the API slightly nicer, cleaner, more consistent, more useful. (I believe the reason this was implemented in the first place was that providing a convenient way to *write* to the diagonal of an arbitrary array made it easier to implement numpy.eye for masked arrays.) And the whole point of numpy is to trade off a little speed in favor of having a simple, easy-to-work-with high-level API :-). -- Nathaniel ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
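To make the proposed 1.8 step concrete, here is a rough sketch -- not NumPy's actual implementation -- of what a read-only diagonal view amounts to, built with stride tricks and the writeable flag:

    import numpy as np

    a = np.arange(9.0).reshape(3, 3)

    # A diagonal "view": same buffer, each step advances one row plus
    # one column.
    d = np.lib.stride_tricks.as_strided(
        a, shape=(3,), strides=(a.strides[0] + a.strides[1],))
    d.flags.writeable = False   # the proposed 1.8 behavior

    a[1, 1] = 99.0
    print(d)                    # [  0.  99.   8.] -- a view sees the change

    try:
        # In the 1.8 scheme this fails loudly instead of silently
        # writing to a throwaway copy.
        d[0] = -1.0
    except ValueError as e:
        print(e)                # assignment destination is read-only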