Re: [Numpy-discussion] Help speeding up element-wise operations for video processing
Hey Brendan

2008/9/17 Brendan Simons [EMAIL PROTECTED]:
> I would love a c-types code snippet. I'm not very handy in C. Since I
> gather numpy is row-major, I thought I could do up and down crops very
> quickly by moving the start and end pointers of the array. For cropping
> left and right, is there a fast C command for copying while skipping
> every nth hundred bytes?

Not sure which way you decided to go, but here are some code snippets:

http://mentat.za.net/hg/graycomatrix

and

http://mentat.za.net/source/connected_components.tar.bz2

Also take a look at http://scipy.org/Cookbook/Ctypes

Good luck!
Stéfan

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
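Not from the thread itself, but a small sketch of the point about row-major layout: in numpy, both vertical and horizontal crops need no C at all, because basic slicing only adjusts the array's data pointer and strides. The frame shape below is hypothetical.

```python
import numpy as np

# Hypothetical 480x640 RGB video frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# Cropping top/bottom AND left/right via basic slicing returns a view,
# not a copy: no bytes are moved, only the start offset and strides change.
cropped = frame[100:380, 50:590]

assert cropped.base is frame           # it is a view on the original data
assert cropped.shape == (280, 540, 3)
```

Writes into `cropped` therefore modify `frame`; call `.copy()` if an independent array is needed.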
Re: [Numpy-discussion] Help speeding up element-wise operations for video processing
Sorry for not relating to the details of the problem, but did you take a
look at pygpu? It aims to enable image processing at video rate.

Nadav

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Stéfan van der Walt
Sent: Thu 18-September-08 10:25
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Help speeding up element-wise operations for video processing

> Not sure which way you decided to go, but here are some code snippets:
> http://mentat.za.net/hg/graycomatrix and
> http://mentat.za.net/source/connected_components.tar.bz2
> Also take a look at http://scipy.org/Cookbook/Ctypes
> Good luck!
> Stéfan
Re: [Numpy-discussion] profiling line by line
On Thu, Sep 18, 2008 at 1:29 AM, Robert Kern [EMAIL PROTECTED] wrote:
> On Wed, Sep 17, 2008 at 18:09, Ondrej Certik [EMAIL PROTECTED] wrote:
>> On Wed, Sep 17, 2008 at 3:56 AM, Robert Kern [EMAIL PROTECTED] wrote:
>>> On Mon, Sep 15, 2008 at 11:13, Arnar Flatberg [EMAIL PROTECTED] wrote:
>>>> That would make me an extremely happy user, I've been looking for
>>>> this for years! I can't imagine I'm the only one who profiles some
>>>> hundred lines of code and ends up with 90% of total time in the
>>>> dot-function.
>>>
>>> For the time being, you can grab it here:
>>>
>>> http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
>>>
>>> It requires Cython and a C compiler to build. I'm still debating with
>>> myself about the desired workflow for using it, but for now, it only
>>> profiles functions which you have registered with it. I have made the
>>> profiler work as a decorator to make this easy. E.g.,
>>>
>>> from line_profiler import LineProfiler
>>>
>>> profile = LineProfiler()
>>>
>>> @profile
>>> def my_slow_func():
>>>     ...
>>>
>>> profile.dump_stats('my_slow_func.lprof')
>>>
>>> This is kind of inconvenient, so I have a little script that handles
>>> everything except for the @profile itself. It started as a script to
>>> run cProfile nicely, so it actually does that by default. I took
>>> pystone.py from the Python source and added a couple of @profile
>>> decorators to demonstrate:
>>>
>>> [line_profiler]$ ./kernprof.py --help
>>> Usage: ./kernprof.py [-s setupfile] [-o output_file_path] scriptfile [arg] ...
>>>
>>> Options:
>>>   -h, --help            show this help message and exit
>>>   -l, --line-by-line    Use the line-by-line profiler from the
>>>                         line_profiler module instead of cProfile.
>>>                         Implies --builtin.
>>>   -b, --builtin         Put 'profile' in the builtins. Use
>>>                         'profile.enable()' and 'profile.disable()' in
>>>                         your code to turn it on and off, or '@profile'
>>>                         to decorate a single function, or 'with
>>>                         profile:' to profile a single section of code.
>>>   -o OUTFILE, --outfile=OUTFILE
>>>                         Save stats to outfile
>>>   -s SETUP, --setup=SETUP
>>>                         Code to execute before the code to profile
>>>
>>> [line_profiler]$ ./kernprof.py -l pystone.py
>>> Pystone(1.1) time for 5 passes = 11.34
>>> This machine benchmarks at 4409.17 pystones/second
>>> Wrote profile results to pystone.py.lprof
>>> [line_profiler]$ ./view_line_prof.py pystone.py.lprof
>>> Timer unit: 1e-06 s
>>>
>>> File: pystone.py
>>> Function: Proc0 at line 79
>>> Total time: 8.46516 s
>>> [...]
>>
>> This is what I am getting:
>>
>> $ ./kernprof.py -l pystone.py
>> Wrote profile results to pystone.py.lprof
>> $ ./view_line_prof.py pystone.py.lprof
>> Timer unit: 1e-06 s
>> $
>>
>> So I think you meant:
>>
>> $ ./kernprof.py -l mystone.py
>> 20628
>> Wrote profile results to mystone.py.lprof
>> $ ./view_line_prof.py mystone.py.lprof
>> Timer unit: 1e-06 s
>>
>> File: pystone.py
>> Function: Proc0 at line 79
>> Total time: 13.0803 s
>> [...]
>>
>> Now it works.
>
> No, I meant pystone.py. My script-finding code may have (incorrectly)
> found a different, uninstrumented pystone.py file somewhere else,
> though. Try with ./pystone.py.
>
>> This is an excellent piece of software! Nice job. Just today I needed
>> such a thing! How do you easily install it?
>
> python setup.py install should have installed the module. I haven't
> done anything with the scripts, yet.
>
>> I usually do python setup.py install --home=~/lib and I have the
>> PYTHONPATH + PATH setup in my .bashrc, but I then need to manually
>> remove the stuff from my ~/lib if I want to uninstall, which sucks. So
>> this time I just did python setup.py build and moved the .so file
>> manually to the current dir. But there must be a better way. What is
>> your workflow?
>
> For things I am developing on, I build them in-place, using the
> setuptools develop command to add the appropriate path to the
> easy-install.pth file. To remove, I would just edit that file. For
> things I'm not developing on, I usually build and install an egg if at
> all possible.
> But then, I'm typically on a single-user box where I'm root, so I
> sometimes do nasty and unsanitary things like chown -R rkern:rkern
> /usr/local/.

Anyway, so I used it on my code and here is what I got:

File: hermes1d/mesh.py
Function: integrate_function at line 119
Total time: 0.647412 s

Line #   Hits      Time  % Time  Line Contents
==============================================
   119                           @profile
   120                           def integrate_function(self, f):
   121                               """
   122                               Integrate the function f on the element.
   123                               """
   124     96      1091     0.2      from numpy import array
   125     96    461070    71.2      from scipy.integrate import quadrature
   126     96       496     0.1      a, b = self.nodes[0].x, self.nodes[1].x
   127     96       418     0.1
[Numpy-discussion] Medians that ignore values
I have data from biological experiments that is represented as a list of
about 5000 triples. I would like to convert this to a list of the median
of each triple. I did some profiling and found that numpy was about 12
times faster for this application than using regular Python lists and a
list median implementation. I'll be performing quite a few mathematical
operations on these values, so using numpy arrays seems sensible.

The only problem is that my data has gaps in it - where an experiment
failed, a triple will not have three values. Some will have 2, 1 or even
no values. To keep the arrays regular so that they can be used by numpy,
is there some dummy value I can use to fill these gaps that will be
ignored by the median routine? I tried NaN for this, but as far as
median is concerned, it counts as infinity:

>>> from numpy import *
>>> median(array([1,3,nan]))
3.0
>>> median(array([1,nan,nan]))
nan

Is this the correct behavior for median with nan? Is there a fix for
this or am I going to have to settle with using lists?

Thanks,

Peter
Re: [Numpy-discussion] Medians that ignore values
I think you need to use masked arrays.

Nadav

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Peter Saffrey
Sent: Thu 18-September-08 14:27
To: numpy-discussion@scipy.org
Subject: [Numpy-discussion] Medians that ignore values

> [...] To keep the arrays regular so that they can be used by numpy, is
> there some dummy value I can use to fill these gaps that will be
> ignored by the median routine? I tried NaN for this, but as far as
> median is concerned, it counts as infinity. [...]
Re: [Numpy-discussion] profiling line by line
On Thu, Sep 18, 2008 at 1:01 PM, Robert Cimrman [EMAIL PROTECTED] wrote:
> Hi Robert,
>
> Robert Kern wrote:
>> [...] For the time being, you can grab it here:
>> http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
>> It requires Cython and a C compiler to build. [...]
>
> many thanks for this! I have wanted to try out the profiler but failed
> to build it (changeset 6:0de294aa75bf):
>
> $ python setup.py install --root=/home/share/software/
> running install
> running build
> running build_py
> creating build
> creating build/lib.linux-i686-2.4
> copying line_profiler.py -> build/lib.linux-i686-2.4
> running build_ext
> cythoning _line_profiler.pyx to _line_profiler.c
> building '_line_profiler' extension
> creating build/temp.linux-i686-2.4
> i486-pc-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -fPIC
> -I/usr/include/python2.4 -c -I/usr/include/python2.4 -c
> _line_profiler.c -o build/temp.linux-i686-2.4/_line_profiler.o
> _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
> function)
> error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1
>
> I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.

I am telling you all the time, Robert, to use Debian, that it just
works, and you say, no no, gentoo is the best. :)

Ondrej
Re: [Numpy-discussion] Medians that ignore values
Nadav Horesh wrote:
> I think you need to use masked arrays.

Peter Saffrey wrote:
> [...] Is this the correct behavior for median with nan? Is there a fix
> for this or am I going to have to settle with using lists?

Hi,

The counting of infinity is correct: it follows from the IEEE Standard
for Binary Floating-Point Arithmetic (IEEE 754). You might want to use
isfinite() to first remove nan and +/- infinity before taking the
median:

numpy.median(a[numpy.isfinite(a)])

Bruce
Re: [Numpy-discussion] profiling line by line
Ondrej Certik wrote:
> On Thu, Sep 18, 2008 at 1:01 PM, Robert Cimrman [EMAIL PROTECTED] wrote:
>> many thanks for this! I have wanted to try out the profiler but failed
>> to build it:
>> [...]
>> _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
>> function)
>> error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1
>>
>> I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.
>
> I am telling you all the time, Robert, to use Debian, that it just
> works, and you say, no no, gentoo is the best. :)

And what's wrong with that? :) Once you get over the learning curve,
Gentoo works just fine. Must be Robert K.'s fault. :)

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
Re: [Numpy-discussion] Medians that ignore values
> You might want to try isfinite() to first remove nan, +/- infinity
> before doing that.
>
> numpy.median(a[numpy.isfinite(a)])

We just had this discussion a month or two ago, I think even on this
list, and continued it at the SciPy conference. The problem with
numpy.median(a[numpy.isfinite(a)]) is that it breaks when you have a
multi-dimensional array, such as an array of 5000x3 as in this case, and
take the median down an axis. The example above flattens the array and
eliminates the possibility of taking the median down an axis in a single
call, as the poster desires.

Currently the only way you can handle NaNs is by using masked arrays.
Create a mask by doing isfinite(a), then call the masked array median().
There's an example here:

http://sd-2116.dedibox.fr/pydocweb/doc/numpy.ma/

Note that our competitor language IDL does have a /nan flag to its
single median routine, making this common task much easier in that
language than in ours.

--jh--
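A sketch of the masked-array approach described above, with a toy stand-in for the 5000x3 data (this assumes a numpy recent enough that ma.median accepts an axis argument; the thread notes older versions had bugs here):

```python
import numpy as np
import numpy.ma as ma

# Toy stand-in for the 5000x3 triples: NaN marks a failed experiment.
a = np.array([[1.0,    3.0,    2.0],
              [1.0,    np.nan, 5.0],
              [np.nan, np.nan, 7.0]])

masked = ma.masked_invalid(a)     # mask NaN and +/- infinity
med = ma.median(masked, axis=1)   # row-wise median, ignoring masked entries

assert med[0] == 2.0   # median of 1, 3, 2
assert med[1] == 3.0   # median of 1, 5
assert med[2] == 7.0   # only one valid value in the row
```

Modern numpy also grew a `nanmedian` function (in 1.9, long after this thread) that does this in one call without constructing a masked array.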
Re: [Numpy-discussion] profiling line by line
So the timings rise a lot. For obvious reasons, that's the overhead of
the profiler. But the problem is that then the timings just don't fit,
e.g. if I sum the total time spent in subfunctions, it doesn't account
for all the time printed on the respective line in the parent function.
I don't know if there is any way to fix it, or even whether it is worth
fixing. So I guess one should just use the profiling info to determine
which line to fix.

Do you think it's worthwhile to implement line profiling for all lines
and then make sure that all timings match? What is your experience?

The reason I want this is so that I can determine, just by looking at
the timing, how much time is spent on the line itself and how much time
is spent in the functions that are called on that line. I think it is
doable; what do you think? One would trace how much time was spent in
python_trace_callback and then just subtract this time from the timings
of the lines in the parent function.

Btw Robin, how is Matlab doing it?

Ondrej
Re: [Numpy-discussion] profiling line by line
On Wed, Sep 17, 2008 at 10:33 PM, Robert Kern [EMAIL PROTECTED] wrote:
> On Wed, Sep 17, 2008 at 07:12, Arnar Flatberg [EMAIL PROTECTED] wrote:
>> On Wed, Sep 17, 2008 at 3:56 AM, Robert Kern [EMAIL PROTECTED] wrote:
>
> It should be straightforward to make an ipython %magic, yes. There are
> some questions as to how that would work, though. Do you want the
> top-level code in the file to be profiled?

I don't think that will be my primary use.

> Or do you just want functions defined inside to be profiled?

That was my initial thought, yes.

> Do you want to be able to specify functions defined outside of the
> code to be profiled?

Sounds complicated, but that would be nice.

I'll play around with IPython's %magic to see if I can get a workflow
I'm comfortable with. I was thinking something along the lines of
IPython's %prun for a quick looksy, plus the possibility to easily write
small scripts to test different inputs (small/large arrays etc.) to a
specific function. Any recommendations are very welcome. I do not have
much experience in profiling code.

Arnar
Re: [Numpy-discussion] Medians that ignore values
On Thursday 18 September 2008 13:31:18 Peter Saffrey wrote:
> The version in the Ubuntu package repository. It says 1:1.0.4-6ubuntu3.

So it's 1.0? It's fairly old; that would explain it.

>> If you don't give an axis parameter, you should get the median of the
>> flattened array, therefore a scalar, not an array.
>
> Not for my version.

Indeed. Looks like the default axis changed from 0 in 1.0 to None in the
incoming 1.2. But that's a detail at this point. Anyway: you should use
ma.median for masked arrays. Else, you're just keeping the NaNs where
they were.

> That will be the problem. My version does not have median or mean
> methods for masked arrays, only the average() method.

The method mean has always been around for masked arrays, and so has the
corresponding function. But I'm surprised; median has been in
numpy.ma.extras for a while. Maybe not in 1.0...

> According to this page: http://www.scipy.org/Download 1.1.0 is the
> latest release.

You need to update your internet ;) 1.1.1 was released 6 weeks ago.

> Do I need to use an SVN build to get the ma.median functionality?

No, you can install 1.1.1; that should work. Note that I just fixed a
bug in median in SVN (it would fail when trying to get the median of a
2D array with axis=1), so you may want to check this one instead if you
feel like it. You can still use 1.1.1: as a quick workaround for the
aforementioned bug, use ma.median(a.T, axis=0) instead of
ma.median(a, axis=1) when working w/ 2D arrays.
[Numpy-discussion] ready to tag 1.2.0
Hey,

I would like to tag 1.2.0 from the 1.2.x branch. Are there any problems
with this? In particular, are there any known problems that would
require us having another release candidate? As soon as we get this
release out, we can start back-porting bugfixes from the trunk to the
1.2.x branch in preparation for a 1.2.1 release.

Thanks,

-- 
Jarrod Millman
Computational Infrastructure for Research Labs
10 Giannini Hall, UC Berkeley
phone: 510.643.4014
http://cirl.berkeley.edu/
Re: [Numpy-discussion] profiling line by line
On Thu, Sep 18, 2008 at 06:01, Robert Cimrman [EMAIL PROTECTED] wrote:
> many thanks for this! I have wanted to try out the profiler but failed
> to build it (changeset 6:0de294aa75bf):
> [...]
> _line_profiler.c:1614: error: 'T_LONGLONG' undeclared here (not in a
> function)
> error: command 'i486-pc-linux-gnu-gcc' failed with exit status 1
>
> I have cython-0.9.8.1 and GCC 4.1.2, 32-bit machine.

It uses the #define'd macro PY_LONG_LONG. Go through your Python headers
to see what this gets expanded to.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
Re: [Numpy-discussion] profiling line by line
On Thu, Sep 18, 2008 at 10:03, Ondrej Certik [EMAIL PROTECTED] wrote:
> Anyway, back to work: Robert K., I noticed that if I profile some
> function, I get results like this for example:
>
> Line #  Hits    Time  % Time  Line Contents
> ===========================================
>     40  3072   46952     6.6  [x,w] = p_roots(n)
>     41  3072   24192     3.4  x = real(x)
>     42  3072   34470     4.8  ainf, binf = map(isinf,(a,b))
>     43  3072    6565     0.9  if ainf or binf:
>     44                            raise ValueError, "Gaussian quadrature is only available for \
>     45                                finite limits."
>     46  3072    5093     0.7  if not reference:
>     47                            x = (b-a)*(x+1)/2.0 + a
>     48  3072  594190    83.5  return (b-a)/2.0*sum(w*func(x,*args),0)
>
> Then if I turn on profiling of the func() method, I get this:
>
>     40  3072   46999     4.6  [x,w] = p_roots(n)
>     41  3072   24313     2.4  x = real(x)
>     42  3072   34327     3.4  ainf, binf = map(isinf,(a,b))
>     43  3072    6190     0.6  if ainf or binf:
>     44                            raise ValueError, "Gaussian quadrature is only available for \
>     45                                finite limits."
>     46  3072    4918     0.5  if not reference:
>     47                            x = (b-a)*(x+1)/2.0 + a
>     48  3072  906876    88.6  return (b-a)/2.0*sum(w*func(x,*args),0)
>
> So the timings rise a lot. For obvious reasons, that's the overhead of
> the profiler. But the problem is that then the timings just don't fit,
> e.g. if I sum the total time spent in subfunctions, it doesn't account
> for all the time printed on the respective line in the parent function.
> I don't know if there is any way to fix it, or even whether it is worth
> fixing. So I guess one should just use the profiling info to determine
> which line to fix.

So here's what's going on: I'm being clever (and possibly too clever).
When tracing is enabled, Python will call my tracing function just
before each new line gets executed. If tracing isn't enabled for this
function, I return. Otherwise, I grab the current time. Then, I look for
the last line and time I recorded for this function. I look up the
accumulator for the (code, old line) pair and record the time delta.
Then I grab the current time *again*, and store the current line and
this new time for the next go 'round.
This way, I exclude most of the time spent inside the profiler itself
and just record the time being spent in the code. The total time
reported is just a sum of the recorded times, not the sum of wall-clock
times spent in the function.

Now, this does break down in your use case, where you are profiling both
the caller and the callee and trying to determine how much of a line's
time is being spent just by calling the function. I could record
wall-clock times between the start and end of a function call, but I
think that's fragile. For example, suppose you are profiling A() and B()
but not C(), and both A() and C() call B(). Using the wall-clock time
spent in B() will tell you that you spent more time in B() than the
appropriate line (or lines!) that called it in A().

I think the most robust way to figure this out is to rewrite your code
to pull out such calls onto their own lines. This is like breaking up
your functions into tiny little one-liners in order to use cProfile,
only it doesn't suck *nearly* as much.

-- 
Robert Kern
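The scheme Robert describes can be sketched with a plain sys.settrace hook. This is only a toy: the real line_profiler is written in Cython and keys its accumulators by (code, line) pair, whereas this sketch keys by line number alone and profiles a single hypothetical function.

```python
import sys
import time

timings = {}           # line number -> accumulated seconds
_last = [None, None]   # [previous line number, timestamp taken after tracer work]

def tracer(frame, event, arg):
    if event == 'line':
        now = time.perf_counter()
        prev_line, prev_t = _last
        if prev_line is not None:
            # charge the elapsed time to the line that just finished
            timings[prev_line] = timings.get(prev_line, 0.0) + (now - prev_t)
        # grab the clock *again* so the tracer's own overhead is excluded
        _last[0], _last[1] = frame.f_lineno, time.perf_counter()
    return tracer

def work():
    total = 0
    for i in range(1000):
        total += i
    return total

sys.settrace(tracer)
result = work()
sys.settrace(None)

assert result == 499500
assert timings  # per-line times were accumulated for work()'s lines
```

The double clock read between recording the delta and storing the new timestamp is exactly the trick that keeps most of the profiler's overhead out of the reported per-line totals.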
[Numpy-discussion] A bug in loadtxt and how to convert a string array (hex data) to decimal?
Hi, All,

I have found a bug in the loadtxt function. Here is the example. The
file name is test.txt and contains:

Thist is test
3FF 3fE
3Ef 3e8
3Df 3d9
3cF 3c7

In Python 2.5.2, I type:

test=loadtxt('test.txt',comments='',dtype='string',converters={0:lambda s:int(s,16)})

test will contain:

array([['102', '3fE'],
       ['100', '3e8'],
       ['991', '3d9'],
       ['975', '3c7']], dtype='|S3')

The first two values, 102 and 100, are wrong. The reason I am doing this
is that I have to process a large amount of data from a file, and the
data is in hex format. This is the only way I found to efficiently
convert hex to decimal. Does anyone have a good solution?

Thanks

Frank
Re: [Numpy-discussion] A bug in loadtxt and how to convert a string array (hex data) to decimal?
frank wang wrote:
> I have found a bug in the loadtxt function. [...] The first two values,
> 102 and 100, are wrong.

It's because of how numpy handles string arrays (which I admit I don't
understand very well). Basically, it's converting the numbers properly,
but truncating them to 3 characters. Try this, which just forces it to
expand to strings 4 characters wide:

test=loadtxt('test.txt',comments='',dtype='|S4',converters={0:lambda s:int(s,16)})

HTH,

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
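Not from the thread, but a sketch of an alternative: since the goal is decimal values, converting every column and asking loadtxt for an integer dtype sidesteps the string-truncation issue entirely. This assumes a modern Python 3/numpy (the original post used Python 2.5 with comments=''; here the comment line is skipped with skiprows instead).

```python
import io
import numpy as np

# The data from the original post, comment line included.
data = u"""Thist is test
3FF 3fE
3Ef 3e8
3Df 3d9
3cF 3c7
"""

# Convert *both* columns from hex and request an integer dtype,
# so no string storage (and no truncation) is involved.
conv = {0: lambda s: int(s, 16), 1: lambda s: int(s, 16)}
test = np.loadtxt(io.StringIO(data), skiprows=1, dtype=int, converters=conv)

assert test.shape == (4, 2)
assert test[0, 0] == 0x3FF == 1023
assert test[0, 1] == 0x3FE
```

If string output really is wanted, Ryan's '|S4' fix above is the minimal change; the integer route is preferable when the values will be used numerically.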
[Numpy-discussion] Generating random samples without repeats
I want to generate a series of random samples, to do simulations based
on them. Essentially, I want to be able to produce a SAMPLESIZE * N
matrix, where each row of N values consists of either

1. Integers between 1 and M (simulating N rolls of an M-sided die), or
2. A sample of N numbers between 1 and M without repeats (simulating
   deals of N cards from an M-card deck).

Example (1) is easy:

numpy.random.random_integers(1, M, (SAMPLESIZE, N))

But I can't find an obvious equivalent for (2). Am I missing something
glaringly obvious? I'm using numpy - is there maybe something in scipy I
should be looking at?

Also, in evaluating samples, I'm likely to want to calculate
combinatorial functions, such as the list of all pairs of items from a
sample (imagine looking at how many pairs add up to 15 in a cribbage
hand). Clearly, I can write a normal Python function which does this for
one row and use apply_along_axis - but that's *slow*. I'm looking for a
function that, given an N*M array and a sample size S, gives a
C(N,S)*S*M array of all the combinations, which runs at array-processing
speeds (preferably without having to code it in C myself!!) Is there
anywhere with this type of function available?

This type of combinatorial simulation seems to me to be a fairly good
fit for numpy's capabilities, and yet I can't seem to find things that
seem relevant. Is it simply not something that people use numpy for? Or
am I looking in the wrong places in the documentation?

Thanks for any help,
Paul.
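For the all-pairs part of the question, one vectorized approach (a sketch, not from the thread, with toy hands) is to precompute the C(N,2) pair-index array once and use fancy indexing, so the per-row Python loop disappears:

```python
import numpy as np
from itertools import combinations

# Toy data: 2 hands of N=4 cards each.
hands = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8]])
N = hands.shape[1]

# Index array of all C(4,2) = 6 pairs of positions, computed once.
pairs_idx = np.array(list(combinations(range(N), 2)))   # shape (6, 2)

# One fancy-indexing step extracts every pair of every hand.
pairs = hands[:, pairs_idx]                             # shape (2, 6, 2)

assert pairs.shape == (2, 6, 2)
assert (pairs[0, 0] == [1, 2]).all()    # first pair of first hand
assert (pairs[1, -1] == [7, 8]).all()   # last pair of second hand
```

The itertools call runs only once, for the index template; after that, all the per-sample work happens at array speed, which is what apply_along_axis cannot offer.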
Re: [Numpy-discussion] Generating random samples without repeats
On Thu, Sep 18, 2008 at 16:55, Paul Moore [EMAIL PROTECTED] wrote:
> I want to generate a series of random samples, to do simulations based
> on them. [...]
> 2. A sample of N numbers between 1 and M without repeats (simulating
>    deals of N cards from an M-card deck).
> [...] But I can't find an obvious equivalent for (2). Am I missing
> something glaringly obvious?

numpy.array([(numpy.random.permutation(M) + 1)[:N] for i in range(SAMPLESIZE)])

-- 
Robert Kern
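Robert's one-liner, made concrete with hypothetical values of M, N and SAMPLESIZE:

```python
import numpy as np

M, N, SAMPLESIZE = 52, 5, 100   # e.g. 5-card deals from a 52-card deck

# Each row: shuffle 1..M, keep the first N -- a deal without repeats.
deals = np.array([(np.random.permutation(M) + 1)[:N]
                  for i in range(SAMPLESIZE)])

assert deals.shape == (SAMPLESIZE, N)
for row in deals:
    assert len(set(row)) == N            # no repeats within a deal
    assert 1 <= row.min() and row.max() <= M
```

Shuffling all M values to keep only N is wasteful for N much smaller than M; much later numpy versions added Generator.choice(M, N, replace=False), which postdates this thread.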
Re: [Numpy-discussion] A bug in loadtxt and how to convert a string array (hex data) to decimal?
Hi, Ryan,

Thank you very much. It solves my problem. I have struggled with this
for a long time.

Frank

> Date: Thu, 18 Sep 2008 16:39:47 -0500
> From: [EMAIL PROTECTED]
> To: numpy-discussion@scipy.org
> Subject: Re: [Numpy-discussion] A bug in loadtxt and how to convert a
> string array (hex data) to decimal?
>
> Try this, which just forces it to expand to strings 4 characters wide:
>
> test=loadtxt('test.txt',comments='',dtype='|S4',converters={0:lambda s:int(s,16)})
>
> Ryan
Re: [Numpy-discussion] profiling line by line
On Thu, Sep 18, 2008 at 02:54:13PM -0500, Robert Kern wrote:
> So here's what's going on: I'm being clever (and possibly too clever).

Oh no. Robert K. is too clever. We knew that, right ;). Gaël
Re: [Numpy-discussion] building numpy locally but get error: undefined symbol: zgesdd_
Francis wrote:
> Thank you for your effort. I guess garnumpy reflects the idea in this
> PyLab discussion: http://www.scipy.org/PyLab Again I get errors in
> libblas/lapack related to gfortran (local variable problems). I
> replaced libblas.a and liblapack.a with the ones from sage and started
> make install again. It seems to work until it tries to
> configure/install umfpack.

If you install blas/lapack from sage, it kind of defeats the whole
purpose of garnumpy. The goal is to have a unified set of options to
build, and it is likely that sage uses different options than the ones
from garnumpy. If you use garnumpy, you should use it for everything: it
is an all for nothing, intended to replace broken
distributions/workstations where you don't have admin rights. When I
updated garnumpy a bit yesterday, I tested it on CentOS 5, which is the
distribution you are using, right?

> xerbla.f:(.text+0xd7): undefined reference to `_g95_stop_blank'
> collect2: ld returned 1 exit status
> make[4]: *** [umfpack_di_demo] Error 1

Yes, that's expected: you are using blas from sage, so this cannot work.

> Perhaps it is possible to skip the installation of umfpack, since I
> probably won't need it, or it requires the other libblas.a. Just wildly
> guessing here.

garnumpy by default does not install umfpack (scipy only requires
blas/lapack, and by default garnumpy only installs what is needed; you
can configure it to build umfpack, atlas, whatever, but by default only
the minimal stuff), so I am not sure why you get umfpack errors.

cheers, David
Re: [Numpy-discussion] building numpy locally but get error: undefined symbol: zgesdd_
David Cournapeau wrote:
> If you install blas/lapack from sage, it kind of defeats the whole
> purpose of garnumpy. The goal is to have a unified set of options to
> build. It is likely that sage uses different options than the ones from
> garnumpy. If you use garnumpy, you should use it for everything: it is
> an all for nothing

should read all or nothing. David
[Numpy-discussion] Suggestion for recarray.view
All, I'd like to submit the following suggestion for recarray.view, so
that it could accept two keywords like standard ndarrays do. As a change
in records.py can potentially affect a lot of people (probably more than
a change in numpy.ma), I'm not confident enough to commit it. Consider
this an attempt at having it peer-reviewed before inclusion. Cheers P.

#---
def view(self, dtype=None, type=None):
    if dtype is None:
        return ndarray.view(self, type)
    elif type is None:
        try:
            if issubclass(dtype, ndarray):
                return ndarray.view(self, dtype)
        except TypeError:
            pass
        dtype = sb.dtype(dtype)
        if dtype.fields is None:
            return self.__array__().view(dtype)
        return ndarray.view(self, dtype)
    else:
        return ndarray.view(self, dtype, type)
#---
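A usage sketch of the two-keyword behavior the patch is after. Plain `ndarray.view` already accepts both `dtype` and `type`, so this shows the target semantics; the array contents and field names here are purely illustrative.

```python
import numpy as np

a = np.array([(1, 2.0), (3, 4.0)], dtype=[('x', int), ('y', float)])

# type only: same dtype, different ndarray subclass
r = a.view(type=np.recarray)
print(r.x)                      # fields become attributes

# dtype only: reinterpret the same bytes under new field names
v = a.view(dtype=[('a', int), ('b', float)])

# both keywords at once -- the case the proposed recarray.view handles
rv = a.view(dtype=[('a', int), ('b', float)], type=np.recarray)
print(rv.b)
```

The tricky branch in the patch is the `issubclass` try/except: a single positional argument to `view` can be either a dtype or an ndarray subclass, and the try/except disambiguates without requiring the caller to say which.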
Re: [Numpy-discussion] Medians that ignore values
2008/9/18 David Cournapeau [EMAIL PROTECTED]:
> Peter Saffrey wrote:
>> Is this the correct behavior for median with nan?
>
> That's the expected behavior, at least :) (this is also the expected
> behavior of most math packages I know, including matlab and R, so this
> should not be too surprising if you have used those).

I don't think I agree:

In [4]: np.median([1,3,nan])
Out[4]: 3.0

In [5]: np.median([1,nan,3])
Out[5]: nan

In [6]: np.median([nan,1,3])
Out[6]: 1.0

I think the expected behaviour would be for all of these to return nan.

Anne
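The position dependence Anne shows comes from nan breaking the total order that a sort-based median assumes: every comparison against nan is False. Later NumPy settled this the way Anne expects (median propagates nan) and grew `np.nanmedian` (NumPy 1.9+) for the ignore-the-nans case. A small sketch under current NumPy semantics:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

# nan compares False against everything, so a naive sort-based median
# sees a different "ordering" depending on where the nan happens to sit.
assert not (np.nan < 1.0) and not (np.nan > 1.0) and not (np.nan == np.nan)

print(np.median(a))      # current NumPy propagates the nan
print(np.nanmedian(a))   # ignores it: the median of [1.0, 3.0]
```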
Re: [Numpy-discussion] Suggestion for recarray.view
Pierre GM wrote:
> All, I'd like to submit the following suggestion for recarray.view, so
> that it could accept two keywords like standard ndarrays do. As a
> change in records.py can potentially affect a lot of people (probably
> more than a change in numpy.ma), I'm not confident enough to commit it.
> Consider that as an attempt of having it peer-reviewed before
> inclusion. Cheers P.
>
> #---
> def view(self, dtype=None, type=None):
>     if dtype is None:
>         return ndarray.view(self, type)
>     elif type is None:
>         try:
>             if issubclass(dtype, ndarray):
>                 return ndarray.view(self, dtype)
>         except TypeError:
>             pass
>         dtype = sb.dtype(dtype)
>         if dtype.fields is None:
>             return self.__array__().view(dtype)
>         return ndarray.view(self, dtype)
>     else:
>         return ndarray.view(self, dtype, type)
> #---

This looks pretty good to me. +1 for adding it.

-Travis
[Numpy-discussion] PyUnicodeUCS2 issue with numpy revision 5833
hello, I just updated my svn repository, but I am now unable to import
numpy:

In [1]: import numpy as np
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

/home/cohen/<ipython console> in <module>()

/usr/lib/python2.5/site-packages/numpy/__init__.py in <module>()
    123         return loader(*packages, **options)
    124
--> 125     import add_newdocs
    126     __all__ = ['add_newdocs']
    127

/usr/lib/python2.5/site-packages/numpy/add_newdocs.py in <module>()
      7 # core/fromnumeric.py, core/defmatrix.py up-to-date.
      8
----> 9 from lib import add_newdoc
     10
     11 add_newdoc('numpy.core', 'dtype',

/usr/lib/python2.5/site-packages/numpy/lib/__init__.py in <module>()
      2 from numpy.version import version as __version__
      3
----> 4 from type_check import *
      5 from index_tricks import *
      6 from function_base import *

/usr/lib/python2.5/site-packages/numpy/lib/type_check.py in <module>()
      6                'common_type']
      7
----> 8 import numpy.core.numeric as _nx
      9 from numpy.core.numeric import asarray, asanyarray, array, isnan, \
     10     obj2sctype, zeros

/usr/lib/python2.5/site-packages/numpy/core/__init__.py in <module>()
      3 from numpy.version import version as __version__
      4
----> 5 import multiarray
      6 import umath
      7 import _internal # for freeze programs

ImportError: /usr/lib/python2.5/site-packages/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_FromUnicode

I did not change anything in my python setup, and I did remove the build
directory before recompiling. Does anyone have an idea? thanks in
advance, Johann
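The symbol name is the clue: CPython 2 narrow (UCS2) and wide (UCS4) builds export differently named unicode C-API functions, so an extension module compiled against one build fails to import under the other. A quick diagnostic for which build the running interpreter is (note that Python 3.3+ removed the distinction entirely, per PEP 393):

```python
import sys

# On a narrow (UCS2) Python 2 build, sys.maxunicode is 0xFFFF; on a
# wide (UCS4) build, and on every Python 3.3+, it is 0x10FFFF.
if sys.maxunicode == 0xFFFF:
    print("narrow (UCS2) build")
else:
    print("wide (UCS4, or Python 3.3+) build")
```

If the check disagrees with the Python that built multiarray.so, the fix is to rebuild numpy with the same interpreter that will import it (and make sure no stale multiarray.so from another interpreter shadows it on sys.path).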
Re: [Numpy-discussion] Medians that ignore values
Anne Archibald wrote:
> That was in amax/amin. Pretty much every other function that does
> comparisons needs to be fixed to work with nans. In some cases it's not
> even clear how: where should a sort put the nans in an array?

The problem is more in how the functions use sort than in sort itself,
in the case of median. There can't be a 'good' way to put nan in a sort,
for example, since nans cannot be ordered. I don't know the best
strategy: either we fix every function that uses comparisons, handling
nan as a special case as you mentioned, or there may be a cleverer thing
to do to avoid special-casing everywhere. I don't have a clear idea of
how many functions rely on ordering in numpy.

cheers, David
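For the record, NumPy eventually picked an answer to Anne's "where should a sort put the nans" question: nan is treated as greater than everything, so nans sort to the end, and comparison-based functions can strip them with a mask first. A sketch under current semantics:

```python
import numpy as np

a = np.array([3.0, np.nan, 1.0, np.nan, 2.0])

# Current NumPy sorts nans to the end of the array.
s = np.sort(a)
print(s)                 # 1.0, 2.0, 3.0, then the two nans

# Functions that should ignore nans can mask them out first:
clean = a[~np.isnan(a)]
print(np.median(clean))  # median of [3.0, 1.0, 2.0]
```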