Re: [Numpy-discussion] Improving Python+MPI import performance
On 01/14/2012 12:28 AM, Sturla Molden wrote: > Den 13.01.2012 22:42, skrev Sturla Molden: >> Den 13.01.2012 22:24, skrev Robert Kern: >>> Do these systems have a ramdisk capability? >> I assume you have seen this as well :) >> >> http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf >> > > This paper also repeats a common mistake about the GIL: > > "A future challenge is the increasing number of CPU cores per node, > which is normally addressed by hybrid thread and message passing based > parallelization. Whereas message passing can be used transparently by > both on Python and C level, the global interpreter lock in CPython > limits the thread based parallelization to the C-extensions only. We are > currently investigating hybrid OpenMP/MPI implementation with the hope > that limiting threading to only C-extension provides enough performance." > > This is NOT true. > > Python threads are native OS threads. They can be used for parallel > computing on multi-core CPUs. The only requirement is that the Python > code calls a C extension that releases the GIL. We can use threads in C > or Python code: OpenMP and threading.Thread perform equally well, but if > we use threading.Thread the GIL must be released for parallel execution. > OpenMP is typically better for fine-grained parallelism in C code and > threading.Thread is better for course-grained parallelism in Python > code. The latter is also where mpi4py and multiprocessing can be used. I don't see how you contradict their statement. The only code that can run without the GIL is in C-extensions (even if it is written in, say, Cython). Dag Sverre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Improving Python+MPI import performance
Den 13.01.2012 22:42, skrev Sturla Molden: > Den 13.01.2012 22:24, skrev Robert Kern: >> Do these systems have a ramdisk capability? > I assume you have seen this as well :) > > http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf > This paper also repeats a common mistake about the GIL: "A future challenge is the increasing number of CPU cores per node, which is normally addressed by hybrid thread and message passing based parallelization. Whereas message passing can be used transparently by both on Python and C level, the global interpreter lock in CPython limits the thread based parallelization to the C-extensions only. We are currently investigating hybrid OpenMP/MPI implementation with the hope that limiting threading to only C-extension provides enough performance." This is NOT true. Python threads are native OS threads. They can be used for parallel computing on multi-core CPUs. The only requirement is that the Python code calls a C extension that releases the GIL. We can use threads in C or Python code: OpenMP and threading.Thread perform equally well, but if we use threading.Thread the GIL must be released for parallel execution. OpenMP is typically better for fine-grained parallelism in C code and threading.Thread is better for coarse-grained parallelism in Python code. The latter is also where mpi4py and multiprocessing can be used. Sturla
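Sturla's point is easy to demonstrate with plain threading.Thread: numpy's BLAS-backed routines such as np.dot release the GIL while the compiled code runs, so two threads can execute them concurrently. A minimal sketch (whether the two calls truly overlap on separate cores depends on the numpy/BLAS build, so the parallel speedup itself is an assumption here, not a guarantee):

```python
import threading

import numpy as np

def worker(a, b, out, i):
    # np.dot enters compiled BLAS code that releases the GIL, so
    # several of these calls can run at the same time on a multi-core CPU.
    out[i] = np.dot(a, b)

a = np.random.rand(200, 200)
b = np.random.rand(200, 200)
results = [None, None]
threads = [threading.Thread(target=worker, args=(a, b, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A pure-Python loop inside worker() would, by contrast, hold the GIL and serialize the two threads.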
Re: [Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude
Hi, I don't know if numpy has a ready tool for this. I also have this use in my study, so I wrote a simple piece of code for my personal use. It might not be great; I hope others can also respond, as this is a very basic operation in earth data analysis.

import numpy as np

lat = np.arange(89.75, -90, -0.5)
lon = np.arange(-179.75, 180, 0.5)
lon0, lat0 = np.meshgrid(lon, lat)  # create the grid for demonstration

# Tuple parameters in a def, e.g. (vlat1, vlat2), are Python-2-only
# syntax, so the bounds are passed as plain tuples and unpacked inside.
def Get_GridValue(data, latbounds, lonbounds):
    vlat1, vlat2 = latbounds
    vlon1, vlon2 = lonbounds
    index_lat = np.nonzero((lat >= vlat1) & (lat <= vlat2))[0]
    index_lon = np.nonzero((lon >= vlon1) & (lon <= vlon2))[0]
    # the matching indices are contiguous, so slicing from the first
    # to the last selects the whole sub-grid
    return data[..., index_lat[0]:index_lat[-1]+1, index_lon[0]:index_lon[-1]+1]

Get_GridValue(lat0, (40, 45), (-30, -25))
Get_GridValue(lon0, (40, 45), (-30, -25))

Chao

2012/1/13 Jeremy Lounds > Hello, > > I am looking for some help extracting a subset of data from a large > dataset. The data is being read from a wgrib2 (World Meterological > Organization standard gridded data) using the pygrib library. > > The data values, latitudes and longitudes are in separate lists (arrays?), > and I would like a regional subset. > > The budget is not very large, but I am hoping that this is pretty simple > job. I am just way too green at Python / numpy to know how to proceed, or > even what to search for on Google. > > If interested, please e-mail jlou...@dynamiteinc.com > > Thank you! > > Jeremy Lounds > DynamiteInc.com > 1-877-762-7723, ext 711 > Fax: 877-202-3014 -- *** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax: 01.69.08.77.16
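If the selected indices were ever non-contiguous (masked grid cells, wrap-around longitudes), the first-to-last slice above would over-select; np.ix_ handles the general case by combining a row mask and a column mask into a rectangular sub-grid. A small sketch on the same hypothetical 0.5-degree grid:

```python
import numpy as np

lat = np.arange(89.75, -90, -0.5)    # 360 latitudes, north to south
lon = np.arange(-179.75, 180, 0.5)   # 720 longitudes
data = np.random.rand(lat.size, lon.size)

# Boolean masks pick out the latitude band and the longitude band.
lat_mask = (lat >= 40) & (lat <= 45)
lon_mask = (lon >= -30) & (lon <= -25)

# np.ix_ builds an open mesh from the two 1-D selections, so plain
# fancy indexing returns the rectangular sub-grid rather than a
# flattened vector of paired elements.
subset = data[np.ix_(lat_mask, lon_mask)]
```

This works whether or not the True entries in each mask are contiguous.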
Re: [Numpy-discussion] Improving Python+MPI import performance
On Fri, Jan 13, 2012 at 21:42, Sturla Molden wrote: > Den 13.01.2012 22:24, skrev Robert Kern: >> Do these systems have a ramdisk capability? > > I assume you have seen this as well :) > > http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf I hadn't, actually! Good find! Actually, this same problem came up at the last SciPy conference from several people (Blue Genes are more common than I expected!), and the ramdisk was just my first idea. I'm glad people have evaluated it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Re: [Numpy-discussion] Improving Python+MPI import performance
On 1/13/12 1:58 PM, Dag Sverre Seljebotn wrote: > >It's actually not too difficult to do something like > >LD_PRELOAD=myhack.so python something.py > >and have myhack.so intercept the filesystem calls Python makes (to libc) >and do whatever it wants. That's a solution that doesn't interfere with >how Python does its imports at all, it simply changes how Python >perceives the world around it ("emulation", though much, much lighter). > >It does require some low-level C code, but there are several examples on >the net. I know Ondrej Certik just implemented something similar. One of my colleagues suggested the LD_PRELOAD trick. I asked around here at LLNL, and I seem to recall hearing that the LD_PRELOAD trick didn't work on BlueGene/P, which is where the import bottleneck is the worst. That might have been incorrect though, since LD_PRELOAD is mentioned on Argonne's BG/P wiki. I'll have to look into this some more. -Asher
Re: [Numpy-discussion] Improving Python+MPI import performance
On 01/13/2012 10:20 PM, Langton, Asher wrote: > On 1/13/12 12:38 PM, Sturla Molden wrote: >> Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >>> Another idea: Given your diagnostics, wouldn't dumping the output of >>> "find" of every path in sys.path to a single text file work well? >> >> It probably would, and would also be less prone to synchronization >> problems than using an MPI broadcast. Another possibility would be to >> use a bsddb (or sqlite?) file as a persistent dict for caching the >> output of imp.find_module. > > We tested something along those lines. Tim Kadich, a summer student at > LLNL, wrote a module that went through the path and built up a dict of > module->location mappings for a subset of module types. My recollection is > that it worked well, and as you note, it didn't have the synchronization > issues that MPI_Import has. We didn't fully implement it, since to handle > complicated packages correctly, it looked like we'd either have to > re-implement a lot of the internal Python import code or modify the > interpreter itself. I don't think that MPI_Import is ultimately the > "right" solution, but it shows how easily we can reap significant gains. > Two better approaches that come to mind are: It's actually not too difficult to do something like LD_PRELOAD=myhack.so python something.py and have myhack.so intercept the filesystem calls Python makes (to libc) and do whatever it wants. That's a solution that doesn't interfere with how Python does its imports at all, it simply changes how Python perceives the world around it ("emulation", though much, much lighter). It does require some low-level C code, but there are several examples on the net. I know Ondrej Certik just implemented something similar. Note, I'm just brainstorming here and recording possible (and perhaps impossible) ideas in this thread -- the solution you have found is indeed a great step forward! 
Dag Sverre > > 1) Fixing this bottleneck at the interpreter level (pre-computing and > caching the locations) > > 2) More generally, dealing with this as well as other library-loading > issues at the system level, perhaps by putting a small disk near a node or > small collection of nodes, along with a command to push (broadcast) some > portions of the filesystem to these (more-)local disks. Basically, the > idea would be to let the user specify those directories or objects that > will be accessed by most of the processes and treated as read-only so that > those objects can be cached near the node. > > -Asher
[Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude
Hello, I am looking for some help extracting a subset of data from a large dataset. The data is being read from a wgrib2 file (World Meteorological Organization standard gridded data) using the pygrib library. The data values, latitudes and longitudes are in separate lists (arrays?), and I would like a regional subset. The budget is not very large, but I am hoping that this is a pretty simple job. I am just way too green at Python / numpy to know how to proceed, or even what to search for on Google. If interested, please e-mail jlou...@dynamiteinc.com Thank you! Jeremy Lounds DynamiteInc.com 1-877-762-7723, ext 711 Fax: 877-202-3014
Re: [Numpy-discussion] Improving Python+MPI import performance
On 1/13/12 1:24 PM, Robert Kern wrote: >On Fri, Jan 13, 2012 at 21:20, Langton, Asher wrote: > >> 2) More generally, dealing with this as well as other library-loading >> issues at the system level, perhaps by putting a small disk near a node >>or >> small collection of nodes, along with a command to push (broadcast) some >> portions of the filesystem to these (more-)local disks. Basically, the >> idea would be to let the user specify those directories or objects that >> will be accessed by most of the processes and treated as read-only so >>that >> those objects can be cached near the node. > >Do these systems have a ramdisk capability? That was another thing we looked at (but didn't implement): broadcasting the modules to each node and putting them in a ramdisk. The drawback (for us) is that we're already struggling with the amount of available memory per core, and according to the vendors, the situation will only get worse on future systems. The ramdisk approach might work well when there are lots of small objects that will be accessed. On 1/13/12 1:42 PM, Sturla Molden wrote: >Den 13.01.2012 22:24, skrev Robert Kern: >>Do these systems have a ramdisk capability? > >I assume you have seen this as well :) > >http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final >.pdf I hadn't. Thanks! -Asher
Re: [Numpy-discussion] Improving Python+MPI import performance
It is a straightforward thing to implement a "registry mechanism" for Python that by-passes imp.find_module (i.e. using sys.meta_path). You could imagine creating the registry file for a package or distribution (much like Dag described) and push that to every node during distribution. The registry file would have the map between package_name : file_location which would avoid all the failed open calls. You would need to keep the registry updated as Dag describes, but this seems like a fairly simple approach that should help. -Travis On Jan 13, 2012, at 2:38 PM, Sturla Molden wrote: > Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >> Another idea: Given your diagnostics, wouldn't dumping the output of >> "find" of every path in sys.path to a single text file work well? > > It probably would, and would also be less prone to synchronization > problems than using an MPI broadcast. Another possibility would be to > use a bsddb (or sqlite?) file as a persistent dict for caching the > output of imp.find_module. > > Sturla
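Travis's registry idea can be sketched in a few lines with the modern importlib machinery (importlib.abc.MetaPathFinder stands in here for the imp-based hooks of 2012, and the registry dict is a placeholder for whatever file would actually be pushed to the nodes):

```python
import importlib.abc
import importlib.util

class RegistryFinder(importlib.abc.MetaPathFinder):
    """Resolve imports from a precomputed {module_name: file_path} map,
    skipping the per-directory stat()/open() scan entirely."""

    def __init__(self, registry):
        self.registry = registry

    def find_spec(self, fullname, path=None, target=None):
        location = self.registry.get(fullname)
        if location is None:
            return None  # unknown name: fall back to the normal machinery
        return importlib.util.spec_from_file_location(fullname, location)

# The registry would be generated once per package or distribution and
# shipped to every node; installing the finder is then one line:
#   sys.meta_path.insert(0, RegistryFinder(registry))
```

Every module named in the registry is then found without touching the filesystem until the module file itself is opened.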
Re: [Numpy-discussion] Improving Python+MPI import performance
Den 13.01.2012 22:24, skrev Robert Kern: > Do these systems have a ramdisk capability? I assume you have seen this as well :) http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf Sturla
Re: [Numpy-discussion] Improving Python+MPI import performance
On Fri, Jan 13, 2012 at 21:20, Langton, Asher wrote: > 2) More generally, dealing with this as well as other library-loading > issues at the system level, perhaps by putting a small disk near a node or > small collection of nodes, along with a command to push (broadcast) some > portions of the filesystem to these (more-)local disks. Basically, the > idea would be to let the user specify those directories or objects that > will be accessed by most of the processes and treated as read-only so that > those objects can be cached near the node. Do these systems have a ramdisk capability? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Re: [Numpy-discussion] Improving Python+MPI import performance
On 1/13/12 12:38 PM, Sturla Molden wrote: >Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: >> Another idea: Given your diagnostics, wouldn't dumping the output of >> "find" of every path in sys.path to a single text file work well? > >It probably would, and would also be less prone to synchronization >problems than using an MPI broadcast. Another possibility would be to >use a bsddb (or sqlite?) file as a persistent dict for caching the >output of imp.find_module. We tested something along those lines. Tim Kadich, a summer student at LLNL, wrote a module that went through the path and built up a dict of module->location mappings for a subset of module types. My recollection is that it worked well, and as you note, it didn't have the synchronization issues that MPI_Import has. We didn't fully implement it, since to handle complicated packages correctly, it looked like we'd either have to re-implement a lot of the internal Python import code or modify the interpreter itself. I don't think that MPI_Import is ultimately the "right" solution, but it shows how easily we can reap significant gains. Two better approaches that come to mind are: 1) Fixing this bottleneck at the interpreter level (pre-computing and caching the locations) 2) More generally, dealing with this as well as other library-loading issues at the system level, perhaps by putting a small disk near a node or small collection of nodes, along with a command to push (broadcast) some portions of the filesystem to these (more-)local disks. Basically, the idea would be to let the user specify those directories or objects that will be accessed by most of the processes and treated as read-only so that those objects can be cached near the node. -Asher
Re: [Numpy-discussion] Improving Python+MPI import performance
Den 13.01.2012 21:21, skrev Dag Sverre Seljebotn: > Another idea: Given your diagnostics, wouldn't dumping the output of > "find" of every path in sys.path to a single text file work well? It probably would, and would also be less prone to synchronization problems than using an MPI broadcast. Another possibility would be to use a bsddb (or sqlite?) file as a persistent dict for caching the output of imp.find_module. Sturla
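The dump-the-output-of-"find" idea discussed here amounts to one walk over sys.path. A minimal sketch of what such a cache builder could look like (the output file name and the .py/.so/.pyd suffix list are assumptions for illustration):

```python
import os

def build_module_cache(paths, cache_file):
    """Record every candidate module file under the given path entries
    in one text file, so importers on the compute nodes can consult a
    single cached listing instead of issuing thousands of failed
    open() calls against a networked filesystem."""
    with open(cache_file, "w") as out:
        for root in paths:
            if not os.path.isdir(root):
                continue  # sys.path may contain zip files or dead entries
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    if name.endswith((".py", ".so", ".pyd")):
                        out.write(os.path.join(dirpath, name) + "\n")
```

As noted in the thread, the cache has to be rebuilt whenever a package is installed, so this would be the body of an "update-python-paths"-style command.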
Re: [Numpy-discussion] Improving Python+MPI import performance
On 01/13/2012 09:19 PM, Dag Sverre Seljebotn wrote: > On 01/13/2012 02:13 AM, Asher Langton wrote: >> Hi all, >> >> (I originally posted this to the BayPIGgies list, where Fernando Perez >> suggested I send it to the NumPy list as well. My apologies if you're >> receiving this email twice.) >> >> I work on a Python/C++ scientific code that runs as a number of >> independent Python processes communicating via MPI. Unfortunately, as >> some of you may have experienced, module importing does not scale well >> in Python/MPI applications. For 32k processes on BlueGene/P, importing >> 100 trivial C-extension modules takes 5.5 hours, compared to 35 >> minutes for all other interpreter loading and initialization. We >> developed a simple pure-Python module (based on knee.py, a >> hierarchical import example) that cuts the import time from 5.5 hours >> to 6 minutes. >> >> The code is available here: >> >> https://github.com/langton/MPI_Import >> >> Usage, implementation details, and limitations are described in a >> docstring at the beginning of the file (just after the mandatory >> legalese). >> >> I've talked with a few people who've faced the same problem and heard >> about a variety of approaches, which range from putting all necessary >> files in one directory to hacking the interpreter itself so it >> distributes the module-loading over MPI. Last summer, I had a student >> intern try a few of these approaches. It turned out that the problem >> wasn't so much the simultaneous module loads, but rather the huge >> number of failed open() calls (ENOENT) as the interpreter tries to >> find the module files. In the MPI_Import module, we have rank 0 >> perform the module lookups and then broadcast the locations to the >> rest of the processes. For our real-world scientific applications >> written in Python and C++, this has meant that we can start a problem >> and actually make computational progress before the batch allocation >> ends. > > This is great news! 
I've forwarded to the mpi4py mailing list which > despairs over this regularly. > > Another idea: Given your diagnostics, wouldn't dumping the output of > "find" of every path in sys.path to a single text file work well? Then > each node download that file once and consult it when looking for > modules, instead of network file metadata. > > (In fact I think "texhash" does the same for LaTeX?) > > The disadvantage is that one would need to run "update-python-paths" > every time a package is installed to update the text file. But I'm not > sure if that disadvantage is larger than remembering to avoid > diverging import paths between nodes; hopefully one could put a reminder > to run update-python-paths in the ImportError string. I meant "diverging code paths during imports between nodes". Dag > > >> If you try out the code, I'd appreciate any feedback you have: >> performance results, bugfixes/feature-additions, or alternate >> approaches to solving this problem. Thanks! > > I didn't try it myself, but forwarding this from the mpi4py mailing list: > > """ > I'm testing it now and actually > running into some funny errors with unittest on Python 2.7 causing > infinite recursion. If anyone is able to get this going, and could > report successes back to the group, that would be very helpful. > """ > > Dag Sverre
Re: [Numpy-discussion] Improving Python+MPI import performance
On 01/13/2012 02:13 AM, Asher Langton wrote: > Hi all, > > (I originally posted this to the BayPIGgies list, where Fernando Perez > suggested I send it to the NumPy list as well. My apologies if you're > receiving this email twice.) > > I work on a Python/C++ scientific code that runs as a number of > independent Python processes communicating via MPI. Unfortunately, as > some of you may have experienced, module importing does not scale well > in Python/MPI applications. For 32k processes on BlueGene/P, importing > 100 trivial C-extension modules takes 5.5 hours, compared to 35 > minutes for all other interpreter loading and initialization. We > developed a simple pure-Python module (based on knee.py, a > hierarchical import example) that cuts the import time from 5.5 hours > to 6 minutes. > > The code is available here: > > https://github.com/langton/MPI_Import > > Usage, implementation details, and limitations are described in a > docstring at the beginning of the file (just after the mandatory > legalese). > > I've talked with a few people who've faced the same problem and heard > about a variety of approaches, which range from putting all necessary > files in one directory to hacking the interpreter itself so it > distributes the module-loading over MPI. Last summer, I had a student > intern try a few of these approaches. It turned out that the problem > wasn't so much the simultaneous module loads, but rather the huge > number of failed open() calls (ENOENT) as the interpreter tries to > find the module files. In the MPI_Import module, we have rank 0 > perform the module lookups and then broadcast the locations to the > rest of the processes. For our real-world scientific applications > written in Python and C++, this has meant that we can start a problem > and actually make computational progress before the batch allocation > ends. This is great news! I've forwarded to the mpi4py mailing list which despairs over this regularly. 
Another idea: Given your diagnostics, wouldn't dumping the output of "find" of every path in sys.path to a single text file work well? Then each node downloads that file once and consults it when looking for modules, instead of querying network file metadata. (In fact I think "texhash" does the same for LaTeX?) The disadvantage is that one would need to run "update-python-paths" every time a package is installed to update the text file. But I'm not sure if that disadvantage is larger than remembering to avoid diverging import paths between nodes; hopefully one could put a reminder to run update-python-paths in the ImportError string. > If you try out the code, I'd appreciate any feedback you have: > performance results, bugfixes/feature-additions, or alternate > approaches to solving this problem. Thanks! I didn't try it myself, but forwarding this from the mpi4py mailing list: """ I'm testing it now and actually running into some funny errors with unittest on Python 2.7 causing infinite recursion. If anyone is able to get this going, and could report successes back to the group, that would be very helpful. """ Dag Sverre
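The rank-0-searches-then-broadcasts pattern that MPI_Import uses (described in Asher's quoted announcement above) can be sketched independently of the import hooks. This version takes the communicator as a parameter and assumes only the mpi4py-style rank/bcast interface, so it illustrates the pattern rather than reproducing the actual MPI_Import code; importlib.util.find_spec stands in for the imp.find_module calls of 2012:

```python
import importlib.util

def locate_module(name, comm):
    """Rank 0 pays for the filesystem search (the storm of failed
    open() calls); every other rank just receives the answer."""
    if comm.rank == 0:
        spec = importlib.util.find_spec(name)
        location = spec.origin if spec is not None else None
    else:
        location = None
    # On a real cluster run, comm would be mpi4py's MPI.COMM_WORLD and
    # bcast ships the pickled path from rank 0 to all other ranks.
    return comm.bcast(location, root=0)
```

With N ranks, only one process walks sys.path; the other N-1 never touch the shared filesystem until they open the resolved file itself.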
Re: [Numpy-discussion] Improving Python+MPI import performance
On Fri, Jan 13, 2012 at 19:41, Sturla Molden wrote: > Den 13.01.2012 02:13, skrev Asher Langton: >> intern try a few of these approaches. It turned out that the problem >> wasn't so much the simultaneous module loads, but rather the huge >> number of failed open() calls (ENOENT) as the interpreter tries to >> find the module files. > > It sounds like there is a scalability problem with imp.find_module. I'd > report > this on python-dev or python-ideas. It's well-known. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Re: [Numpy-discussion] Improving Python+MPI import performance
Den 13.01.2012 02:13, skrev Asher Langton: > intern try a few of these approaches. It turned out that the problem > wasn't so much the simultaneous module loads, but rather the huge > number of failed open() calls (ENOENT) as the interpreter tries to > find the module files. It sounds like there is a scalability problem with imp.find_module. I'd report this on python-dev or python-ideas. Sturla
Re: [Numpy-discussion] Python for Scientists - courses in Germany and US
By the way, Enthought is also offering Python training and we have just updated our training calendar for this year: http://www.enthought.com/training/enthought_training_calendar.php We are offering about 20 open Python classes in the US and Europe this year. - Ilan On Fri, Jan 13, 2012 at 4:32 AM, Mike Müller wrote: > Learn NumPy and Much More > = > > Scientists like Python. If you would like to learn more about > important libraries for scientific applications, you might be > interested in these courses. > > The course in Germany covers: > > - Overview of libraries > - NumPy > - Data storage with text files, Excel, netCDF and HDF5 > - matplotlib > - Object oriented programming for scientists > - Problem solving session > > The course in the USA covers all this plus: > > - Extending Python in other languages > - Version control > - Unit testing > > > More details below. > > If you have any questions about the courses, please contact me. > > Mike > > > Python for Scientists and Engineers (Germany) > - > > A three-day course covering all the basic tools scientists and engineers need. > This course requires basic Python knowledge. > > Date: 19.01.-21.01.2012 > Location: Leipzig, Germany > Trainer: Mike Müller > Course Language: English > Link: http://www.python-academy.com/courses/python_course_scientists.html > > > Python for Scientists and Engineers (USA) > - > > This is an extended version of our well-received course for > scientists and engineers. Five days of intensive training > will give you a solid basis for using Python for scientific > and technical problems. > > The course is hosted by David Beazley (http://www.dabeaz.com). 
> > Date: 27.02.-02.03.2012 > Location: Chicago, IL, USA > Trainer: Mike Müller > Course Language: English > Link: http://www.dabeaz.com/chicago/science.html
[Numpy-discussion] Python for Scientists - courses in Germany and US
Learn NumPy and Much More
=========================

Scientists like Python. If you would like to learn more about important libraries for scientific applications, you might be interested in these courses.

The course in Germany covers:

- Overview of libraries
- NumPy
- Data storage with text files, Excel, netCDF and HDF5
- matplotlib
- Object oriented programming for scientists
- Problem solving session

The course in the USA covers all this plus:

- Extending Python in other languages
- Version control
- Unit testing

More details below. If you have any questions about the courses, please contact me.

Mike

Python for Scientists and Engineers (Germany)
---------------------------------------------

A three-day course covering all the basic tools scientists and engineers need. This course requires basic Python knowledge.

Date: 19.01.-21.01.2012
Location: Leipzig, Germany
Trainer: Mike Müller
Course Language: English
Link: http://www.python-academy.com/courses/python_course_scientists.html

Python for Scientists and Engineers (USA)
-----------------------------------------

This is an extended version of our well-received course for scientists and engineers. Five days of intensive training will give you a solid basis for using Python for scientific and technical problems.

The course is hosted by David Beazley (http://www.dabeaz.com).

Date: 27.02.-02.03.2012
Location: Chicago, IL, USA
Trainer: Mike Müller
Course Language: English
Link: http://www.dabeaz.com/chicago/science.html
Re: [Numpy-discussion] Question on F/C-ordering in numpy svd
On 01/12/2012 04:21 PM, Ivan Oseledets wrote: > Dear all! > > I am quite new to numpy and python. > I am a matlab user, my work is mainly > on multidimensional arrays, and I have a question on the svd function > from numpy.linalg > > It seems that > > u,s,v=svd(a,full_matrices=False) > > returns u and v in the F-contiguous format. The reason for this is that the underlying computational routine is in Fortran (when using the system lapack library, for instance), which requires and returns F-contiguous arrays, and the current behaviour guarantees the most memory-efficient computation of svd. > That is not in a good agreement with other numpy stuff, where > C-ordering is default. > For example, matrix multiplication, dot() ignores ordering and returns > result always in C-ordering. > (which is documented), but the svd feature is not documented. In generic numpy operations, the particular ordering of arrays should not matter, as the underlying code should know how to compute array operation results efficiently from different input orderings. This behaviour of svd should be documented. However, one should check that when using the svd from numpy lapack_lite (which is f2c code and could, in principle, also use C-ordering), F-contiguous arrays are actually returned. Regards, Pearu
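As Pearu's reply suggests, whether np.linalg.svd returns F-contiguous arrays depends on the numpy build and the LAPACK it links against, so downstream code should not assume either ordering. np.ascontiguousarray gives an explicit, cheap way to normalize the layout (it copies only when the array is not already C-ordered). A small sketch:

```python
import numpy as np

a = np.random.rand(5, 3)
u, s, vt = np.linalg.svd(a, full_matrices=False)

# The memory layout of u and vt is an implementation detail of the
# underlying LAPACK call; normalize it explicitly if it matters.
u_c = np.ascontiguousarray(u)   # no-op if u is already C-ordered

# Either way the factorization itself is unaffected:
# a == u @ diag(s) @ vt up to rounding.  Broadcasting u * s scales
# the columns of u by the singular values without forming diag(s).
reconstructed = np.dot(u * s, vt)
```

The same np.ascontiguousarray (or np.asfortranarray) normalization applies to any numpy routine whose output ordering is unspecified.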