Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Dag Sverre Seljebotn
On 01/14/2012 12:28 AM, Sturla Molden wrote:
> On 13.01.2012 22:42, Sturla Molden wrote:
>> On 13.01.2012 22:24, Robert Kern wrote:
>>> Do these systems have a ramdisk capability?
>> I assume you have seen this as well :)
>>
>> http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf
>>
>
> This paper also repeats a common mistake about the GIL:
>
> "A future challenge is the increasing number of CPU cores per node,
> which is normally addressed by hybrid thread and message passing based
> parallelization. Whereas message passing can be used transparently by
> both on Python and C level, the global interpreter lock in CPython
> limits the thread based parallelization to the C-extensions only. We are
> currently investigating hybrid OpenMP/MPI implementation with the hope
> that limiting threading to only C-extension provides enough performance."
>
> This is NOT true.
>
> Python threads are native OS threads. They can be used for parallel
> computing on multi-core CPUs. The only requirement is that the Python
> code calls a C extension that releases the GIL. We can use threads in C
> or Python code: OpenMP and threading.Thread perform equally well, but if
> we use threading.Thread the GIL must be released for parallel execution.
> OpenMP is typically better for fine-grained parallelism in C code and
> threading.Thread is better for coarse-grained parallelism in Python
> code. The latter is also where mpi4py and multiprocessing can be used.

I don't see how you contradict their statement. The only code that can 
run without the GIL is in C-extensions (even if it is written in, say, 
Cython).

Dag Sverre


Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Sturla Molden
On 13.01.2012 22:42, Sturla Molden wrote:
> On 13.01.2012 22:24, Robert Kern wrote:
>> Do these systems have a ramdisk capability?
> I assume you have seen this as well :)
>
> http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf
>

This paper also repeats a common mistake about the GIL:

"A future challenge is the increasing number of CPU cores per node, 
which is normally addressed by hybrid thread and message passing based 
parallelization. Whereas message passing can be used transparently by 
both on Python and C level, the global interpreter lock in CPython 
limits the thread based parallelization to the C-extensions only. We are 
currently investigating hybrid OpenMP/MPI implementation with the hope 
that limiting threading to only C-extension provides enough performance."

This is NOT true.

Python threads are native OS threads. They can be used for parallel 
computing on multi-core CPUs. The only requirement is that the Python 
code calls a C extension that releases the GIL. We can use threads in C 
or Python code: OpenMP and threading.Thread perform equally well, but if 
we use threading.Thread the GIL must be released for parallel execution. 
OpenMP is typically better for fine-grained parallelism in C code and 
threading.Thread is better for coarse-grained parallelism in Python 
code. The latter is also where mpi4py and multiprocessing can be used.
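
To make this concrete, here is a minimal sketch: np.dot drops the GIL
while the underlying BLAS routine runs, so plain threading.Thread can
use two cores. (Whether the calls really overlap depends on the BLAS
build NumPy links against, so treat this as an illustration, not a
guarantee.)

import threading
import numpy as np

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

# np.dot releases the GIL for the duration of the BLAS call, so the
# two threads can occupy two cores at the same time.
threads = [threading.Thread(target=np.dot, args=(a, b)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()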

Sturla


Re: [Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude

2012-01-13 Thread Chao YUE
Hi, I don't know if numpy has a ready-made tool for this.
I have the same need in my own work, so I wrote a simple function for
personal use. It may not be great; I hope others will also respond, as
this is a very basic operation in earth data analysis.

import numpy as np

lat = np.arange(89.75, -90, -0.5)   # 0.5-degree grid, north to south
lon = np.arange(-179.75, 180, 0.5)
lon0, lat0 = np.meshgrid(lon, lat)  # create the grid for demonstration

def Get_GridValue(data, lat_bounds, lon_bounds):
    # data's trailing two axes are (lat, lon); bounds are inclusive.
    # (Tuple parameters in the signature are Python-2-only syntax, so
    # the bounds are unpacked inside the function instead.)
    vlat1, vlat2 = lat_bounds
    vlon1, vlon2 = lon_bounds
    index_lat = np.nonzero((lat >= vlat1) & (lat <= vlat2))[0]
    index_lon = np.nonzero((lon >= vlon1) & (lon <= vlon2))[0]
    target = data[..., index_lat[0]:index_lat[-1]+1,
                  index_lon[0]:index_lon[-1]+1]
    return target

Get_GridValue(lat0,(40,45),(-30,-25))
Get_GridValue(lon0,(40,45),(-30,-25))


Chao

2012/1/13 Jeremy Lounds 

> Hello,
>
> I am looking for some help extracting a subset of data from a large
> dataset. The data is being read from a wgrib2 file (World Meteorological
> Organization standard gridded data) using the pygrib library.
>
> The data values, latitudes and longitudes are in separate lists (arrays?),
> and I would like a regional subset.
>
> The budget is not very large, but I am hoping that this is a pretty simple
> job. I am just way too green at Python / numpy to know how to proceed, or
> even what to search for on Google.
>
> If interested, please e-mail jlou...@dynamiteinc.com
>
> Thank you!
>
> Jeremy Lounds
> DynamiteInc.com
> 1-877-762-7723, ext 711
> Fax: 877-202-3014



-- 
***
Chao YUE
Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL)
UMR 1572 CEA-CNRS-UVSQ
Batiment 712 - Pe 119
91191 GIF Sur YVETTE Cedex
Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16



Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Robert Kern
On Fri, Jan 13, 2012 at 21:42, Sturla Molden  wrote:
> On 13.01.2012 22:24, Robert Kern wrote:
>> Do these systems have a ramdisk capability?
>
> I assume you have seen this as well :)
>
> http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf

I hadn't, actually! Good find! This same problem was raised by several
people at the last SciPy conference (Blue Genes are more common than I
expected!), and the ramdisk was just my first idea. I'm glad people
have evaluated it.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Langton, Asher
On 1/13/12 1:58 PM, Dag Sverre Seljebotn wrote:
>
>It's actually not too difficult to do something like
>
>LD_PRELOAD=myhack.so python something.py
>
>and have myhack.so intercept the filesystem calls Python makes (to libc)
>and do whatever it wants. That's a solution that doesn't interfere with
>how Python does its imports at all, it simply changes how Python
>perceives the world around it ("emulation", though much, much lighter).
>
>It does require some low-level C code, but there are several examples on
>the net. I know Ondrej Certik just implemented something similar.

One of my colleagues suggested the LD_PRELOAD trick. I asked around here
at LLNL, and I seem to recall hearing that the LD_PRELOAD trick didn't
work on BlueGene/P, which is where the import bottleneck is the worst.
That might have been incorrect though, since LD_PRELOAD is mentioned on
Argonne's BG/P wiki. I'll have to look into this some more.

-Asher



Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Dag Sverre Seljebotn
On 01/13/2012 10:20 PM, Langton, Asher wrote:
> On 1/13/12 12:38 PM, Sturla Molden wrote:
>> On 13.01.2012 21:21, Dag Sverre Seljebotn wrote:
>>> Another idea: Given your diagnostics, wouldn't dumping the output of
>>> "find" of every path in sys.path to a single text file work well?
>>
>> It probably would, and would also be less prone to synchronization
>> problems than using an MPI broadcast. Another possibility would be to
>> use a bsddb (or sqlite?) file as a persistent dict for caching the
>> output of imp.find_module.
>
> We tested something along those lines. Tim Kadich, a summer student at
> LLNL, wrote a module that went through the path and built up a dict of
> module->location mappings for a subset of module types. My recollection is
> that it worked well, and as you note, it didn't have the synchronization
> issues that MPI_Import has. We didn't fully implement it, since to handle
> complicated packages correctly, it looked like we'd either have to
> re-implement a lot of the internal Python import code or modify the
> interpreter itself. I don't think that MPI_Import is ultimately the
> "right" solution, but it shows how easily we can reap significant gains.
> Two better approaches that come to mind are:

It's actually not too difficult to do something like

LD_PRELOAD=myhack.so python something.py

and have myhack.so intercept the filesystem calls Python makes (to libc) 
and do whatever it wants. That's a solution that doesn't interfere with 
how Python does its imports at all, it simply changes how Python 
perceives the world around it ("emulation", though much, much lighter).

It does require some low-level C code, but there are several examples on 
the net. I know Ondrej Certik just implemented something similar.

Note, I'm just brainstorming here and recording possible (and perhaps 
impossible) ideas in this thread  -- the solution you have found is 
indeed a great step forward!

Dag Sverre

>
> 1) Fixing this bottleneck at the interpreter level (pre-computing and
> caching the locations)
>
> 2) More generally, dealing with this as well as other library-loading
> issues at the system level, perhaps by putting a small disk near a node or
> small collection of nodes, along with a command to push (broadcast) some
> portions of the filesystem to these (more-)local disks. Basically, the
> idea would be to let the user specify those directories or objects that
> will be accessed by most of the processes and treated as read-only so that
> those objects can be cached near the node.
>
> -Asher
>



[Numpy-discussion] [JOB] Extracting subset of dataset using latitude and longitude

2012-01-13 Thread Jeremy Lounds
Hello,

I am looking for some help extracting a subset of data from a large dataset. 
The data is being read from a wgrib2 file (World Meteorological Organization standard 
gridded data) using the pygrib library.

The data values, latitudes and longitudes are in separate lists (arrays?), and 
I would like a regional subset.

The budget is not very large, but I am hoping that this is a pretty simple job. I 
am just way too green at Python / numpy to know how to proceed, or even what to 
search for on Google.

If interested, please e-mail jlou...@dynamiteinc.com

Thank you!

Jeremy Lounds
DynamiteInc.com
1-877-762-7723, ext 711
Fax: 877-202-3014


Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Langton, Asher
On 1/13/12 1:24 PM, Robert Kern wrote:
>On Fri, Jan 13, 2012 at 21:20, Langton, Asher  wrote:
>
>> 2) More generally, dealing with this as well as other library-loading
>> issues at the system level, perhaps by putting a small disk near a node or
>> small collection of nodes, along with a command to push (broadcast) some
>> portions of the filesystem to these (more-)local disks. Basically, the
>> idea would be to let the user specify those directories or objects that
>> will be accessed by most of the processes and treated as read-only so that
>> those objects can be cached near the node.
>
>Do these systems have a ramdisk capability?

That was another thing we looked at (but didn't implement): broadcasting
the modules to each node and putting them in a ramdisk. The drawback (for
us) is that we're already struggling with the amount of available memory
per core, and according to the vendors, the situation will only get worse
on future systems. The ramdisk approach might work well when there are
lots of small objects that will be accessed.

On 1/13/12 1:42 PM, Sturla Molden wrote:
>On 13.01.2012 22:24, Robert Kern wrote:
>>Do these systems have a ramdisk capability?
>
>I assume you have seen this as well :)
>
>http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf


I hadn't. Thanks!

-Asher



Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Travis Oliphant
It is straightforward to implement a "registry mechanism" for Python that
bypasses imp.find_module (i.e. using sys.meta_path). You could imagine
creating the registry file for a package or distribution (much like Dag
described) and pushing it to every node during distribution.

The registry file would hold the mapping

package_name : file_location

which would avoid all the failed open calls. You would need to keep the
registry updated as Dag describes, but this seems like a fairly simple
approach that should help.
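
A minimal sketch of such a finder, using the PEP 302 protocol and imp
API of the time (the inline dict stands in for the registry file, and
all names here are illustrative, not a worked-out implementation):

import imp
import sys

class RegistryImporter(object):
    def __init__(self, registry):
        # registry: {module_name: path_to_source_file}
        self.registry = registry

    def find_module(self, fullname, path=None):
        # Claim only the modules we know about; everything else falls
        # through to the normal import machinery (and its open calls).
        return self if fullname in self.registry else None

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        filename = self.registry[fullname]
        with open(filename, 'U') as f:
            # imp.load_module also registers the module in sys.modules.
            return imp.load_module(fullname, f, filename,
                                   ('.py', 'U', imp.PY_SOURCE))

sys.meta_path.insert(0, RegistryImporter({'mymod': '/shared/lib/mymod.py'}))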

-Travis

On Jan 13, 2012, at 2:38 PM, Sturla Molden wrote:

> On 13.01.2012 21:21, Dag Sverre Seljebotn wrote:
>> Another idea: Given your diagnostics, wouldn't dumping the output of
>> "find" of every path in sys.path to a single text file work well?
> 
> It probably would, and would also be less prone to synchronization 
> problems than using an MPI broadcast. Another possibility would be to 
> use a bsddb (or sqlite?) file as a persistent dict for caching the 
> output of imp.find_module.
> 
> Sturla
> 



Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Sturla Molden
On 13.01.2012 22:24, Robert Kern wrote:
> Do these systems have a ramdisk capability? 

I assume you have seen this as well :)

http://www.cs.uoregon.edu/Research/paracomp/papers/iccs11/iccs_paper_final.pdf


Sturla


Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Robert Kern
On Fri, Jan 13, 2012 at 21:20, Langton, Asher  wrote:

> 2) More generally, dealing with this as well as other library-loading
> issues at the system level, perhaps by putting a small disk near a node or
> small collection of nodes, along with a command to push (broadcast) some
> portions of the filesystem to these (more-)local disks. Basically, the
> idea would be to let the user specify those directories or objects that
> will be accessed by most of the processes and treated as read-only so that
> those objects can be cached near the node.

Do these systems have a ramdisk capability?

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Langton, Asher
On 1/13/12 12:38 PM, Sturla Molden wrote:
>On 13.01.2012 21:21, Dag Sverre Seljebotn wrote:
>> Another idea: Given your diagnostics, wouldn't dumping the output of
>> "find" of every path in sys.path to a single text file work well?
>
>It probably would, and would also be less prone to synchronization
>problems than using an MPI broadcast. Another possibility would be to
>use a bsddb (or sqlite?) file as a persistent dict for caching the
>output of imp.find_module.

We tested something along those lines. Tim Kadich, a summer student at
LLNL, wrote a module that went through the path and built up a dict of
module->location mappings for a subset of module types. My recollection is
that it worked well, and as you note, it didn't have the synchronization
issues that MPI_Import has. We didn't fully implement it, since to handle
complicated packages correctly, it looked like we'd either have to
re-implement a lot of the internal Python import code or modify the
interpreter itself. I don't think that MPI_Import is ultimately the
"right" solution, but it shows how easily we can reap significant gains.
Two better approaches that come to mind are:

1) Fixing this bottleneck at the interpreter level (pre-computing and
caching the locations)

2) More generally, dealing with this as well as other library-loading
issues at the system level, perhaps by putting a small disk near a node or
small collection of nodes, along with a command to push (broadcast) some
portions of the filesystem to these (more-)local disks. Basically, the
idea would be to let the user specify those directories or objects that
will be accessed by most of the processes and treated as read-only so that
those objects can be cached near the node.

-Asher



Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Sturla Molden
On 13.01.2012 21:21, Dag Sverre Seljebotn wrote:
> Another idea: Given your diagnostics, wouldn't dumping the output of
> "find" of every path in sys.path to a single text file work well?

It probably would, and would also be less prone to synchronization 
problems than using an MPI broadcast. Another possibility would be to 
use a bsddb (or sqlite?) file as a persistent dict for caching the 
output of imp.find_module.
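
A minimal sketch of that persistent-dict idea, here with shelve as the
on-disk store (the cache filename is a placeholder; a bsddb or sqlite
backend would work the same way):

import imp
import shelve

_cache = shelve.open('find_module_cache')

def cached_find_module(name, path=None):
    # Cache only the location and description; open file handles are
    # not reusable across processes, so they are closed immediately.
    if name not in _cache:
        fileobj, pathname, description = imp.find_module(name, path)
        if fileobj is not None:
            fileobj.close()
        _cache[name] = (pathname, description)
    return _cache[name]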

Sturla



Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Dag Sverre Seljebotn
On 01/13/2012 09:19 PM, Dag Sverre Seljebotn wrote:
> On 01/13/2012 02:13 AM, Asher Langton wrote:
>> Hi all,
>>
>> (I originally posted this to the BayPIGgies list, where Fernando Perez
>> suggested I send it to the NumPy list as well. My apologies if you're
>> receiving this email twice.)
>>
>> I work on a Python/C++ scientific code that runs as a number of
>> independent Python processes communicating via MPI. Unfortunately, as
>> some of you may have experienced, module importing does not scale well
>> in Python/MPI applications. For 32k processes on BlueGene/P, importing
>> 100 trivial C-extension modules takes 5.5 hours, compared to 35
>> minutes for all other interpreter loading and initialization. We
>> developed a simple pure-Python module (based on knee.py, a
>> hierarchical import example) that cuts the import time from 5.5 hours
>> to 6 minutes.
>>
>> The code is available here:
>>
>> https://github.com/langton/MPI_Import
>>
>> Usage, implementation details, and limitations are described in a
>> docstring at the beginning of the file (just after the mandatory
>> legalese).
>>
>> I've talked with a few people who've faced the same problem and heard
>> about a variety of approaches, which range from putting all necessary
>> files in one directory to hacking the interpreter itself so it
>> distributes the module-loading over MPI. Last summer, I had a student
>> intern try a few of these approaches. It turned out that the problem
>> wasn't so much the simultaneous module loads, but rather the huge
>> number of failed open() calls (ENOENT) as the interpreter tries to
>> find the module files. In the MPI_Import module, we have rank 0
>> perform the module lookups and then broadcast the locations to the
>> rest of the processes. For our real-world scientific applications
>> written in Python and C++, this has meant that we can start a problem
>> and actually make computational progress before the batch allocation
>> ends.
>
> This is great news! I've forwarded to the mpi4py mailing list which
> despairs over this regularly.
>
> Another idea: Given your diagnostics, wouldn't dumping the output of
> "find" of every path in sys.path to a single text file work well? Then
> each node downloads that file once and consults it when looking up
> modules, instead of hitting network file metadata.
>
> (In fact I think "texhash" does the same for LaTeX?)
>
> The disadvantage is that one would need to run "update-python-paths"
> every time a package is installed to update the text file. But I'm not
> sure if that disadvantage is larger than remembering to avoid
> diverging import paths between nodes; hopefully one could put a reminder
> to run update-python-paths in the ImportError string.

I meant "diverging code paths during imports between nodes"..

Dag

>
>
>> If you try out the code, I'd appreciate any feedback you have:
>> performance results, bugfixes/feature-additions, or alternate
>> approaches to solving this problem. Thanks!
>
> I didn't try it myself, but forwarding this from the mpi4py mailing list:
>
> """
> I'm testing it now and actually
> running into some funny errors with unittest on Python 2.7 causing
> infinite recursion.  If anyone is able to get this going, and could
> report successes back to the group, that would be very helpful.
> """
>
> Dag Sverre



Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Dag Sverre Seljebotn
On 01/13/2012 02:13 AM, Asher Langton wrote:
> Hi all,
>
> (I originally posted this to the BayPIGgies list, where Fernando Perez
> suggested I send it to the NumPy list as well. My apologies if you're
> receiving this email twice.)
>
> I work on a Python/C++ scientific code that runs as a number of
> independent Python processes communicating via MPI. Unfortunately, as
> some of you may have experienced, module importing does not scale well
> in Python/MPI applications. For 32k processes on BlueGene/P, importing
> 100 trivial C-extension modules takes 5.5 hours, compared to 35
> minutes for all other interpreter loading and initialization. We
> developed a simple pure-Python module (based on knee.py, a
> hierarchical import example) that cuts the import time from 5.5 hours
> to 6 minutes.
>
> The code is available here:
>
> https://github.com/langton/MPI_Import
>
> Usage, implementation details, and limitations are described in a
> docstring at the beginning of the file (just after the mandatory
> legalese).
>
> I've talked with a few people who've faced the same problem and heard
> about a variety of approaches, which range from putting all necessary
> files in one directory to hacking the interpreter itself so it
> distributes the module-loading over MPI. Last summer, I had a student
> intern try a few of these approaches. It turned out that the problem
> wasn't so much the simultaneous module loads, but rather the huge
> number of failed open() calls (ENOENT) as the interpreter tries to
> find the module files. In the MPI_Import module, we have rank 0
> perform the module lookups and then broadcast the locations to the
> rest of the processes. For our real-world scientific applications
> written in Python and C++, this has meant that we can start a problem
> and actually make computational progress before the batch allocation
> ends.
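
For reference, a minimal mpi4py sketch of the rank-0
lookup-and-broadcast scheme described above -- this is not the actual
MPI_Import code, and find_module_collectively is just an illustrative
name:

import imp
from mpi4py import MPI

comm = MPI.COMM_WORLD

def find_module_collectively(name):
    # Only rank 0 pays for the sys.path scan and its many failed
    # open() calls; every other rank just receives the result.
    location = None
    if comm.rank == 0:
        fileobj, pathname, description = imp.find_module(name)
        if fileobj is not None:
            fileobj.close()
        location = (pathname, description)
    return comm.bcast(location, root=0)

pathname, description = find_module_collectively('os')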

This is great news! I've forwarded to the mpi4py mailing list which 
despairs over this regularly.

Another idea: Given your diagnostics, wouldn't dumping the output of 
"find" of every path in sys.path to a single text file work well? Then 
each node downloads that file once and consults it when looking up 
modules, instead of hitting network file metadata.

(In fact I think "texhash" does the same for LaTeX?)

The disadvantage is that one would need to run "update-python-paths" 
every time a package is installed to update the text file. But I'm not 
sure if that disadvantage is larger than remembering to avoid 
diverging import paths between nodes; hopefully one could put a reminder 
to run update-python-paths in the ImportError string.
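
Something like this minimal sketch is what I have in mind for
update-python-paths (the output filename and format are placeholders,
not a worked-out design):

import os
import sys

def dump_path_listing(outfile='python-paths.txt'):
    # Walk every directory on sys.path once and record all file paths,
    # so each node can consult one local file instead of issuing a
    # network-filesystem metadata request per import attempt.
    with open(outfile, 'w') as out:
        for top in sys.path:
            if not os.path.isdir(top):
                continue
            for dirpath, dirnames, filenames in os.walk(top):
                for name in filenames:
                    out.write(os.path.join(dirpath, name) + '\n')

if __name__ == '__main__':
    dump_path_listing()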


> If you try out the code, I'd appreciate any feedback you have:
> performance results, bugfixes/feature-additions, or alternate
> approaches to solving this problem. Thanks!

I didn't try it myself, but forwarding this from the mpi4py mailing list:

"""
I'm testing it now and actually
running into some funny errors with unittest on Python 2.7 causing
infinite recursion.  If anyone is able to get this going, and could
report successes back to the group, that would be very helpful.
"""

Dag Sverre


Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Robert Kern
On Fri, Jan 13, 2012 at 19:41, Sturla Molden  wrote:
> On 13.01.2012 02:13, Asher Langton wrote:
>> intern try a few of these approaches. It turned out that the problem
>> wasn't so much the simultaneous module loads, but rather the huge
>> number of failed open() calls (ENOENT) as the interpreter tries to
>> find the module files.
>
> It sounds like there is a scalability problem with imp.find_module. I'd
> report this on python-dev or python-ideas.

It's well-known.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] Improving Python+MPI import performance

2012-01-13 Thread Sturla Molden
On 13.01.2012 02:13, Asher Langton wrote:
> intern try a few of these approaches. It turned out that the problem
> wasn't so much the simultaneous module loads, but rather the huge
> number of failed open() calls (ENOENT) as the interpreter tries to
> find the module files.

It sounds like there is a scalability problem with imp.find_module. I'd
report this on python-dev or python-ideas.

Sturla


Re: [Numpy-discussion] Python for Scientists - courses in Germany and US

2012-01-13 Thread Ilan Schnell
By the way, Enthought is also offering Python training and we have
just updated our training calendar for this year:
http://www.enthought.com/training/enthought_training_calendar.php

We are offering about 20 open Python classes in the US and Europe
this year.

- Ilan


On Fri, Jan 13, 2012 at 4:32 AM, Mike Müller  wrote:
> Learn NumPy and Much More
> =========================
>
> Scientists like Python. If you would like to learn more about
> important libraries for scientific applications, you might be
> interested in these courses.
>
> The course in Germany covers:
>
> - Overview of libraries
> - NumPy
> - Data storage with text files, Excel, netCDF and HDF5
> - matplotlib
> - Object oriented programming for scientists
> - Problem solving session
>
> The course in the USA covers all this plus:
>
> - Extending Python in other languages
> - Version control
> - Unit testing
>
>
> More details below.
>
> If you have any questions about the courses, please contact me.
>
> Mike
>
>
> Python for Scientists and Engineers (Germany)
> ---------------------------------------------
>
> A three-day course covering all the basic tools scientists and engineers need.
> This course requires basic Python knowledge.
>
> Date: 19.01.-21.01.2012
> Location: Leipzig, Germany
> Trainer: Mike Müller
> Course Language: English
> Link: http://www.python-academy.com/courses/python_course_scientists.html
>
>
> Python for Scientists and Engineers (USA)
> -----------------------------------------
>
> This is an extended version of our well-received course for
> scientists and engineers. Five days of intensive training
> will give you a solid basis for using Python for scientific
> and technical problems.
>
> The course is hosted by David Beazley (http://www.dabeaz.com).
>
> Date: 27.02.-02.03.2012
> Location: Chicago, IL, USA
> Trainer: Mike Müller
> Course Language: English
> Link: http://www.dabeaz.com/chicago/science.html


[Numpy-discussion] Python for Scientists - courses in Germany and US

2012-01-13 Thread Mike Müller
Learn NumPy and Much More
=========================

Scientists like Python. If you would like to learn more about
important libraries for scientific applications, you might be
interested in these courses.

The course in Germany covers:

- Overview of libraries
- NumPy
- Data storage with text files, Excel, netCDF and HDF5
- matplotlib
- Object oriented programming for scientists
- Problem solving session

The course in the USA covers all this plus:

- Extending Python in other languages
- Version control
- Unit testing


More details below.

If you have any questions about the courses, please contact me.

Mike


Python for Scientists and Engineers (Germany)
---------------------------------------------

A three-day course covering all the basic tools scientists and engineers need.
This course requires basic Python knowledge.

Date: 19.01.-21.01.2012
Location: Leipzig, Germany
Trainer: Mike Müller
Course Language: English
Link: http://www.python-academy.com/courses/python_course_scientists.html


Python for Scientists and Engineers (USA)
-----------------------------------------

This is an extended version of our well-received course for
scientists and engineers. Five days of intensive training
will give you a solid basis for using Python for scientific
and technical problems.

The course is hosted by David Beazley (http://www.dabeaz.com).

Date: 27.02.-02.03.2012
Location: Chicago, IL, USA
Trainer: Mike Müller
Course Language: English
Link: http://www.dabeaz.com/chicago/science.html


Re: [Numpy-discussion] Question on F/C-ordering in numpy svd

2012-01-13 Thread Pearu Peterson


On 01/12/2012 04:21 PM, Ivan Oseledets wrote:
> Dear all!
>
> I am quite new to numpy and Python.
> I am a matlab user, my work is mainly
> on multidimensional arrays, and I have a question on the svd function
> from numpy.linalg
>
> It seems that
>
> u,s,v=svd(a,full_matrices=False)
>
> returns u and v in the F-contiguous format.

The reason for this is that the underlying computational routine
is in Fortran (when using the system LAPACK library, for instance), which
requires and returns F-contiguous arrays; the current behaviour
guarantees the most memory-efficient computation of the svd.
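
A quick way to check the behaviour (the exact flags may vary with the
BLAS/LAPACK build NumPy was compiled against):

import numpy as np

a = np.random.rand(5, 3)
u, s, v = np.linalg.svd(a, full_matrices=False)
print(u.flags['F_CONTIGUOUS'])               # typically True
print(v.flags['F_CONTIGUOUS'])               # typically True
print(np.dot(a.T, a).flags['C_CONTIGUOUS'])  # dot() result: True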

> That is not in good agreement with other numpy stuff, where
> C-ordering is the default.
> For example, matrix multiplication, dot(), ignores ordering and always
> returns its result in C-ordering
> (which is documented), but the svd behaviour is not documented.

In generic numpy operations, the particular ordering of arrays
should not matter as the underlying code should know how to
compute array operation results from different input orderings
efficiently.

This behaviour of svd should be documented. However, one
should check that when using the svd from numpy's lapack_lite (which is
f2c code and could, in principle, also use C-ordering),
F-contiguous arrays are actually returned.

Regards,
Pearu