Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Shamis, Pavel
> * Some perf (or trace?) guys came late and said "oh your code should be ... I think eventually there was a consensus that perf/trace doesn't fit well ... (or it requires substantial changes) > On 08/11/2012 15:43, Jeff Squyres wrote: >> Note that the saga of trying to push ummunotify

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Shamis, Pavel
Note that the saga of trying to push ummunotify upstream to Linux ended up with Linus essentially saying "fix your own network stack; don't put this in the main kernel." I haven't seen this one. All I found is this thread http://thread.gmane.org/gmane.linux.drivers.openib/65188 On Nov

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Jeff Squyres
Nope, that wasn't it. ...oh, I see, Linus' reply didn't go to LKML; it just went to a bunch of individuals. Here's part of his reply: The interface claims to be generic, but is really just a hack for a single use case that very few people care about. I find the design depressingly stupid,

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Brice Goglin
My understanding of the upstreaming failure was more like: * Linus was going to be OK * Some perf (or trace?) guys came late and said "oh your code should be integrated into our more general stuff" but they didn't do it, and basically vetoed anything that didn't do what they said Brice Le

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Jeff Squyres
Note that the saga of trying to push ummunotify upstream to Linux ended up with Linus essentially saying "fix your own network stack; don't put this in the main kernel." He was right back then. With a 2nd "customer" for this kind of thing (cuda), that equation might be changing, but I'll

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Shamis, Pavel
Another good reason for ummunotify kernel module (http://lwn.net/Articles/345013/) Pavel (Pasha) Shamis --- Computer Science Research Group Computer Science and Math Division Oak Ridge National Laboratory On Nov 8, 2012, at 9:08 AM, Jeff Squyres wrote: On Nov 8, 2012, at 8:51 AM, Rolf

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Jeff Squyres
On Nov 8, 2012, at 8:51 AM, Rolf vandeVaart wrote: > Not sure. I will look into this. And thank you for the feedback Jens! FWIW, I +1 Jens' request. MPI implementations are able to handle network registration mechanisms via standard memory hooks (their hooks are actually pretty terrible,

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Rolf vandeVaart
Not sure. I will look into this. And thank you for the feedback Jens! Rolf >-Original Message- >From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] >On Behalf Of Jeff Squyres >Sent: Thursday, November 08, 2012 8:49 AM >To: Open MPI Users >Subjec

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-08 Thread Jeff Squyres
On Nov 7, 2012, at 7:21 PM, Jens Glaser wrote: > With the help of MVAPICH2 developer S. Potluri the problem was isolated and > fixed. Sorry about not replying; we're all (literally) very swamped trying to prepare for the Supercomputing trade show/conference next week. I know I'm way

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-07 Thread Jens Glaser
I am replying to my own post, since no one else replied. With the help of MVAPICH2 developer S. Potluri the problem was isolated and fixed. It was, as expected, due to the library not intercepting the cudaHostAlloc() and cudaFreeHost() calls to register pinned memory, as would be required for

[OMPI users] mpi_leave_pinned is dangerous

2012-11-04 Thread Jens Glaser
Hi, I am working on a CUDA/MPI application. It uses page-locked host buffers allocated with cudaHostAlloc(...,cudaHostAllocDefault), to which data from the device is copied before calling MPI. The application, a particle simulation, reproducibly crashed or produced undefined behavior at large