Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread George Bosilca
Commit 17872 is the one you're looking for. https://svn.open-mpi.org/trac/ompi/changeset/17872 george. On Mar 18, 2008, at 9:12 PM, Jeff Squyres wrote: When did you fix it? I merged the trunk down to the libevent-merge branch late this afternoon (r17869). On Mar 18, 2008, at 7:29 PM, Georg

Re: [OMPI devel] rankfile questions

2008-03-18 Thread Ralph Castain
Not trying to pile on here...but I do have a question. This commit inserted a bunch of affinity-specific code in ompi_mpi_init.c. Was this truly necessary? It seems to me this violates our code architecture. Affinity-specific code belongs in the opal_p[m]affinity functions. Why aren't we just cal

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Jeff Squyres
When did you fix it? I merged the trunk down to the libevent-merge branch late this afternoon (r17869). On Mar 18, 2008, at 7:29 PM, George Bosilca wrote: This has been fixed in the trunk, but not yet merged in the branch. george. On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote: I found

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Paul H. Hargrove
After taking a look at how epoll is implemented in the Linux kernel, I can say with 100% certainty that BLCR will not restore the epoll fd correctly. I hope to fix that eventually, but have too many other things on my plate to address it now. Since I cannot promise how soon BLCR may be able to r

Re: [OMPI devel] Switching away from SVN?

2008-03-18 Thread Jeff Squyres
On Mar 18, 2008, at 7:02 PM, Roland Dreier wrote: Primary reasons for doing the switch are: - distributed repositories are attractive/useful - git/Mercurial branching and merging are *way* better than SVN --> note that SVN v1.5 is supposed to be *much* better than v1.4 Also, svn is much slo

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread George Bosilca
This has been fixed in the trunk, but not yet merged in the branch. george. On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote: I found another problem with the libevent branch. If I set "-mca btl tcp,self" on the command line then I get a segfault when sending messages > 16 KB. I can try to mak

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Josh Hursey
I found another problem with the libevent branch. If I set "-mca btl tcp,self" on the command line then I get a segfault when sending messages > 16 KB. I can try to make a smaller repeater, but if you use the "progress" or "simple" tests in ompi-tests below: https://svn.open-mpi.org/svn/omp

Re: [OMPI devel] Switching away from SVN?

2008-03-18 Thread Roland Dreier
> It's been loosely proposed that we switch away from SVN into a > different system. This probably warrants some discussion to a) figure > out if we want to move, and b) *if* we want to move, which system > should we move to? One system has been proposed: Mercurial -- several > OMPI

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Josh Hursey
I have some more data from the field. Leaving "opal_event_include" unset (Default) BLCR would give me the following error when trying to restart a 2 process 'noop' MPI application: shell$ ompi-restart ompi_global_snapshot_8587.ckpt Restart failed: Bad file descri

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread George Bosilca
It's like rewriting libevent from scratch. I guess it can be done, but it will be a long and painful process. How about the following solution: - the daemons are aware that the checkpointing is enabled. They can set the environment variable which will force the opal_event_include to be set t

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Jeff Squyres
George added an MCA parameter for it (opal_event_include is a string that can be set to "select" or "poll"), but it has to be set before opal_init(). Josh: could you try running with the MCA parameter opal_event_include set to "select"? This would confirm Brian's hypothesis... Given that
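
Editor's sketch (not part of the original message): one way to make sure opal_event_include is in place before opal_init() runs is to put it in the environment of the process being launched. The sketch below assumes Open MPI's convention of reading MCA parameters from environment variables named OMPI_MCA_<parameter>; the helper name env_with_event_backend is hypothetical, and the two accepted values are simply the ones named in the message above.

```python
import os

# Assumption: Open MPI picks up MCA parameters from environment variables
# named OMPI_MCA_<parameter>, so a value set here is visible before
# opal_init() runs in the launched process.
MCA_ENV_PREFIX = "OMPI_MCA_"

# The two values named in the message above; other values are not covered here.
ALLOWED_BACKENDS = ("select", "poll")


def env_with_event_backend(backend, base_env=None):
    """Return a copy of the environment that forces opal_event_include.

    Hypothetical helper for illustration only; it is not an Open MPI API.
    """
    if backend not in ALLOWED_BACKENDS:
        raise ValueError("unsupported event backend: %r" % (backend,))
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env[MCA_ENV_PREFIX + "opal_event_include"] = backend
    return env


if __name__ == "__main__":
    env = env_with_event_backend("select")
    print(env["OMPI_MCA_opal_event_include"])
```

The returned dictionary can be handed to subprocess.run(..., env=env) when starting mpirun. Passing "-mca opal_event_include select" on the mpirun command line should have the same effect, in the same way "-mca btl tcp,self" is used elsewhere in this thread.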

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Paul H. Hargrove
If avoiding epoll() makes Josh's problems go away, PLEASE let me know because that might indicate a deficiency in BLCR that I would want to address. -Paul Brian W. Barrett wrote: > Jeff / George - > > Did you add a way to specify which event modules are used? Because epoll > pushes the socket l

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Brian W. Barrett
Jeff / George - Did you add a way to specify which event modules are used? Because epoll pushes the socket list into the kernel, I can see how it would screw up BLCR. I bet everything would work if we forced the use of poll / select. Brian On Tue, 18 Mar 2008, Jeff Squyres wrote: Crud, ok

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Jeff Squyres
Crud, ok. Keep us posted. On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote: I'm testing with checkpoint/restart and the new libevent seems to be messing up the checkpoints generated by BLCR. I'll be taking a look at it over the next couple of days, but just thought I'd let people know. Unfortuna

[OMPI devel] Switching away from SVN?

2008-03-18 Thread Jeff Squyres
It's been loosely proposed that we switch away from SVN into a different system. This probably warrants some discussion to a) figure out if we want to move, and b) *if* we want to move, which system should we move to? One system has been proposed: Mercurial -- several OMPI developers are

Re: [OMPI devel] RFC: libevent update

2008-03-18 Thread Josh Hursey
I'm testing with checkpoint/restart and the new libevent seems to be messing up the checkpoints generated by BLCR. I'll be taking a look at it over the next couple of days, but just thought I'd let people know. Unfortunately I don't have any more details at the moment. -- Josh On Mar 17, 2

[OMPI devel] libevent-merge tarball

2008-03-18 Thread Jeff Squyres
Per the RFC posted yesterday, we plan to merge in the new libevent over this upcoming weekend. Please test the /tmp-public/libevent-merge SVN branch! For convenience, I have posted a tarball from this branch if it would make it easier for you to test: http://www.open-mpi.org/~jsquyre

[OMPI devel] 1.2.6 man page fixes: done

2008-03-18 Thread Jeff Squyres
Terry -- Per the teleconf today (I wanted to ensure that some man page fixes were included in 1.2.6): I checked SVN; the man page fixes submitted by the Debian OMPI package maintainers were committed to the 1.2 branch almost a month ago. So I think we're clear for 1.2.6rc3. -- Jeff Squy

Re: [OMPI devel] xensocket btl and migration

2008-03-18 Thread Josh Hursey
Muhammad, With regard to your question on migration you will likely have to reload the BTL components when a migration occurs. Open MPI currently assumes that once the set of BTLs is decided upon in a process they are to be used until the application completes. There is some limited supp

Re: [OMPI devel] rankfile questions

2008-03-18 Thread Jeff Squyres
On Mar 18, 2008, at 9:32 AM, Jeff Squyres wrote: I notice that rankfile didn't compile properly on some platforms and issued warnings on other platforms. Thanks to Ralph for cleaning it up... 1. I see a getenv("slot_list") in the MPI side of the code; it looks like $slot_list is set by the odl

[OMPI devel] rankfile questions

2008-03-18 Thread Jeff Squyres
I notice that rankfile didn't compile properly on some platforms and issued warnings on other platforms. Thanks to Ralph for cleaning it up... 1. I see a getenv("slot_list") in the MPI side of the code; it looks like $slot_list is set by the odls for the MPI process. Why isn't it an MCA