Commit 17872 is the one you're looking for.
https://svn.open-mpi.org/trac/ompi/changeset/17872
george.
On Mar 18, 2008, at 9:12 PM, Jeff Squyres wrote:
When did you fix it? I merged the trunk down to the libevent-merge
branch late this afternoon (r17869).
On Mar 18, 2008, at 7:29 PM, Georg
Not trying to pile on here...but I do have a question.
This commit inserted a bunch of affinity-specific code in ompi_mpi_init.c.
Was this truly necessary?
It seems to me this violates our code architecture. Affinity-specific code
belongs in the opal_p[m]affinity functions. Why aren't we just cal
When did you fix it? I merged the trunk down to the libevent-merge
branch late this afternoon (r17869).
On Mar 18, 2008, at 7:29 PM, George Bosilca wrote:
This has been fixed in the trunk, but not yet merged in the branch.
george.
On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote:
I found
After taking a look at how epoll is implemented in the Linux kernel, I
can say with 100% certainty that BLCR will not restore the epoll fd
correctly. I hope to fix that eventually, but have too many other
things on my plate to address it now.
Since I cannot promise how soon BLCR may be able to r
On Mar 18, 2008, at 7:02 PM, Roland Dreier wrote:
Primary reasons for doing the switch are:
- distributed repositories are attractive/useful
- git/Mercurial branching and merging are *way* better than SVN
--> note that SVN v1.5 is supposed to be *much* better than v1.4
Also, svn is much slo
This has been fixed in the trunk, but not yet merged in the branch.
george.
On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote:
I found another problem with the libevent branch.
If I set "-mca btl tcp,self" on the command line then I get a segfault
when sending messages > 16 KB. I can try to mak
I found another problem with the libevent branch.
If I set "-mca btl tcp,self" on the command line then I get a segfault
when sending messages > 16 KB. I can try to make a smaller repeater,
but if you use the "progress" or "simple" tests in ompi-tests below:
https://svn.open-mpi.org/svn/omp
> It's been loosely proposed that we switch away from SVN into a
> different system. This probably warrants some discussion to a) figure
> out if we want to move, and b) *if* we want to move, which system
> should we move to? One system has been proposed: Mercurial -- several
> OMPI
I have some more data from the field.
With "opal_event_include" left unset (the default), BLCR gave me the
following error when trying to restart a 2-process 'noop' MPI
application:
shell$ ompi-restart ompi_global_snapshot_8587.ckpt
Restart failed: Bad file descri
It's like rewriting libevent from scratch. I guess it can be done, but
it will be a long and painful process. How about the following solution:
- the daemons are aware that the checkpointing is enabled. They can
set the environment variable which will force the opal_event_include
to be set t
George added an MCA parameter for it (opal_event_include is a string
that can be set to "select" or "poll"), but it has to be set before
opal_init().
Josh: could you try running with the MCA parameter opal_event_include
set to "select"? This would confirm Brian's hypothesis...
Given that
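One way to try this suggestion without editing the mpirun command line is to pass the parameter through the environment. The sketch below assumes Open MPI's usual convention of reading MCA parameters from variables named OMPI_MCA_<param>; whether the libevent-merge branch honors it for opal_event_include is an assumption, and the helper name env_with_event_backend is hypothetical.

```python
import os


def env_with_event_backend(backend, base_env=None):
    """Return a copy of an environment that requests one event backend.

    Assumption: Open MPI picks up MCA parameters from OMPI_MCA_<name>
    environment variables, so the value is visible before opal_init() runs.
    """
    # The thread only mentions "select" and "poll" as values to try.
    if backend not in ("select", "poll"):
        raise ValueError("expected 'select' or 'poll', got %r" % (backend,))
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["OMPI_MCA_opal_event_include"] = backend
    return env


# Hypothetical usage (program name and process count are placeholders):
#   import subprocess
#   subprocess.run(["mpirun", "-np", "2", "./noop"],
#                  env=env_with_event_backend("select"))
```

Setting the variable in the launching environment is meant to satisfy the "before opal_init()" requirement, since the value is already in place when the process starts.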
If avoiding epoll() makes Josh's problems go away, PLEASE let me know
because that might indicate a deficiency in BLCR that I would want to
address.
-Paul
Brian W. Barrett wrote:
> Jeff / George -
>
> Did you add a way to specify which event modules are used? Because epoll
> pushes the socket l
Jeff / George -
Did you add a way to specify which event modules are used? Because epoll
pushes the socket list into the kernel, I can see how it would screw up
BLCR. I bet everything would work if we forced the use of poll / select.
Brian
On Tue, 18 Mar 2008, Jeff Squyres wrote:
Crud, ok
Crud, ok. Keep us posted.
On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:
I'm testing with checkpoint/restart and the new libevent seems to be
messing up the checkpoints generated by BLCR. I'll be taking a look
at it over the next couple of days, but just thought I'd let people
know. Unfortuna
It's been loosely proposed that we switch away from SVN into a
different system. This probably warrants some discussion to a) figure
out if we want to move, and b) *if* we want to move, which system
should we move to? One system has been proposed: Mercurial -- several
OMPI developers are
I'm testing with checkpoint/restart and the new libevent seems to be
messing up the checkpoints generated by BLCR. I'll be taking a look
at it over the next couple of days, but just thought I'd let people
know. Unfortunately I don't have any more details at the moment.
-- Josh
On Mar 17, 2
Per the RFC posted yesterday, we plan to merge in the new libevent
over this upcoming weekend. Please test the /tmp-public/libevent-
merge SVN branch!
For convenience, I have posted a tarball from this branch if it would
make it easier for you to test:
http://www.open-mpi.org/~jsquyre
Terry --
Per the teleconf today (I wanted to ensure that some man page fixes
were included in 1.2.6): I checked SVN; the man page fixes submitted
by the Debian OMPI package maintainers were committed to the 1.2
branch almost a month ago.
So I think we're clear for 1.2.6rc3.
--
Jeff Squy
Muhammad,
With regard to your question on migration, you will likely have to
reload the BTL components when a migration occurs. Open MPI currently
assumes that once the set of BTLs is decided upon in a process, they
are to be used until the application completes. There is some limited
supp
On Mar 18, 2008, at 9:32 AM, Jeff Squyres wrote:
I notice that rankfile didn't compile properly on some platforms and
issued warnings on other platforms. Thanks to Ralph for cleaning it
up...
1. I see a getenv("slot_list") in the MPI side of the code; it looks
like $slot_list is set by the odl
I notice that rankfile didn't compile properly on some platforms and
issued warnings on other platforms. Thanks to Ralph for cleaning it
up...
1. I see a getenv("slot_list") in the MPI side of the code; it looks
like $slot_list is set by the odls for the MPI process. Why isn't it
an MCA