Hi, I am having a problem with the latest version of Open MPI.
In some executions (roughly 1 in 100) the following message is printed:
[tegasaste:01617] [NO-NAME] ORTE_ERROR_LOG: File read failure in file
util/universe_setup_file_io.c at line 123
It seems as if it tries to read the universe file and it ha
This has been around for a very long time (at least a year, if memory serves
correctly). The problem is that the system "hangs" while trying to flush the
io buffers through the RML because it loses connection to the head node
process (for 1.x, that's basically mpirun) - but the "flush" procedure
do
I have been noticing this for a while (at least 2 months) as well
along with stale session directories. I filed a bug for it yesterday, #177:
https://svn.open-mpi.org/trac/ompi/ticket/177
I'll add this stack trace to it. I want to take a closer look
tomorrow to see what's really going on here.
When
Starting a few days ago, I have noticed that more and more orted processes are
left over after my runs. Usually, if the job runs to completion they
disappear, but if I kill the job or it segfaults they don't. I
attached to one of them and got the following stack:
#0 0x9001f7a8 in select ()
#1 0x00375