Yes please, this fix has been requested by Bull customers.
damien
On 17/03/2011 15:44, Jeff Squyres wrote:
Does this need to be CMR'ed to 1.4 and/or 1.5?
On Mar 16, 2011, at 10:27 PM, Ralph Castain wrote:
Okay, I fixed this in r24536.
Sorry for the problem, Damien - thanks for catching it!
You are welcome. I'm glad you found the fix so quickly.
Thanks to all
Damien
On 17/03/2011 03:27, Ralph Castain wrote:
Okay, I fixed this in r24536.
Sorry for the problem, Damien - thanks for catching it! Went unnoticed because
the folks at the Labs always use IB.
On Mar 16, 2011, at
Hi all
From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier".
The "grpcomm:hier" module is important because the "srun" launch protocol
can't use any other "grpcomm" module.
You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when
you create a ring (like: IMB
by customers who use BPS and the LSF batch manager.
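For reference, a reproduction along these lines should trigger it (the node
and process counts, and the IMB binary name, are my assumptions rather than
details from the original report):

    export OMPI_MCA_btl=tcp,self
    export OMPI_MCA_grpcomm=hier
    srun -N 4 -n 32 ./IMB-MPI1 Sendrecv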
thanks
Damien Guinier
-
diff -r 486ca4bfca95 contrib/dist/linux/openmpi.spec
--- a/contrib/dist/linux/openmpi.spec Mon Feb 07 15:40:31 2011 +0100
+++ b/contrib/dist/linux/openmpi.spec Tue Feb 08 14:30:01 2011 +0100
@@ -514,6 +514,10 @@
Oops
Ok, you can commit it. The whole problem is with the word "procs": in the
source code, both "processes" and "cores" definitions are used.
On 01/12/2010 11:37, Damien Guinier wrote:
Ok, you can commit it. All problem is on "procs" work, on source code,
"processes" AND "cores" definition is used.
…causes us to set the "bynode" flag by mistake. Did you check that?
BTW: when running cpus-per-proc, a slot doesn't have X processes. I suspect
this is just a language thing, but it will create confusion. A slot consists of
X cpus - we still assign only one process to each slot.
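To make the slot arithmetic concrete (illustrative numbers of my own, not
from Ralph's message): with -cpus-per-proc 2 on a node that offers 8 cpus,
the node provides 4 slots, so at most 4 processes are mapped there, each
bound to 2 cpus; it is not 8 slots holding 2 processes each.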
Using Intel OpenMP in conjunction with srun seems to cause a
segmentation fault, at least in the 1.5 branch.
After a long time tracking this strange bug, I finally found out that
the slurmd ess component was corrupting the __environ structure. This
results in a crash in Intel OpenMP, which
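A minimal sketch of the failure mode (my own demo code, not the actual
ess/slurmd component), showing why rewriting entries of the environ array
in place is dangerous and what the safe pattern looks like:

#include <stdio.h>
#include <stdlib.h>

extern char **environ;

int main(void)
{
    /* UNSAFE: clobbers whatever variable happened to live in slot 0.
     * Any runtime that scans environ expecting the original contents
     * (the Intel OpenMP startup code, for example) can then misbehave. */
    environ[0] = "OMPI_DEMO=clobbered";

    /* SAFE: let the C library manage the array instead. */
    setenv("OMPI_DEMO", "ok", 1);

    printf("OMPI_DEMO=%s\n", getenv("OMPI_DEMO"));
    return 0;
}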
Hi all
A recent update of libevent seems to have caused a regression on our side.
On my cluster of 32-CPU nodes, processes launched by srun hang in
opal_event_loop().
We see a deadlock in MPI_Init() (endlessly looping in opal_event_loop())
when we launch processes with pure srun on 32-core nodes.
…happy with the way this behaves...
Let me know what you find out.
On Feb 26, 2010, at 9:45 AM, Damien Guinier wrote:
Hi Ralph,
I found a minor bug in the MCA component "ras slurm".
It behaves incorrectly with the "X number of processors per
task" feature.
In the file ort
Hi Ralph,
I found a minor bug in the MCA component "ras slurm".
It behaves incorrectly with the "X number of processors
per task" feature.
In the file orte/mca/ras/slurm/ras_slurm_module.c, line 356:
- The node slot count is divided by the "cpus_per_task" value,
but
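As a sketch of the computation being described (hypothetical code of mine,
not the verbatim contents of line 356):

static int slots_for_node(int cpus_on_node, int cpus_per_task)
{
    /* e.g. 8 cpus on the node with cpus_per_task = 2 gives 4 slots,
     * i.e. one process slot per cpus_per_task cpus */
    return cpus_on_node / cpus_per_task;
}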
…s module as it will then try
to track its own launches and totally forget that it is a remote orted
with slightly different responsibilities.
If you need it to execute a different plm on the backend, please let
me know - it is a trivial change to allow specification of remote
launch agents, and we
Hi Ralph
In Open MPI, I am working on a small new feature: hnp_always_use_plm.
- To launch the final application, mpirun uses a remote "orted via plm:
Process Lifecycle Management module" or, locally, "fork()". So the first
compute node does not use the same method as the other compute nodes. Some
debug
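As a sketch, such a switch could be registered with the MCA parameter
system of that era roughly like this (my guess at the shape, not the
actual patch):

#include "opal/mca/base/mca_base_param.h"

static int hnp_always_use_plm = 0;

static void register_params(void)
{
    /* "orte_hnp_always_use_plm": if nonzero, the HNP would launch its
     * local processes through the PLM instead of calling fork() itself */
    mca_base_param_reg_int_name("orte", "hnp_always_use_plm",
                                "Launch local procs via the PLM instead of fork()",
                                false, false, 0, &hnp_always_use_plm);
}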
Hi Ralph
I have found a bug in the "grpcomm": "hier" module. This bug creates an
infinite loop in MPI_Finalize. In this module, the barrier is executed
as an allgather with a data length of zero. This allgather function can
enter an infinite loop, depending on the rank execution order.
In
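For clarity, here is the pattern being described, expressed at the MPI
level (a sketch under my own naming; the real code lives inside
grpcomm:hier, below the MPI layer):

#include <mpi.h>

/* A barrier implemented as an allgather that carries zero bytes of
 * payload per rank: only the completion semantics matter. */
static void barrier_via_allgather(MPI_Comm comm)
{
    char sbuf = 0, rbuf = 0;  /* dummies, never actually transferred */
    MPI_Allgather(&sbuf, 0, MPI_BYTE, &rbuf, 0, MPI_BYTE, comm);
}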