Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-10 Thread Ralph Castain via users
My apologies - I should have included "--debug-daemons" on the mpirun cmd line so that the stderr of the backend daemons would be output.
> On Aug 10, 2020, at 10:28 AM, John Duffy via users wrote:
> Thanks Ralph. I will do all of that. Much appreciated.
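For reference, a minimal sketch of such an invocation (the executable name, host file and rank count here are illustrative, not taken from this thread):

    mpirun --debug-daemons --hostfile hosts -np 32 ./xhpl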

Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-10 Thread John Duffy via users
Thanks Ralph. I will do all of that. Much appreciated.

Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-10 Thread Ralph Castain via users
Well, we aren't really that picky :-) While I agree with Gilles that we are unlikely to be able to help you resolve the problem, we can give you a couple of ideas on how to chase it down. First, be sure to build OMPI with "--enable-debug" and then try adding "--mca oob_base_verbose 100" to your …
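A sketch of those two steps, assuming a build from source (the install prefix, make parallelism and rank count are illustrative):

    ./configure --prefix=$HOME/openmpi-debug --enable-debug
    make -j4 && make install

    mpirun --mca oob_base_verbose 100 -np 32 ./xhpl

The oob framework carries the daemon-to-mpirun control traffic, so its verbose output is a reasonable place to look for connection problems between the nodes.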

Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-09 Thread John Duffy via users
Thanks Gilles. I realise this is “off topic”. I was hoping the Open-MPI ORTE/HNP message might give me a clue where to look for my driver problem. Regarding P/Q ratios, P=2 & Q=16 does indeed give me better performance. Kind regards

Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-09 Thread Gilles Gouaillardet via users
John, I am not sure you will get much help here with a kernel crash caused by a tweaked driver. About HPL, you are more likely to get better performance with P and Q closer (e.g. 4x8 is likely better than 2x16 or 1x32). Also, HPL might have better performance with one MPI task per node and a …
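For illustration, the process grid is set in HPL.dat; for a 32-rank run the relevant lines might look like this (values are illustrative):

    1            # of process grids (P x Q)
    4            Ps
    8            Qs

A near-square grid with P <= Q is the usual starting point, which is the 4x8-over-2x16 suggestion above.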

[OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-09 Thread John Duffy via users
Hi, I have generated this problem myself by tweaking the MTU of my 8-node Raspberry Pi 4 cluster to 9000 bytes, but I would be grateful for any ideas/suggestions on how to relate the Open-MPI ORTE message to my tweaking. When I run HPL Linpack using my “improved” cluster, it runs quite happily …
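For context, an MTU change like this is typically applied per interface with iproute2; the interface name below is an assumption:

    sudo ip link set dev eth0 mtu 9000   # jumbo frames
    sudo ip link set dev eth0 mtu 1500   # revert to the stock setting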