Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-10 Thread Ralph Castain via users
My apologies - I should have included "--debug-daemons" on the mpirun cmd line 
so that the stderr of the backend daemons would be output.
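
For example, the full command line might look something like this (illustrative 
only; the hostfile, process count and xhpl path are placeholders for your own 
setup):

  mpirun --hostfile hosts -np 32 --debug-daemons --mca oob_base_verbose 100 ./xhpl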


> On Aug 10, 2020, at 10:28 AM, John Duffy via users  
> wrote:
> 
> Thanks Ralph
> 
> I will do all of that. Much appreciated.




Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-10 Thread John Duffy via users
Thanks Ralph

I will do all of that. Much appreciated.


Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-10 Thread Ralph Castain via users
Well, we aren't really that picky :-)  While I agree with Gilles that we are 
unlikely to be able to help you resolve the problem, we can give you a couple 
of ideas on how to chase it down.

First, be sure to build OMPI with "--enable-debug" and then try adding "--mca 
oob_base_verbose 100" to your mpirun cmd line. That will dump a bunch of 
diagnostics from the daemon-to-daemon TCP code.
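
For example, a minimal sketch (the hostfile, process count and xhpl path are 
placeholders, and the configure line shows only the debug flag):

  ./configure --enable-debug [your usual configure options]
  make && make install
  mpirun --hostfile hosts -np 32 --mca oob_base_verbose 100 ./xhpl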

Second, you can look at that code and see if you can spot something that might 
be MTU sensitive - the code is in the orte/mca/oob/tcp directory.
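
For a first pass over that directory, something like the following might help 
(just an illustrative grep; the patterns are guesses at what could be size or 
MTU sensitive):

  grep -rn -E "SNDBUF|RCVBUF|frag|_size" orte/mca/oob/tcp/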

HTH
Ralph


> On Aug 9, 2020, at 10:37 PM, John Duffy via users  
> wrote:
> 
> Thanks Gilles
> 
> I realise this is “off topic”. I was hoping the Open-MPI ORTE/HNP message 
> might give me a clue where to look for my driver problem.
> 
> Regarding P/Q ratios, P=2 & Q=16 does indeed give me better performance.
> 
> Kind regards



Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-09 Thread John Duffy via users
Thanks Gilles

I realise this is “off topic”. I was hoping the Open-MPI ORTE/HNP message might 
give me a clue where to look for my driver problem.

Regarding P/Q ratios, P=2 & Q=16 does indeed give me better performance.

Kind regards
 

Re: [OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-09 Thread Gilles Gouaillardet via users
John,

I am not sure you will get much help here with a kernel crash caused
by a tweaked driver.

About HPL, you are likely to get better performance with P and Q closer
together (e.g. 4x8 is likely better than 2x16 or 1x32).
Also, HPL might have better performance with one MPI task per node and
a multithreaded BLAS
(e.g. PxQ = 2x4 and 4 OpenMP threads per MPI task)
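
For example, a rough sketch of the one-task-per-node option with Open MPI (the 
hostfile name and xhpl path are placeholders, and it assumes your BLAS is built 
with OpenMP support):

  # 8 nodes, 1 MPI task per node, 4 OpenMP threads each (PxQ = 2x4 in HPL.dat)
  mpirun --hostfile hosts --map-by ppr:1:node --bind-to none -np 8 \
         -x OMP_NUM_THREADS=4 ./xhpl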

Cheers,

Gilles


On Mon, Aug 10, 2020 at 3:31 AM John Duffy via users
 wrote:
>
> Hi
>
> I have generated this problem myself by tweaking the MTU of my 8 node 
> Raspberry Pi 4 cluster to 9000 bytes, but I would be grateful for any 
> ideas/suggestions on how to relate the Open-MPI ORTE message to my tweaking.
>
> When I run HPL Linpack using my “improved” cluster, it runs quite happily for 
> 2 hours with P=1 & Q=32 using 80% of memory, and this gives me a 7%
> performance increase to 97 Gflops. And I can quite happily Iperf 1GB of data 
> between nodes with an improved bandwidth of 980Mb/s. So, the MTU tweak 
> appears to be relatively robust.
>
> However, as soon as the HPL.dat parameters change to P=2 & Q=16, from within 
> the same HPL.dat file, I get the following message...
>
> --------------------------------------------------------------------------
> ORTE has lost communication with a remote daemon.
>
>   HNP daemon   : [[19859,0],0] on node node1
>   Remote daemon: [[19859,0],5] on node node6
>
> This is usually due to either a failure of the TCP network
> connection to the node, or possibly an internal failure of
> the daemon itself. We cannot recover from this failure, and
> therefore will terminate the job.
> --------------------------------------------------------------------------
>
> …and the affected node becomes uncontactable.
>
> I’m thinking the Open-MPI message sizes with P=2 & Q=16 are not working with 
> my imperfect MTU tweak, and I’m corrupting the TCP stack somehow.
>
> My tweak consisted of the following kernel changes:
>
> 1.) include/linux/if_vlan.h
>
> #define VLAN_ETH_DATA_LEN 9000
> #define VLAN_ETH_FRAME_LEN 9018
>
> 2.) include/uapi/linux/if_ether.h
>
> #define ETH_DATA_LEN 9000
> #define ETH_FRAME_LEN 9014
>
> 3.) drivers/net/ethernet/broadcom/genet/bcmgenet.c
>
> #define RX_BUF_LENGTH 10240
>
> The Raspberry Pi 4 ethernet driver does not expose many knobs to turn, most 
> ethtool options are not available, and there is no publicly available NIC 
> documentation, so my tweaks are educated guesswork based upon Raspberry Pi 
> forum threads.
>
> Any ideas/suggestions would be much appreciated. With P=2 & Q=16 prior to my 
> tweak I can achieve 100 Gflops, so a potential increase to 107 Gflops is not 
> to be sniffed at.
>
> Best regards
>


[OMPI users] ORTE HNP Daemon Error - Generated by Tweaking MTU

2020-08-09 Thread John Duffy via users
Hi

I have generated this problem myself by tweaking the MTU of my 8 node Raspberry 
Pi 4 cluster to 9000 bytes, but I would be grateful for any ideas/suggestions 
on how to relate the Open-MPI ORTE message to my tweaking.

When I run HPL Linpack using my “improved” cluster, it runs quite happily for 2 
hours with P=1 & Q=32 using 80% of memory, and this gives me a 7% performance
increase to 97 Gflops. And I can quite happily Iperf 1GB of data between nodes 
with an improved bandwidth of 980Mb/s. So, the MTU tweak appears to be 
relatively robust.
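
For example, a point-to-point check of that kind might look like this (a sketch 
assuming iperf3; with classic iperf the flags differ slightly, and the node 
names are just the ones from the ORTE message below):

  # on the receiving node (e.g. node6):
  iperf3 -s
  # on the sending node (e.g. node1), send 1 GB:
  iperf3 -c node6 -n 1G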

However, as soon as the HPL.dat parameters change to P=2 & Q=16, from within 
the same HPL.dat file, I get the following message... 

--------------------------------------------------------------------------
ORTE has lost communication with a remote daemon.

  HNP daemon   : [[19859,0],0] on node node1
  Remote daemon: [[19859,0],5] on node node6

This is usually due to either a failure of the TCP network
connection to the node, or possibly an internal failure of
the daemon itself. We cannot recover from this failure, and
therefore will terminate the job.
--------------------------------------------------------------------------

…and the affected node becomes uncontactable.

I’m thinking the Open-MPI message sizes with P=2 & Q=16 are not working with my 
imperfect MTU tweak, and I’m corrupting the TCP stack somehow.

My tweak consisted of the following kernel changes:

1.) include/linux/if_vlan.h

#define VLAN_ETH_DATA_LEN 9000
#define VLAN_ETH_FRAME_LEN 9018

2.) include/uapi/linux/if_ether.h

#define ETH_DATA_LEN 9000
#define ETH_FRAME_LEN 9014

3.) drivers/net/ethernet/broadcom/genet/bcmgenet.c

#define RX_BUF_LENGTH 10240
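
With ETH_HLEN = 14 (Ethernet header) and VLAN_HLEN = 4 (802.1Q tag), those 
numbers are at least self-consistent: 9000 + 14 = 9014 for ETH_FRAME_LEN, 
9000 + 18 = 9018 for VLAN_ETH_FRAME_LEN, and RX_BUF_LENGTH = 10240 leaves 
headroom above the largest frame. A quick way to confirm the path really 
carries 9000-byte frames end to end (illustrative commands; the interface and 
node names are placeholders, and 8972 = 9000 minus 20 bytes of IP header and 
8 bytes of ICMP header):

  ip link show eth0 | grep mtu       # the interface should report mtu 9000
  ping -M do -s 8972 -c 3 node6      # "do not fragment"; fails if the path MTU is below 9000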

The Raspberry Pi 4 ethernet driver does not expose many knobs to turn, most 
ethtool options are not available, and there is no publicly available NIC 
documentation, so my tweaks are educated guesswork based upon Raspberry Pi 
forum threads.

Any ideas/suggestions would be much appreciated. With P=2 & Q=16 prior to my 
tweak I can achieve 100 Gflops, so a potential increase to 107 Gflops is not to 
be sniffed at.

Best regards