John, I am not sure you will get much help here with a kernel crash caused by a tweaked driver.
About HPL, you are likely to get better performance with P and Q closer
together (e.g. 4x8 is likely better than 2x16 or 1x32).
Also, HPL might perform better with one MPI task per node and a
multithreaded BLAS (e.g. PxQ = 2x4 and 4 OpenMP threads per MPI task).

Cheers,

Gilles

On Mon, Aug 10, 2020 at 3:31 AM John Duffy via users <users@lists.open-mpi.org> wrote:
>
> Hi
>
> I have generated this problem myself by tweaking the MTU of my 8 node
> Raspberry Pi 4 cluster to 9000 bytes, but I would be grateful for any
> ideas/suggestions on how to relate the Open-MPI ORTE message to my tweaking.
>
> When I run HPL Linpack using my “improved” cluster, it runs quite happily for
> 2 hours with P=1 & Q=32 using 80% of memory, and this give me a 7%
> performance increase to 97 Gflops. And I can quite happily Iperf 1GB of data
> between nodes with an improved bandwidth of 980Mb/s. So, the MTU tweak
> appears to be relatively robust.
>
> However, as soon as the HPL.dat parameters change to P=2 & Q=16, from within
> the same HPL.dat file, I get the following message...
>
> --------------------------------------------------------------------------
> ORTE has lost communication with a remote daemon.
>
> HNP daemon : [[19859,0],0] on node node1
> Remote daemon: [[19859,0],5] on node node6
>
> This is usually due to either a failure of the TCP network
> connection to the node, or possibly an internal failure of
> the daemon itself. We cannot recover from this failure, and
> therefore will terminate the job.
> --------------------------------------------------------------------------
>
> …and the affected node becomes uncontactable.
>
> I’m thinking the Open-MPI message sizes with P=2 & Q=16 are not working with
> my imperfect MTU tweak, and I’m corrupting the TCP stack somehow.
>
> My tweak consisted of the following kernel changes:
>
> 1.) include/linux/if_vlan.h
>
> #define VLAN_ETH_DATA_LEN 9000
> #define VLAN_ETH_FRAME_LEN 9018
>
> 2.) include/uapi/linux/if_ether.h
>
> #define ETH_DATA_LEN 9000
> #define ETH_FRAME_LEN 9014
>
> 3.) drivers/net/ethernet/broadcom/genet/bcmgenet.c
>
> #define RX_BUF_LENGTH 10240
>
> The Raspberry Pi 4 ethernet driver does not expose many knobs to turn, most
> ethtool options are not available, and there is no publicly available NIC
> documentation, so my tweaks are educated guesswork based upon Raspberry Pi
> forum threads.
>
> Any ideas/suggestions would be much appreciated. With P=2 & Q=16 prior to my
> tweak I can achieve 100 Gflops, a potential increase to 107 Gflops is not to
> be sniffed at.
>
> Best regards
>
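
P.S. To make the P and Q suggestion above concrete, here is a minimal Python sketch that lists every process grid for a given number of MPI ranks, most square first. The helper name `process_grids` is made up for illustration and is not part of HPL or Open MPI; it only assumes P <= Q, as in the grids mentioned in this thread.

```python
import math


def process_grids(n_ranks):
    """Return all (P, Q) with P * Q == n_ranks and P <= Q, most square first.

    Hypothetical helper for illustration only; not part of HPL or Open MPI.
    """
    grids = [
        (p, n_ranks // p)
        for p in range(1, math.isqrt(n_ranks) + 1)
        if n_ranks % p == 0
    ]
    # A smaller Q - P gap means a grid closer to square.
    return sorted(grids, key=lambda pq: pq[1] - pq[0])


if __name__ == "__main__":
    # 32 ranks: one per core on 8 nodes; 8 ranks: one per node.
    for n_ranks in (32, 8):
        print(n_ranks, process_grids(n_ranks))
```

The first entry for each rank count is the grid I would try first in HPL.dat; whether it is actually the fastest on your cluster still has to be measured.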