Hi Joseph, John

Thank you for your replies.

I’m using Ubuntu 20.04 aarch64 on an 8 x Raspberry Pi 4 cluster.

The symptoms I’m experiencing are that HPL Linpack performance in Gflops increases on a single core as NB is increased from 32 to 256. The theoretical maximum is 6 Gflops per core, and I can achieve 4.8 Gflops, which I think is a reasonable result. However, as I add more cores on a single node (2, 3 and finally 4), the scaling is nowhere near linear and tails off dramatically as NB is increased. I can achieve 15 Gflops on a single node of 4 cores, whereas the theoretical maximum is 24 Gflops per node.
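
For reference, a minimal launch sketch that pins one rank per core, so the multi-core scaling isn’t muddied by rank migration (the ./xhpl path and rank count are placeholders, not my exact command line):

    # 4 ranks on one node, one rank pinned per core; print the binding map
    mpirun -np 4 --map-by core --bind-to core --report-bindings ./xhpl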

ompi_info suggests vader is available/working…

                 MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                 MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                 MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.3)
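
One way to make sure the intra-node traffic actually goes over vader rather than tcp is to restrict the BTLs explicitly and dump vader’s tunables (a sketch; the ./xhpl path is a placeholder, and the parameter list is simply whatever ompi_info reports on my build):

    # force shared-memory (vader) + self transports for an intra-node run
    mpirun -np 4 --mca btl self,vader ./xhpl

    # list vader's tunable parameters, including its single-copy mechanism
    ompi_info --param btl vader --level 9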

I’m wondering whether the Ubuntu kernel’s CMA_SIZE_MBYTES=5 setting is limiting the number/size of Open MPI messages, so I’m currently building a new kernel with CMA_SIZE_MBYTES=16.
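
One way to confirm what the running kernel actually provides, before and after the rebuild (assuming the standard Ubuntu /boot/config-$(uname -r) location; CmaTotal in /proc/meminfo should reflect CONFIG_CMA_SIZE_MBYTES if I’ve understood the option correctly):

    # CMA size compiled into the running kernel
    grep CONFIG_CMA /boot/config-$(uname -r)

    # CMA pool as seen by the running kernel (CmaTotal / CmaFree, in kB)
    grep -i cma /proc/meminfo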

I have attached 2 plots from my experiments…

Plot 1 - shows an increase in Gflops for 1 core as NB increases, up to a maximum value of 4.75 Gflops when NB = 240.

Plot 2 - shows an increase in Gflops for 4 cores (all on the same node) as NB increases. The maximum achieved is 15 Gflops. I would hope that, rather than dropping off dramatically at NB = 168, the performance would trend upwards towards somewhere near 4 x 4.75 = 19 Gflops.

This is why I’m wondering whether Open MPI messages sent via vader are being hampered by a limited CMA size.

Let’s see what happens with my new kernel...

Best regards

John

Attachment: gflops_vs_nb_1_core_80_percent_memory.pdf
Attachment: gflops_vs_nb_1_node_80_percent_memory.pdf


