Re: [OMPI users] Floating point overflow and tuning

2019-09-09 Thread Logan Stonebraker via users
>> Do you know what StarCCM is doing when it hangs?  I.e., is it in an MPI call?

I have set FI_LOG_LEVEL="debug", and the excerpt below is the point where it hangs on 
usdf_cq_readerr, right after the last usdf_am_insert_async.  I am defining a hang 
as 5 minutes; it might hang for longer.  With Intel MPI and usNIC, or with the TCP 
BTL, there is no "hang" and it starts happily running the batch job almost 
immediately.

libfabric-cisco:usnic:domain:usdf_am_get_distance():219 
libfabric-cisco:usnic:av:usdf_am_insert_async():317
libfabric-cisco:usnic:cq:usdf_cq_readerr():93
libfabric-cisco:usnic:cq:usdf_cq_readerr():93
libfabric-cisco:usnic:cq:usdf_cq_readerr():93
libfabric-cisco:usnic:cq:usdf_cq_readerr():93
(the readerr lines above are generated rapidly, seemingly forever...)
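
(For reference, this is roughly how the logging is being enabled; just a sketch, 
assuming it is enough to export the variable in the PBS job script, or to forward 
it explicitly with mpirun's -x option, so that it reaches every rank; 
<application> is a placeholder:)

  # in the job script, before the solver is launched
  export FI_LOG_LEVEL=debug

  # or, if launching through mpirun directly, forward it to all ranks
  mpirun -x FI_LOG_LEVEL=debug <application>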

On the large-core runs it happens during the first stages of MPI init, and it 
never gets past "Starting STAR-CCM+ parallel server".  It does not reach the CPU 
Affinity Report (I pass the -cpubind bandwidth,v flag to STAR).

Perhaps this is at a lower level than MPI, possibly in libfabric-cisco, or, as 
you point out, in StarCCM itself.

Interestingly, with a small number of cores selected the job does complete; 
however, we still see the libfabric-cisco:usnic:cq:usdf_cq_readerr():93 errors 
shown above.

I will try to run some other app through mpirun and see if I can replicate the 
issue.  I briefly used fi_pingpong and can't replicate the cq_readerr, though I 
did get plenty of other provider-related errors.
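
Roughly what I have in mind for the replication attempt (a sketch only; hello_c.c 
stands in for any trivial MPI program, the host file, rank count, and server 
address are placeholders, and I am forcing the usnic BTL so the same code path is 
exercised):

  # build a trivial MPI program and run it over the usnic BTL with verbose output
  mpicc hello_c.c -o hello_c
  mpirun --mca btl usnic,vader,self --mca btl_base_verbose 100 \
         -np 56 --hostfile ./hosts ./hello_c

  # fi_pingpong, forcing the usnic provider
  fi_pingpong -p usnic              # on the server node
  fi_pingpong -p usnic <server-ip>  # on the client node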

-Logan



Re: [OMPI users] Floating point overflow and tuning

2019-09-09 Thread Jeff Squyres (jsquyres) via users
On Sep 6, 2019, at 2:17 PM, Logan Stonebraker via users wrote:
> 
> I am working with star ccm+ 2019.1.1 Build 14.02.012
> 
> CentOS 7.6 kernel 3.10.0-957.21.3.el7.x86_64
> 
> Intel MPI Version 2018 Update 5 Build 20190404 (this is version shipped with 
> star ccm+)
> 
> Also trying to make openmpi work (more on that later)

Greetings Logan.

I would definitely recommend Open MPI vs. DAPL/Intel MPI.

> Cisco UCS b200 and c240 cluster using USNIC fabric over 10gbe
> Intel(R) Xeon(R) CPU E5-2698
> 7 nodes
> 280 total cores
> 
> enic RPM version kmod-enic-3.2.210.22-738.18.centos7u7.x86_64 installed
> usnic RPM kmod-usnic_verbs-3.2.158.15-738.18.rhel7u6.x86_64 installed
> enic modinfo version: 3.2.210.22
> enic loaded module version: 3.2.210.22
> usnic_verbs modinfo version: 3.2.158.15
> usnic_verbs loaded module version: 3.2.158.15
> libdaplusnic RPM version 2.0.39cisco3.2.112.8 installed
> libfabric RPM version 1.6.0cisco3.2.112.9.rhel7u6 installed
> 
> On batch runs of less than 5 hours, everything works flawlessly: the jobs 
> complete without error, and it is quite fast with DAPL, especially compared to 
> the TCP BTL.
> 
> However, when running with n-1 (273 total) cores, at or around 5 hours into a 
> job, the longer jobs die with a STAR-CCM+ floating point exception.
> The same job completes fine with no more than 210 cores (30 cores on each of 
> the 7 nodes).  I would like to be able to use the additional 60 cores.
> I am using PBS Pro with a 99-hour wall time.
> 
> Here is the overflow error. 
> --
> Turbulent viscosity limited on 56 cells in Region
> A floating point exception has occurred: floating point exception [Overflow]. 
>  The specific cause cannot be identified.  Please refer to the 
> troubleshooting section of the User's Guide.
> Context: star.coupledflow.CoupledImplicitSolver
> Command: Automation.Run
>error: Server Error
> --
> 
> I have not ruled out that I am missing some parameters or tuning with Intel 
> MPI as this is a new cluster.

That's odd.  That type of error is *usually* not the MPI's fault.

> I am also trying to make Open MPI work.  I have Open MPI compiled and running, 
> and I can see it is using the usNIC fabric; however, it only runs with a very 
> small number of CPUs.  Anything over about 2 cores per node hangs indefinitely, 
> right after the job starts.

That's also quite odd; it shouldn't *hang*.

> I have compiled Open MPI 3.1.3 from https://www.open-mpi.org/ because this is 
> what the STAR-CCM+ version I am running supports.  I am telling STAR to use the 
> Open MPI that I installed so it can support the Cisco usNIC fabric, which I can 
> verify using Cisco native tools (STAR ships with its own Open MPI, by the way, 
> but I'm not using it).
> 
> I am thinking that I need to tune Open MPI, which was also required with Intel 
> MPI in order to run without an indefinite hang.
> 
> With Intel MPI prior to tuning, jobs with more than about 100 cores would 
> hang forever until I added these parameters:
> 
> reference: https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/542591
> reference: https://software.intel.com/en-us/articles/tuning-the-intel-mpi-library-advanced-techniques
> 
> export I_MPI_DAPL_UD_SEND_BUFFER_NUM=8208
> export I_MPI_DAPL_UD_RECV_BUFFER_NUM=8208
> export I_MPI_DAPL_UD_ACK_SEND_POOL_SIZE=8704
> export I_MPI_DAPL_UD_ACK_RECV_POOL_SIZE=8704
> export I_MPI_DAPL_UD_RNDV_EP_NUM=2
> export I_MPI_DAPL_UD_REQ_EVD_SIZE=2000
> export I_MPI_DAPL_UD_MAX_MSG_SIZE=4096
> export I_MPI_DAPL_UD_DIRECT_COPY_THRESHOLD=2147483647
> 
> After adding these parameters I can scale to 273 cores and it runs very fast, 
> up until the point where it hits the floating point exception about 5 hours 
> into the job.
> 
> I am struggling to find equivalent tuning parameters for Open MPI.

FWIW, you shouldn't need any tuning params -- it should "just work".

> I have listed all the MCA parameters available in Open MPI, and have tried 
> setting the following with no success.  I may not have the exact equivalents 
> listed here, but this is what I have tried:
> 
> btl_max_send_size = 4096
> btl_usnic_eager_limit = 2147483647
> btl_usnic_rndv_eager_limit = 2147483647
> btl_usnic_sd_num = 8208
> btl_usnic_rd_num = 8208
> btl_usnic_prio_sd_num = 8704
> btl_usnic_prio_rd_num = 8704
> btl_usnic_pack_lazy_threshold = -1

All those look reasonable.

Do you know what StarCCM is doing when it hangs?  I.e., is it in an MPI call?
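
One quick way to check (just a sketch; it assumes gdb is available on the compute 
nodes, and <pid> is the process ID of a rank you suspect is stuck):

  # attach to the hung rank, dump all thread backtraces, and detach
  gdb -batch -p <pid> -ex 'thread apply all bt'

If the top frames are inside the MPI library or the usnic BTL, it's hanging in an 
MPI call; if they're inside StarCCM's own code, it's something else.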

-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI users] Floating point overflow and tuning

2019-09-06 Thread Logan Stonebraker via users
I am working with star ccm+ 2019.1.1 Build 14.02.012
CentOS 7.6 kernel 3.10.0-957.21.3.el7.x86_64
Intel MPI Version 2018 Update 5 Build 20190404 (this is version shipped with 
star ccm+)
Also trying to make openmpi work (more on that later)
Cisco UCS b200 and c240 cluster using USNIC fabric over 10gbe
Intel(R) Xeon(R) CPU E5-2698
7 nodes
280 total cores
enic RPM version kmod-enic-3.2.210.22-738.18.centos7u7.x86_64 installed
usnic RPM kmod-usnic_verbs-3.2.158.15-738.18.rhel7u6.x86_64 installed
enic modinfo version: 3.2.210.22
enic loaded module version: 3.2.210.22
usnic_verbs modinfo version: 3.2.158.15
usnic_verbs loaded module version: 3.2.158.15
libdaplusnic RPM version 2.0.39cisco3.2.112.8 installed
libfabric RPM version 1.6.0cisco3.2.112.9.rhel7u6 installed
On batch runs of less than 5 hours, everything works flawlessly: the jobs complete 
without error, and it is quite fast with DAPL, especially compared to the TCP BTL.

However, when running with n-1 (273 total) cores, at or around 5 hours into a job, 
the longer jobs die with a STAR-CCM+ floating point exception.  The same job 
completes fine with no more than 210 cores (30 cores on each of the 7 nodes).  I 
would like to be able to use the additional 60 cores.  I am using PBS Pro with a 
99-hour wall time.
Here is the overflow error.
--
Turbulent viscosity limited on 56 cells in Region
A floating point exception has occurred: floating point exception [Overflow].  The 
specific cause cannot be identified.  Please refer to the troubleshooting section 
of the User's Guide.
Context: star.coupledflow.CoupledImplicitSolver
Command: Automation.Run
   error: Server Error
--
I have not ruled out that I am missing some parameters or tuning with Intel MPI 
as this is a new cluster.
I am also trying to make Open MPI work.  I have Open MPI compiled and running, and 
I can see it is using the usNIC fabric; however, it only runs with a very small 
number of CPUs.  Anything over about 2 cores per node hangs indefinitely, right 
after the job starts.
I have compiled Open MPI 3.1.3 from https://www.open-mpi.org/ because this is what 
the STAR-CCM+ version I am running supports.  I am telling STAR to use the Open 
MPI that I installed so it can support the Cisco usNIC fabric, which I can verify 
using Cisco native tools (STAR ships with its own Open MPI, by the way, but I'm 
not using it).
I am thinking that I need to tune Open MPI, which was also required with Intel 
MPI in order to run without an indefinite hang.
With Intel MPI prior to tuning, jobs with more than about 100 cores would hang 
forever until I added these parameters:
reference: https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/542591
reference: https://software.intel.com/en-us/articles/tuning-the-intel-mpi-library-advanced-techniques
export I_MPI_DAPL_UD_SEND_BUFFER_NUM=8208
export I_MPI_DAPL_UD_RECV_BUFFER_NUM=8208
export I_MPI_DAPL_UD_ACK_SEND_POOL_SIZE=8704
export I_MPI_DAPL_UD_ACK_RECV_POOL_SIZE=8704
export I_MPI_DAPL_UD_RNDV_EP_NUM=2
export I_MPI_DAPL_UD_REQ_EVD_SIZE=2000
export I_MPI_DAPL_UD_MAX_MSG_SIZE=4096
export I_MPI_DAPL_UD_DIRECT_COPY_THRESHOLD=2147483647
After adding these parameters I can scale to 273 cores and it runs very fast, up 
until the point where it hits the floating point exception about 5 hours into the 
job.
I am struggling to find equivalent tuning parameters for Open MPI.
I have listed all the MCA parameters available in Open MPI, and have tried setting 
the following with no success.  I may not have the exact equivalents listed here, 
but this is what I have tried:
btl_max_send_size = 4096
btl_usnic_eager_limit = 2147483647
btl_usnic_rndv_eager_limit = 2147483647
btl_usnic_sd_num = 8208
btl_usnic_rd_num = 8208
btl_usnic_prio_sd_num = 8704
btl_usnic_prio_rd_num = 8704
btl_usnic_pack_lazy_threshold = -1
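
For what it's worth, this is roughly how I am applying them (a sketch; either 
per-run on the mpirun command line or persistently in an MCA parameters file, and 
the command line below is abbreviated):

  # per-run, on the mpirun command line
  mpirun --mca btl usnic,vader,self \
         --mca btl_usnic_sd_num 8208 --mca btl_usnic_rd_num 8208 \
         --mca btl_usnic_prio_sd_num 8704 --mca btl_usnic_prio_rd_num 8704 ...

  # or persistently, one "name = value" per line
  echo "btl_usnic_sd_num = 8208" >> $HOME/.openmpi/mca-params.conf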

Does anyone have any advice or ideas on:
1.) the floating point overflow issue, and
2.) equivalent tuning parameters for Open MPI?
Many thanks in advance!
-Logan