[ https://issues.apache.org/jira/browse/SYSTEMML-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm closed SYSTEMML-2487.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.2

> Native Dnn operations crashing in over-provisioned parfor
> ---------------------------------------------------------
>
>                 Key: SYSTEMML-2487
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2487
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>            Priority: Major
>             Fix For: SystemML 1.2
>
>
> In case parfor does not consume all the available parallelism, we propagate
> this parallelism down to individual operations with slight (max 50%)
> overprovisioning. For example, if we have 80 vcores, and parfor is assigned
> k=47, we still assign k=2 to individual operations.
> However, with native DNN operations this causes JVM crashes as follows:
> {code}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGFPE (0x8) at pc=0x00007f5de21902d6, pid=335027, tid=0x00007f5df8bcb700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_161-b14) (build 1.8.0_161-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.161-b14 mixed mode linux-amd64 )
> # Problematic frame:
> # C [libmkl_avx512.so+0x206d2d6][thread 140041622857472 also had an error]
> mkl_dnn_avx512_bkdGemmDirectConv_F64+0x276
> {code}
> Hence, when native BLAS or DNN libraries are loaded, we should be more
> conservative and not over-provision at all.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
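
For readers of the archive, the arithmetic in the description above can be sketched roughly as follows. This is only an illustration, not the SystemML optimizer code: the class OpParallelism, the method perOperationThreads, and the nativeLibLoaded flag are hypothetical names, and rounding to the nearest integer is an assumed rule chosen because it reproduces the k=2 example (80 vcores, parfor k=47). The conservative variant rounds down instead, so no over-provisioning occurs.

```java
public class OpParallelism {

    /**
     * Hypothetical helper: threads to assign to each operation inside a parfor body.
     *
     * @param vcores          available virtual cores
     * @param parforK         degree of parallelism assigned to parfor
     * @param nativeLibLoaded true if a native BLAS/DNN library is in use (assumed flag)
     */
    public static int perOperationThreads(int vcores, int parforK, boolean nativeLibLoaded) {
        if (vcores < 1 || parforK < 1)
            throw new IllegalArgumentException("vcores and parforK must be >= 1");

        double ratio = (double) vcores / parforK;

        int k = nativeLibLoaded
            ? (int) Math.floor(ratio)   // conservative: parforK * k stays within vcores
            : (int) Math.round(ratio);  // assumed rule: may slightly over-provision

        // Every operation needs at least one thread.
        return Math.max(k, 1);
    }

    public static void main(String[] args) {
        int vcores = 80, parforK = 47;
        int over = perOperationThreads(vcores, parforK, false);
        int safe = perOperationThreads(vcores, parforK, true);
        System.out.println("over-provisioned: k=" + over + ", total threads=" + parforK * over);
        System.out.println("conservative:     k=" + safe + ", total threads=" + parforK * safe);
    }
}
```

With the numbers from the issue, the over-provisioned variant yields k=2 per operation (94 threads in total on 80 vcores), while the conservative variant yields k=1 (47 threads in total).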