[ 
https://issues.apache.org/jira/browse/SYSTEMML-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm closed SYSTEMML-2487.
------------------------------------
       Resolution: Fixed
         Assignee: Matthias Boehm
    Fix Version/s: SystemML 1.2

> Native Dnn operations crashing in over-provisioned parfor
> ---------------------------------------------------------
>
>                 Key: SYSTEMML-2487
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2487
>             Project: SystemML
>          Issue Type: Bug
>            Reporter: Matthias Boehm
>            Assignee: Matthias Boehm
>            Priority: Major
>             Fix For: SystemML 1.2
>
>
> If parfor does not consume all of the available parallelism, we propagate 
> the remaining parallelism down to individual operations with slight (max 
> 50%) over-provisioning. For example, with 80 vcores and parfor assigned 
> k=47, we still assign k=2 to individual operations. 
> However, with native DNN operations this causes JVM crashes as follows:
> {code}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGFPE (0x8) at pc=0x00007f5de21902d6, pid=335027, tid=0x00007f5df8bcb700
> #
> # JRE version: OpenJDK Runtime Environment (8.0_161-b14) (build 1.8.0_161-b14)
> # Java VM: OpenJDK 64-Bit Server VM (25.161-b14 mixed mode linux-amd64 )
> # Problematic frame:
> # C  [libmkl_avx512.so+0x206d2d6][thread 140041622857472 also had an error]
>   mkl_dnn_avx512_bkdGemmDirectConv_F64+0x276
> {code}
> Hence, when native BLAS or DNN libraries are loaded, we should be more 
> conservative and not over-provision at all. 
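
The arithmetic in the example above can be sketched as follows. This is only an illustration, not SystemML's optimizer code: the class and method names are hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the k=2 from the example (80/47 is about 1.7). The conservative variant rounds down instead, so that parfor workers times per-operation threads never exceeds the vcores.

```java
// Hypothetical sketch; names and the rounding rule are assumptions,
// not taken from the SystemML code base.
public class InnerParallelismSketch {

    // Threads to give each operation inside a parfor body.
    // vcores: available virtual cores; parforK: parfor degree of parallelism;
    // nativeLibLoaded: true if a native BLAS/DNN library is in use.
    static int innerK(int vcores, int parforK, boolean nativeLibLoaded) {
        if (vcores < 1 || parforK < 1)
            throw new IllegalArgumentException("vcores and parforK must be >= 1");
        double share = (double) vcores / parforK;
        // Rounding to nearest may over-provision slightly;
        // rounding down never exceeds the available vcores.
        int k = nativeLibLoaded ? (int) Math.floor(share)
                                : (int) Math.round(share);
        return Math.max(1, k); // every operation needs at least one thread
    }

    public static void main(String[] args) {
        // 47 workers x 2 threads = 94 threads on 80 vcores (over-provisioned)
        System.out.println(innerK(80, 47, false));
        // 47 workers x 1 thread = 47 threads on 80 vcores (conservative)
        System.out.println(innerK(80, 47, true));
    }
}
```

With 80 vcores and k=47, the first call yields 2 threads per operation (94 in total) and the second yields 1 (47 in total), which corresponds to the "not over-provision at all" behavior proposed for the native-library case.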



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
