https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82495
Bug ID: 82495
Summary: forall is very slow comparing to other compilers!
Product: gcc
Version: 7.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: chinoune.mehdi at hotmail dot com
Target Milestone: ---

This forall construct is very slow compared to other compilers:

    PROGRAM FORALL_EXECUTION
      IMPLICIT NONE
      REAL, ALLOCATABLE :: A(:,:,:), B(:,:,:), C(:,:,:)
      INTEGER :: I, J, K
      REAL :: TIC
      INTEGER :: START, FINISH1, FINISH2, FINISH3, FINISH4, FINISH5, FINISH6
      INTEGER, PARAMETER :: L = 1024, M = 512, N = 512

      ALLOCATE(A(L,M,N), B(L,M,N), C(L,M,N) )
      CALL RANDOM_NUMBER(A)
      CALL RANDOM_NUMBER(B)
      CALL RANDOM_NUMBER(C)

      CALL SYSTEM_CLOCK(START, TIC)

      FORALL(I=1:L, J=1:M, K=1:N)
        C(I,J,K) = A(I,J,K) + B(I,J,K)
      END FORALL
      CALL SYSTEM_CLOCK(FINISH1)
      PRINT*,'I,J,K ',(FINISH1-START)/TIC

      FORALL(I=1:L, K=1:N, J=1:M)
        C(I,J,K) = A(I,J,K) +B(I,J,K)
      END FORALL
      CALL SYSTEM_CLOCK(FINISH2)
      PRINT*,'I,K,J ',(FINISH2-FINISH1)/TIC

      FORALL(J=1:M, I=1:L, K=1:N)
        C(I,J,K) = A(I,J,K) +B(I,J,K)
      END FORALL
      CALL SYSTEM_CLOCK(FINISH3)
      PRINT*,'J,I,K ',(FINISH3-FINISH2)/TIC

      FORALL(J=1:M, K=1:N, I=1:L)
        C(I,J,K) = A(I,J,K) +B(I,J,K)
      END FORALL
      CALL SYSTEM_CLOCK(FINISH4)
      PRINT*,'J,K,I ',(FINISH4-FINISH3)/TIC

      FORALL(K=1:N, I=1:L, J=1:M)
        C(I,J,K) = A(I,J,K) +B(I,J,K)
      END FORALL
      CALL SYSTEM_CLOCK(FINISH5)
      PRINT*,'K,I,J ',(FINISH5-FINISH4)/TIC

      FORALL(K=1:N, J=1:M, I=1:L)
        C(I,J,K) = A(I,J,K) +B(I,J,K)
      END FORALL
      CALL SYSTEM_CLOCK(FINISH6)
      PRINT*,'K,J,I ',(FINISH6-FINISH5)/TIC

    END PROGRAM

This program gives:

MinGW 7.2.0 : -Ofast
    I,J,K   0.453000009
    I,K,J   0.531000018
    J,I,K   9.34400082
    J,K,I   14.0630007
    K,I,J   24.0460014
    K,J,I   25.9690018

Ubuntu 7.2.0 : -Ofast
    I,J,K   0.454000026
    I,K,J   0.441000015
    J,I,K   9.14200020
    J,K,I   13.2140007
    K,I,J   22.3860016
    K,J,I   24.7680016

But with other compilers:

PGI Fortran 17.04 : -fast
    I,J,K   0.5161750
    I,K,J   0.3963450
    J,I,K   0.2786350
    J,K,I   0.3162010
    K,I,J   0.3141180
    K,J,I   0.2789040

Flang : -Ofast
    I,J,K   0.4740010
    I,K,J   0.2965370
    J,I,K   0.3045340
    J,K,I   0.4017220
    K,I,J   0.2853640
    K,J,I   0.3081510

Intel Fortran 18.0 : -fast
    I,J,K   0.4370000
    I,K,J   0.3910000
    J,I,K   0.2810000
    J,K,I   0.3600000
    K,I,J   0.3280000
    K,J,I   0.2810000

I think this bug is independent of do-concurrent!
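
For reference, here is a minimal sketch of the same element-wise sum written two other ways: as explicit DO loops with the first index innermost (Fortran stores arrays in column-major order, so that index is contiguous in memory), and as a whole-array assignment. It is not part of the measurements above. The program name, the array D, the timer variables, and the reduced array extents are assumptions chosen so the sketch runs quickly. It may serve as a baseline to compare against the FORALL timings.

```fortran
program loop_order_baseline
  implicit none
  ! Extents are smaller than the report's 1024 x 512 x 512 so the sketch
  ! runs quickly; these sizes are an assumption, not values from the report.
  integer, parameter :: l = 256, m = 128, n = 128
  real, allocatable :: a(:,:,:), b(:,:,:), c(:,:,:), d(:,:,:)
  integer :: i, j, k
  integer :: t0, t1, t2, rate

  allocate(a(l,m,n), b(l,m,n), c(l,m,n), d(l,m,n))
  call random_number(a)
  call random_number(b)

  call system_clock(t0, rate)

  ! Explicit DO loops: the first index (i) is innermost, so consecutive
  ! iterations touch consecutive memory locations.
  do k = 1, n
    do j = 1, m
      do i = 1, l
        c(i,j,k) = a(i,j,k) + b(i,j,k)
      end do
    end do
  end do
  call system_clock(t1)

  ! Whole-array assignment: the traversal order is left to the compiler.
  d = a + b
  call system_clock(t2)

  print *, 'DO k,j,i     ', real(t1 - t0) / real(rate)
  print *, 'whole array  ', real(t2 - t1) / real(rate)
  print *, 'results match', all(c == d)
end program loop_order_baseline
```

To compare like with like, the extents can be set back to L = 1024, M = 512, N = 512 and the program built with the same flags as above (for example, gfortran -Ofast).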