Re: [PATCH] Fix PR53295
On 05/11/2012 01:59 PM, Richard Guenther wrote:

> This fixes the dependency of vectorization of strided loads on
> gather support. For that to work we need to lift the restriction
> in data-ref analysis that requires a constant DR_STEP.
> Fortunately fallout is small.

Would this also vectorize strided loops when the architecture doesn't
have a gather instruction?

If so, it doesn't work for the attached case, which *does* vectorize
with a gather instruction:

$ /tmp/c/bin/gfortran -g -O3 -ftree-vectorizer-verbose=2 -mavx2 -S verintlin.f

Analyzing loop at verintlin.f:68
Analyzing loop at verintlin.f:69
Vectorizing loop at verintlin.f:69
69: LOOP VECTORIZED.
verintlin.f:1: note: vectorized 1 loops in function.

whereas:

$ /tmp/c/bin/gfortran -g -O3 -ftree-vectorizer-verbose=2 -mavx -S verintlin.f

Analyzing loop at verintlin.f:68
Analyzing loop at verintlin.f:69
69: not vectorized: not suitable for gather load D.2051_74 = *parg_73(D)[D.2050_72];
69: not vectorized: not suitable for gather load D.2051_74 = *parg_73(D)[D.2050_72];
verintlin.f:1: note: vectorized 0 loops in function.
-- 
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news

      SUBROUTINE VERINT (
     I   KLON   , KLAT   , KLEV   , KINT   , KHALO
     I , KLON1  , KLON2  , KLAT1  , KLAT2
     I , KP     , KQ     , KR
     R , PARG   , PRES
     R , PALFH  , PBETH
     R , PALFA  , PBETA  , PGAMA   )
C
C***
C
C   VERINT - THREE DIMENSIONAL INTERPOLATION
C
C   PURPOSE:
C
C   THREE DIMENSIONAL INTERPOLATION
C
C   INPUT PARAMETERS:
C
C   KLON      NUMBER OF GRIDPOINTS IN X-DIRECTION
C   KLAT      NUMBER OF GRIDPOINTS IN Y-DIRECTION
C   KLEV      NUMBER OF VERTICAL LEVELS
C   KINT      TYPE OF INTERPOLATION
C             = 1 - LINEAR
C             = 2 - QUADRATIC
C             = 3 - CUBIC
C             = 4 - MIXED CUBIC/LINEAR
C   KLON1     FIRST GRIDPOINT IN X-DIRECTION
C   KLON2     LAST  GRIDPOINT IN X-DIRECTION
C   KLAT1     FIRST GRIDPOINT IN Y-DIRECTION
C   KLAT2     LAST  GRIDPOINT IN Y-DIRECTION
C   KP        ARRAY OF INDEXES FOR HORIZONTAL DISPLACEMENTS
C   KQ        ARRAY OF INDEXES FOR HORIZONTAL DISPLACEMENTS
C   KR        ARRAY OF INDEXES FOR VERTICAL   DISPLACEMENTS
C   PARG      ARRAY OF ARGUMENTS
C   PALFH     ALFA HAT
C   PBETH     BETA HAT
C   PALFA     ARRAY OF WEIGHTS IN X-DIRECTION
C   PBETA     ARRAY OF WEIGHTS IN Y-DIRECTION
C   PGAMA     ARRAY OF WEIGHTS IN VERTICAL DIRECTION
C
C   OUTPUT PARAMETERS:
C
C   PRES      INTERPOLATED FIELD
C
C   HISTORY:
C
C   J.E. HAUGEN 1 1992
C
C***
C
      IMPLICIT NONE
C
      INTEGER KLON , KLAT , KLEV , KINT , KHALO,
     I        KLON1 , KLON2 , KLAT1 , KLAT2
C
      INTEGER KP(KLON,KLAT), KQ(KLON,KLAT), KR(KLON,KLAT)
      REAL    PARG(2-KHALO:KLON+KHALO-1,2-KHALO:KLAT+KHALO-1,KLEV) ,
     R        PRES(KLON,KLAT) ,
     R        PALFH(KLON,KLAT) , PBETH(KLON,KLAT) ,
     R        PALFA(KLON,KLAT,4) , PBETA(KLON,KLAT,4),
     R        PGAMA(KLON,KLAT,4)
C
      INTEGER JX, JY, IDX, IDY, ILEV
      REAL Z1MAH, Z1MBH
C
C  LINEAR INTERPOLATION
C
      DO JY = KLAT1,KLAT2
      DO JX = KLON1,KLON2
         IDX  = KP(JX,JY)
         IDY  = KQ(JX,JY)
         ILEV = KR(JX,JY)
C
         PRES(JX,JY) = PGAMA(JX,JY,1)*(
C
     +   PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV-1)
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV-1) )
     + + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV-1)
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV-1) ) )
C+
     + + PGAMA(JX,JY,2)*(
C+
     +   PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV  )
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV  ) )
     + + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV  )
     +                  + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV  ) ) )
      ENDDO
      ENDDO
C
      RETURN
      END
Re: [PATCH] Fix PR53295
On Sat, May 12, 2012 at 9:53 AM, Toon Moene t...@moene.org wrote:

> On 05/11/2012 01:59 PM, Richard Guenther wrote:
>
>> This fixes the dependency of vectorization of strided loads on
>> gather support. For that to work we need to lift the restriction
>> in data-ref analysis that requires a constant DR_STEP.
>> Fortunately fallout is small.
>
> Would this also vectorize strided loops when the architecture doesn't
> have a gather instruction?

Gather is different from strided loops. Gather is

  a[b[i]]

while strided loops are

  for (i=0;; i+=stride)
    ...= a[i]

with stride being non-constant. Your testcase requires gather support.

Richard.
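The distinction drawn above can be sketched in plain C. This is only an illustration of the two access patterns, not code from GCC or from the patch; the function names gather_load and strided_load are hypothetical. The second function is written with a multiplied index (a[i * stride]), which touches the same elements as the i += stride form in the message.

```c
#include <stddef.h>

/* Gather: the index of every load is itself read from a second
 * array, so the addresses follow no fixed pattern (a[b[i]]). */
void gather_load(float *dst, const float *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[b[i]];
}

/* Strided: the addresses advance by the same step on every
 * iteration, but that step is only known at run time. */
void strided_load(float *dst, const float *a, size_t stride, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i * stride];
}
```

In the attached Fortran loop the subscripts of PARG come from KP, KQ and KR, which corresponds to the first pattern rather than the second.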
Re: [PATCH] Fix PR53295
On 05/12/2012 12:36 PM, Richard Guenther wrote:

> On Sat, May 12, 2012 at 9:53 AM, Toon Moene t...@moene.org wrote:
>
>> On 05/11/2012 01:59 PM, Richard Guenther wrote:
>>
>>> This fixes the dependency of vectorization of strided loads on
>>> gather support. For that to work we need to lift the restriction
>>> in data-ref analysis that requires a constant DR_STEP.
>>> Fortunately fallout is small.
>>
>> Would this also vectorize strided loops when the architecture doesn't
>> have a gather instruction?
>
> Gather is different from strided loops. Gather is a[b[i]] while
> strided loops are for (i=0;; i+=stride) ...= a[i] with stride
> being non-constant. Your testcase requires gather support.

Yep, apparently I didn't read your explanation correctly.

On the other hand, I'm wondering if - in the absence of a gather
*instruction* - one could do a gather-by-hand, i.e., load 8 32-bit
floating point values into a (temporary) consecutive buffer, then load
that buffer into a vector register ...

-- 
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
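A minimal sketch of the gather-by-hand idea, in portable C and under stated assumptions: the type vec8f is a hypothetical stand-in for a 256-bit vector register (eight 32-bit floats), the memcpy models the single contiguous vector load, and gather_by_hand is an illustrative name. It says nothing about what the vectorizer actually emits.

```c
#include <stddef.h>
#include <string.h>

#define VLEN 8  /* eight 32-bit floats fill one 256-bit register */

/* Hypothetical stand-in for a vector register. */
typedef struct { float lane[VLEN]; } vec8f;

/* Emulate a gather without a gather instruction: VLEN scalar loads
 * into a consecutive temporary buffer, then one contiguous load of
 * that buffer into the "register". */
vec8f gather_by_hand(const float *a, const int *idx)
{
    float tmp[VLEN];
    vec8f v;

    for (size_t k = 0; k < VLEN; k++)
        tmp[k] = a[idx[k]];            /* scattered scalar loads */

    memcpy(v.lane, tmp, sizeof tmp);   /* models the vector load */
    return v;
}
```

Whether this pays off would depend on how much vector arithmetic follows the load, since the eight scalar loads themselves are not made any faster.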
Re: [PATCH] Fix PR53295
On Sat, May 12, 2012 at 1:39 PM, Toon Moene t...@moene.org wrote:

> On 05/12/2012 12:36 PM, Richard Guenther wrote:
>
>> On Sat, May 12, 2012 at 9:53 AM, Toon Moene t...@moene.org wrote:
>>
>>> On 05/11/2012 01:59 PM, Richard Guenther wrote:
>>>
>>>> This fixes the dependency of vectorization of strided loads on
>>>> gather support. For that to work we need to lift the restriction
>>>> in data-ref analysis that requires a constant DR_STEP.
>>>> Fortunately fallout is small.
>>>
>>> Would this also vectorize strided loops when the architecture
>>> doesn't have a gather instruction?
>>
>> Gather is different from strided loops. Gather is a[b[i]] while
>> strided loops are for (i=0;; i+=stride) ...= a[i] with stride
>> being non-constant. Your testcase requires gather support.
>
> Yep, apparently I didn't read your explanation correctly.
>
> On the other hand, I'm wondering if - in the absence of a gather
> *instruction* - one could do a gather-by-hand, i.e., load 8 32-bit
> floating point values in a (temporary) consecutive buffer, then load
> it into a vector register ...

Sure - gather and non-constant stride support are somewhat related.
We also currently do not handle non-power-of-two constant strides
(which can simply use the non-constant stride path as well).

Richard.
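The last remark can be illustrated at source level: a loop with a constant stride of 3 is just a special case of the loop with a run-time stride, so a code path that handles the latter also covers the former. The sketch below is an analogy in C, not the vectorizer's implementation; sum_strided and sum_every_third are hypothetical names.

```c
#include <stddef.h>

/* General path: the stride is an ordinary run-time value. */
float sum_strided(const float *a, size_t n, size_t stride)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* A non-power-of-two constant stride needs no path of its own:
 * it can be passed to the general routine as a plain argument. */
float sum_every_third(const float *a, size_t n)
{
    return sum_strided(a, n, 3);
}
```

Here n counts the elements loaded, so the array must hold at least (n - 1) * stride + 1 entries.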