[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2016-01-01 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #48 from Dominique d'Humieres  ---
> Is there anything left in this PR or could it be closed as FIXED?

No feedback, closing. Please open new PR(s) for remaining issue(s).

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2015-12-06 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #47 from Dominique d'Humieres  ---
Is there anything left in this PR or could it be closed as FIXED?

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2011-07-29 Thread burnus at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #46 from Tobias Burnus  2011-07-29 
07:12:47 UTC ---
(In reply to comment #45)
[Commit to inline MINLOC/MAXLOC for a rank-1 array, which returns a
single-element rank-1 array.]


On my ~5 year old Athlon64 x2, I get with "-Ofast -march=native" (and with or
without -fwhole-program) a performance improvement of 3% (10.72s -> 10.41s).


The performance should further improve, if one fuses the loops (cf. comment 42,
but also comment 34 ff.) - and if one could move the memory allocation/freeing
of the automatic-array DTEMP out of the loop (after inlining). (Recall that
with -fstack-arrays/-Ofast, automatic arrays are allocated on the stack.)


As already mentioned indirectly in comment 0 (via PR31066): If one uses
reciprocal
approximation instructions and a Newton-Rhapson step (namely: -mrecip), the
performance improves a lot: 6.895s. By comparison, with ifort (11.1) -xHost -O3
the run time is 7.319s.


[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2011-07-28 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #45 from Jakub Jelinek  2011-07-28 
20:56:57 UTC ---
Author: jakub
Date: Thu Jul 28 20:56:50 2011
New Revision: 176897

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=176897
Log:
PR fortran/31067
* frontend-passes.c (optimize_minmaxloc): New function.
(optimize_expr): Call it.

* gfortran.dg/maxloc_2.f90: New test.
* gfortran.dg/maxloc_3.f90: New test.
* gfortran.dg/minloc_1.f90: New test.
* gfortran.dg/minloc_2.f90: New test.
* gfortran.dg/minloc_3.f90: New test.
* gfortran.dg/minmaxloc_7.f90: New test.

Added:
trunk/gcc/testsuite/gfortran.dg/maxloc_2.f90
trunk/gcc/testsuite/gfortran.dg/maxloc_3.f90
trunk/gcc/testsuite/gfortran.dg/minloc_1.f90
trunk/gcc/testsuite/gfortran.dg/minloc_2.f90
trunk/gcc/testsuite/gfortran.dg/minloc_3.f90
trunk/gcc/testsuite/gfortran.dg/minmaxloc_7.f90
Modified:
trunk/gcc/fortran/ChangeLog
trunk/gcc/fortran/frontend-passes.c
trunk/gcc/testsuite/ChangeLog


[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2011-07-28 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

Jakub Jelinek  changed:

   What|Removed |Added

  Attachment #24856|0   |1
is obsolete||

--- Comment #44 from Jakub Jelinek  2011-07-28 
16:00:54 UTC ---
Created attachment 24858
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24858
gcc47-pr31067.patch

Based on IRC discussions, this patch instead replaces
MINLOC (rank1)
with
/( MINLOC (rank1, DIM=1) )/
so that it should work even with allocatable LHS that should be reallocated on
assignment etc.  As an bonus, even e.g.
if (any (minloc (rank1).ne.6))
etc. can be optimized.  So far tested just with dg.exp=m*.f90


[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2011-07-28 Thread jakub at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #43 from Jakub Jelinek  2011-07-28 
13:51:32 UTC ---
Created attachment 24856
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24856
gcc47-pr31067.patch

Patch to optimize a = minloc (b) for rank 1 b into a = minloc (b, dim = 1),
according to the standard the latter function is supposed to return the first
element of the array returned by former function (which should return a rank 1,
 1 element array).  So, by instead initializing the (one element) array with
the scalar we can get it actually inlined.  Doing this during genericization
looked much harder to me.


[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2011-07-25 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067

--- Comment #42 from Richard Guenther  2011-07-25 
15:39:54 UTC ---
With gas_dyn changed to use MINLOC (DTEMP, 1) we now inline the intrinsic
(but not with MINLOC (DTEMP), even though we know it'll be a single-element
array result ...).

We completely lack a way to fuse the loops though.  Inlining the intrinsic
gives a moderate 5% speedup.


[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-28 Thread irar at il dot ibm dot com


--- Comment #41 from irar at il dot ibm dot com  2009-07-28 08:12 ---
That requires pattern recognition. MIN/MAX_EXPR are recognized by the first
phiopt pass, so MIN/MAXLOC should be either also recognized there or in the
vectorizer. (The phiopt pass transforms if clause to MIN/MAX_EXPR. The
vectorizer gets COND_EXPR after if-conversion pass).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread jakub at gcc dot gnu dot org


--- Comment #40 from jakub at gcc dot gnu dot org  2009-07-27 14:51 ---
If the cond_expr compute a minimum or maximum and the other cond_exprs compute
something based on the IV at the extremum, then I don't see why it couldn't be
vectorized by computing extremes of odd/even and corresponding values based on
the IV at those points, then merging them in the final step, and similarly for
bigger vectorization steps.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread burnus at gcc dot gnu dot org


--- Comment #39 from burnus at gcc dot gnu dot org  2009-07-27 13:15 ---
(In reply to comment #38)
> However, the loop can be split: [..]
> making the first loop vectorizable (inner-most loop vectorization).

OK. I tried it with a Fortran program:
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc.f90
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc2.f90
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc3.f90

maxloc.f90 is the program from comment 34  (maxloc.s = intel assembler)
maxloc2.f90 models what gfortran does for maxloc (maxloc.s = intel assembler)
maxloc3.f90 models what has a split loop

The splitting plus vectorization makes the calculation 5% faster - 0m2.152s
(maxloc3) vs 0m2.260s (maxloc2). Still, that's 35% more than ifort needs.

For some reason, maxloc2 with -fno-tree-vectorize takes only
0m1.840s.(Identical to intel's result for maxloc2/maxloc3. While for the
original maxloc.f90, there is no performance effect, and for maxloc3
vectorization makes it faster.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread irar at il dot ibm dot com


--- Comment #38 from irar at il dot ibm dot com  2009-07-27 12:44 ---
I am not sure that that kind of computation can be generated automatically,
since in general the order of caclulation of cond_expr cannot be changed. 

However, the loop can be split:

  for (i = 0; i < end; i++)
if (arr[i] < limit)
  limit = arr[i];

  for (i = 0; i < end; i++)
if (arr[i] == limit)
  {
pos = i + 1;
break;
  }

making the first loop vectorizable (inner-most loop vectorization).

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread jakub at gcc dot gnu dot org


--- Comment #37 from jakub at gcc dot gnu dot org  2009-07-27 11:10 ---
Oh, and on 64-bit arches and float or 32-bit arches and double there is another
complication - the comparison has different mode size from the cond_expr for
pos.  For 32-bit pos and 64-bit double it could perhaps just do the computation
in 64-bit integers (vector of 2 (resp. 4 for avx)), for the other case it would
need to shuffle the max and compute the pos in 2 vectors, or e.g. the Fortran
FE could hand in the common case, emit a likely look for the case where array
size is smaller than 4GB, using 32-bit pos zero extended to 64-bits at the end.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread jakub at gcc dot gnu dot org


--- Comment #36 from jakub at gcc dot gnu dot org  2009-07-27 11:02 ---
Here is the loop in C and vectorized by hand as well:
#include 

float arr[1024];

unsigned int
foo (unsigned int end)
{
  unsigned int pos = 1;
  unsigned int i;
  float limit = __FLT_MAX__;
  for (i = 0; i < end; i++)
if (arr[i] < limit)
  {
limit = arr[i];
pos = i + 1;
  }
  return pos;
}

unsigned int
bar (unsigned int end)
{
  __m128 pos = (__m128) _mm_set1_epi32 (1);
  __m128 limit = _mm_set1_ps (__FLT_MAX__);
  __m128i curi = _mm_set_epi32 (4, 3, 2, 1);
  __m128i inc = _mm_set1_epi32 (4);
  unsigned int i = 0;
  if (end > 4)
{
  for (; i < end - 4; i += 4)
{
  __m128 val = _mm_loadu_ps (arr + i);
  __m128 mask = _mm_cmplt_ps (val, limit);
  limit = _mm_min_ps (limit, val);
  pos = _mm_andnot_ps (mask, pos);
  pos = _mm_or_ps (pos, _mm_and_ps (mask, (__m128) curi));
  curi = _mm_add_epi32 (curi, inc);
}
  /* Reduction.  */
  __m128 tmp1 = _mm_movehl_ps (limit, limit);
  __m128 tmp2 = _mm_movehl_ps (pos, pos);
  __m128 mask = _mm_cmplt_ps (tmp1, limit);
  limit = _mm_min_ps (tmp1, limit);
  tmp2 = _mm_and_ps (mask, tmp2);
  pos = _mm_or_ps (tmp2, _mm_andnot_ps (mask, pos));
  tmp1 = _mm_shuffle_ps (limit, limit, _MM_SHUFFLE (1, 1, 1, 1));
  tmp2 = _mm_shuffle_ps (pos, pos, _MM_SHUFFLE (1, 1, 1, 1));
  mask = _mm_cmplt_ps (tmp1, limit);
  limit = _mm_min_ps (tmp1, limit);
  tmp2 = _mm_and_ps (mask, tmp2);
  pos = _mm_or_ps (tmp2, _mm_andnot_ps (mask, pos));
}
  float limit_ = _mm_cvtss_f32 (limit);
  unsigned int pos_ = (unsigned int) _mm_cvtsi128_si32 ((__m128i) pos);
  for (; i < end; i++)
if (arr[i] < limit_)
  {
limit_ = arr[i];
pos_ = i + 1;
  }
  return pos_;
}

int
main (void)
{
  unsigned int k;
  arr[0] = -1;
  arr[2] = -3;
  arr[8] = -5;
  arr[9] = -6;
  if (foo (32) != bar (32))
__builtin_abort ();
  for (k = 10; k < 32; k++)
{
  arr[k] = -k;
  if (foo (32) != bar (32))
__builtin_abort ();
}
  return 0;
}

Don't know how hard would be to vectorize this in the vectorizer, but clearly
icc manages to handle that.
The loop is:
:
  # pos_22 = PHI 
  # i_23 = PHI 
  # limit_24 = PHI 
  limit_11 = arr[i_23];
  D.2700_12 = limit_11 < limit_24;
  pos_1 = [cond_expr] D.2700_12 ? i_23 : pos_22;
  limit_4 = [cond_expr] D.2700_12 ? limit_11 : limit_24;
  i_15 = i_23 + 1;
  D.2703_9 = (long unsigned int) i_15;
  if (D.2703_9 < end_10(D))
goto ;
  else
goto ;

:
  goto ;
before vectorization.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread burnus at gcc dot gnu dot org


--- Comment #35 from burnus at gcc dot gnu dot org  2009-07-27 09:18 ---
(In reply to comment #34)
> Does ifort vectorize the exact same implemantion of minloc?

I tried to convert the minloc implementation into Fortran loops - and the
result is at
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc2.f90

$ ifort -O3 -xHost -diag-enable all maxloc2.f90
maxloc2.f90(25): (col. 5) remark: LOOP WAS VECTORIZED.
[timing: 0m1.384s]

$ gfortran -O3 -ffast-math -march=native -ftree-vectorize
-ftree-vectorizer-verbose=5 maxloc2.f90
maxloc2.f90:1: note: vectorized 0 loops in function.
[timing: 0m2.212s]

In case it helps: I put the ifort assembler output at:
http://users.physik.fu-berlin.de/~tburnus/tmp/vect-PR31067/maxloc2.s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-27 Thread irar at il dot ibm dot com


--- Comment #34 from irar at il dot ibm dot com  2009-07-27 08:36 ---
(In reply to comment #33)
> Using the example from comment 23 with
...
> gfortran shows: test.f90:12: note: not vectorized: unsupported use in stmt.
> and needs 2.272s. (By comparison. 4.4 needs 3.688s.)

This is for the inner loop vectorization. For the outer loop we get:
tmp.f90:11: note: not vectorized: control flow in loop.
because of the if's. Maybe loop unswitching can help us. 
Vectorizable outer-loops look like this:

(pre-header)
   |
  header <---+
   | |
  inner-loop |
   | |
  tail --+
   |
(exit-bb)


Does ifort vectorize the exact same implemantion of minloc?

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-26 Thread burnus at gcc dot gnu dot org


--- Comment #33 from burnus at gcc dot gnu dot org  2009-07-26 09:50 ---
(In reply to comment #32)
> > Regarding the just committed inline version: It would be interesting to know
> > whether it is vectorizable (with/without -ffinite-math-only [i.e.
> > -ffast-math]).
> 
> It depends on where it is inlined. It has to be vectorized in outer loop (see
> my previous comment), so it needs another loop around it.

Using the example from comment 23 with
a) gfortran -O3 -ffast-math -march=native -ftree-vectorize
-ftree-vectorizer-verbose=5
b) ifort -O3 -xHost -diag-enable all

ifort shows: test.f90(12): (col. 9) remark: LOOP WAS VECTORIZED.
and needs 1.476s.

gfortran shows: test.f90:12: note: not vectorized: unsupported use in stmt.
and needs 2.272s. (By comparison. 4.4 needs 3.688s.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-26 Thread irar at il dot ibm dot com


--- Comment #32 from irar at il dot ibm dot com  2009-07-26 07:48 ---
(In reply to comment #30)
> Regarding the just committed inline version: It would be interesting to know
> whether it is vectorizable (with/without -ffinite-math-only [i.e.
> -ffast-math]).

It depends on where it is inlined. It has to be vectorized in outer loop (see
my previous comment), so it needs another loop around it.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-24 Thread jakub at gcc dot gnu dot org


--- Comment #31 from jakub at gcc dot gnu dot org  2009-07-24 08:30 ---
Vectorization questions I'll defer to Ira.

For !optimize I even had a change to forcibly use the function call instead of
inline version.  But it didn't really work, as there are only array versions of
the library functions.  In case of minloc/maxloc we could just set up a fake
array descriptor for one element array and call the library functions (the
*loc0 ones).  But minval/maxval library functions only handle DIM=N cases, not
finding a minimum/maximum of the whole array.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-24 Thread burnus at gcc dot gnu dot org


--- Comment #30 from burnus at gcc dot gnu dot org  2009-07-24 08:19 ---
Regarding the just committed inline version: It would be interesting to know
whether it is vectorizable (with/without -ffinite-math-only [i.e.
-ffast-math]).

Additionally, for size-1 result arrays, the function should be inlined except
for -O0 and -Os. That affects especially {min,max}loc as for {min,max}val this
looks like a very special case.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-24 Thread jakub at gcc dot gnu dot org


--- Comment #29 from jakub at gcc dot gnu dot org  2009-07-24 07:57 ---
Subject: Bug 31067

Author: jakub
Date: Fri Jul 24 07:57:13 2009
New Revision: 150041

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=150041
Log:
PR fortran/40643
PR fortran/31067
* trans-intrinsic.c (gfc_conv_intrinsic_minmaxloc,
gfc_conv_intrinsic_minmaxval): Handle Infinities and NaNs properly,
optimize.
* trans-array.c (gfc_trans_scalarized_loop_end): No longer static.
* trans-array.h (gfc_trans_scalarized_loop_end): New prototype.

* libgfortran.h (GFC_REAL_4_INFINITY, GFC_REAL_8_INFINITY,
GFC_REAL_10_INFINITY, GFC_REAL_16_INFINITY, GFC_REAL_4_QUIET_NAN,
GFC_REAL_8_QUIET_NAN, GFC_REAL_10_QUIET_NAN, GFC_REAL_16_QUIET_NAN):
Define.
* m4/iparm.m4 (atype_inf, atype_nan): Define.
* m4/ifunction.m4: Formatting.
* m4/iforeach.m4: Likewise.
(START_FOREACH_FUNCTION): Initialize dest to all 1s, not all 0s.
(START_FOREACH_BLOCK, FINISH_FOREACH_FUNCTION,
FINISH_MASKED_FOREACH_FUNCTION): Run foreach block inside a loop
until count[0] == extent[0].
* m4/minval.m4: Formatting.  Handle NaNs and infinities.  Optimize.
* m4/maxval.m4: Likewise.
* m4/minloc0.m4: Likewise.
* m4/maxloc0.m4: Likewise.
* m4/minloc1.m4: Likewise.
* m4/maxloc1.m4: Likewise.
* generated/maxloc0_16_i16.c: Regenerated.
* generated/maxloc0_16_i1.c: Likewise.
* generated/maxloc0_16_i2.c: Likewise.
* generated/maxloc0_16_i4.c: Likewise.
* generated/maxloc0_16_i8.c: Likewise.
* generated/maxloc0_16_r10.c: Likewise.
* generated/maxloc0_16_r16.c: Likewise.
* generated/maxloc0_16_r4.c: Likewise.
* generated/maxloc0_16_r8.c: Likewise.
* generated/maxloc0_4_i16.c: Likewise.
* generated/maxloc0_4_i1.c: Likewise.
* generated/maxloc0_4_i2.c: Likewise.
* generated/maxloc0_4_i4.c: Likewise.
* generated/maxloc0_4_i8.c: Likewise.
* generated/maxloc0_4_r10.c: Likewise.
* generated/maxloc0_4_r16.c: Likewise.
* generated/maxloc0_4_r4.c: Likewise.
* generated/maxloc0_4_r8.c: Likewise.
* generated/maxloc0_8_i16.c: Likewise.
* generated/maxloc0_8_i1.c: Likewise.
* generated/maxloc0_8_i2.c: Likewise.
* generated/maxloc0_8_i4.c: Likewise.
* generated/maxloc0_8_i8.c: Likewise.
* generated/maxloc0_8_r10.c: Likewise.
* generated/maxloc0_8_r16.c: Likewise.
* generated/maxloc0_8_r4.c: Likewise.
* generated/maxloc0_8_r8.c: Likewise.
* generated/maxloc1_16_i16.c: Likewise.
* generated/maxloc1_16_i1.c: Likewise.
* generated/maxloc1_16_i2.c: Likewise.
* generated/maxloc1_16_i4.c: Likewise.
* generated/maxloc1_16_i8.c: Likewise.
* generated/maxloc1_16_r10.c: Likewise.
* generated/maxloc1_16_r16.c: Likewise.
* generated/maxloc1_16_r4.c: Likewise.
* generated/maxloc1_16_r8.c: Likewise.
* generated/maxloc1_4_i16.c: Likewise.
* generated/maxloc1_4_i1.c: Likewise.
* generated/maxloc1_4_i2.c: Likewise.
* generated/maxloc1_4_i4.c: Likewise.
* generated/maxloc1_4_i8.c: Likewise.
* generated/maxloc1_4_r10.c: Likewise.
* generated/maxloc1_4_r16.c: Likewise.
* generated/maxloc1_4_r4.c: Likewise.
* generated/maxloc1_4_r8.c: Likewise.
* generated/maxloc1_8_i16.c: Likewise.
* generated/maxloc1_8_i1.c: Likewise.
* generated/maxloc1_8_i2.c: Likewise.
* generated/maxloc1_8_i4.c: Likewise.
* generated/maxloc1_8_i8.c: Likewise.
* generated/maxloc1_8_r10.c: Likewise.
* generated/maxloc1_8_r16.c: Likewise.
* generated/maxloc1_8_r4.c: Likewise.
* generated/maxloc1_8_r8.c: Likewise.
* generated/maxval_i16.c: Likewise.
* generated/maxval_i1.c: Likewise.
* generated/maxval_i2.c: Likewise.
* generated/maxval_i4.c: Likewise.
* generated/maxval_i8.c: Likewise.
* generated/maxval_r10.c: Likewise.
* generated/maxval_r16.c: Likewise.
* generated/maxval_r4.c: Likewise.
* generated/maxval_r8.c: Likewise.
* generated/minloc0_16_i16.c: Likewise.
* generated/minloc0_16_i1.c: Likewise.
* generated/minloc0_16_i2.c: Likewise.
* generated/minloc0_16_i4.c: Likewise.
* generated/minloc0_16_i8.c: Likewise.
* generated/minloc0_16_r10.c: Likewise.
* generated/minloc0_16_r16.c: Likewise.
* generated/minloc0_16_r4.c: Likewise.
* generated/minloc0_16_r8.c: Likewise.
* generated/minloc0_4_i16.c: Likewise.
* generated/minloc0_4_i1.c: Likewise.
* generated/minloc0_4_i2.c: Likewise.
* generated/minloc0_4_i4.c: Likewise.
* generated/minloc0_4_i8.c: Li

[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-20 Thread irar at il dot ibm dot com


--- Comment #28 from irar at il dot ibm dot com  2009-07-20 12:03 ---
I've just committed a patch that adds support of cond_expr in reductions in
nested cycles (http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01124.html). 

cond_expr cannot be vectorized in reduction of inner-most loop, because such
reduction changes the order of computation, and that cannot be done for
cond_expr.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-04 Thread irar at il dot ibm dot com


--- Comment #27 from irar at il dot ibm dot com  2009-07-05 06:48 ---
(In reply to comment #23)
> because there are two reductions in that loop which I think the vectorizer
> cannot handle:

Actually, the vectorizer can vectorize two reductions. I think, the problem is
in cond_expr in reduction:

>   pos.0_3 = [cond_expr] D.1599_29 ? pos.0_32 : pos.0_31;
>   limit.2_5 = [cond_expr] D.1599_29 ? limit.2_22 : limit.2_8;

I'll look into it.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-03 Thread burnus at gcc dot gnu dot org


--- Comment #26 from burnus at gcc dot gnu dot org  2009-07-03 13:07 ---
> has a superfluous check || (pos.0 == 0 && (*D.1568)[S.3 + D.1569] == limit.2)
> at least for flag_finite_math_only.  If the array cannot contain Inf or NaN
> then it either has all elements == FLT_MAX, so pos will stay zero, or at
> least one is less than FLT_MAX in which case pos will be adjusted anyway.

I have not checked whether algorithm requires the check; NaN/Inf are possible,
but maybe the check is still not needed. And if it is, one could enclose it in
a if(!flag_finite_math_only) condition.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-03 Thread rguenth at gcc dot gnu dot org


--- Comment #25 from rguenth at gcc dot gnu dot org  2009-07-03 12:57 
---
Btw, the inlined minloc

  D.1570 = a.dim[0].lbound;
  D.1571 = a.dim[0].ubound;
  pos.0 = 0;
  {
integer(kind=8) S.3;

 ({ S.3 = D.1570;
while (1)
  {
 ({ if (S.3 > D.1571) goto L.3;
offset.1 = 1 - D.1570;
if ((*D.1568)[S.3 + D.1569] < limit.2 || pos.0 == 0 &&
(*D.1568)[S.3 + D.1569] == limit.2)
  {
 ({ limit.2 = (*D.1568)[S.3 + D.1569];
pos.0 = S.3 + offset.1; }) void
  }
S.3 = S.3 + 1; }) void
  }
L.3:; }) void

has a superfluous check || (pos.0 == 0 && (*D.1568)[S.3 + D.1569] == limit.2)
at least for flag_finite_math_only.  If the array cannot contain Inf or NaN
then it either has all elements == FLT_MAX, so pos will stay zero, or at
least one is less than FLT_MAX in which case pos will be adjusted anyway.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-03 Thread burnus at gcc dot gnu dot org


--- Comment #24 from burnus at gcc dot gnu dot org  2009-07-03 12:40 ---
> One issue is that
>  ISET = MINLOC (DTEMP)
> will cause GCC to assume that DTEMP is clobbered.

The problem is that while "MINLOC" is pure, we cannot use DECL_PURE_P as the
result is passed by reference:
  (void)  minloc(&isset, DTEMP)
 ^^--- result


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-03 Thread rguenth at gcc dot gnu dot org


--- Comment #23 from rguenth at gcc dot gnu dot org  2009-07-03 12:19 
---
We are not able to vectorize the loop in

program main
  implicit none
  integer, volatile, dimension(1) :: n
  real, allocatable :: a(:)
  integer :: i
  real :: t1, t2

  allocate (a(100))
  call random_number(a) ! negligible time
  !call cpu_time(t1)
  do i=1, 1000
n = minloc(a, dim=1)
  end do
  !call cpu_time(t2)
 print *, n !, t2-t1
end program main

because there are two reductions in that loop which I think the vectorizer
cannot handle:

:
  # pos.0_31 = PHI 
  # limit.2_8 = PHI 
  # S.3_74 = PHI 
  D.1593_21 = S.3_74 + pretmp.22_77;
  limit.2_22 = (*D.1568_14)[D.1593_21];
  D.1595_23 = limit.2_22 < limit.2_8;
  D.1596_24 = pos.0_31 == 0;
  D.1597_27 = limit.2_8 == limit.2_22;
  D.1598_28 = D.1597_27 & D.1596_24;
  D.1599_29 = D.1595_23 | D.1598_28;
  pos.0_32 = S.3_74 + pretmp.28_90;
  pos.0_3 = [cond_expr] D.1599_29 ? pos.0_32 : pos.0_31;
  limit.2_5 = [cond_expr] D.1599_29 ? limit.2_22 : limit.2_8;
  S.3_33 = S.3_74 + 1;
  if (S.3_33 > pretmp.22_81)
goto ;
  else
goto ;

:
  goto ;

we reduce over limit.2_5 and pos.0_3.  The intel compiler vectorizes
the function just fine.


-- 

rguenth at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||irar at il dot ibm dot com,
   ||rguenth at gcc dot gnu dot
   ||org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2009-07-03 Thread rguenth at gcc dot gnu dot org


--- Comment #22 from rguenth at gcc dot gnu dot org  2009-07-03 10:00 
---
One issue is that

 ISET = MINLOC (DTEMP)

will cause GCC to assume that DTEMP is clobbered.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2008-01-02 Thread dominiq at lps dot ens dot fr


--- Comment #21 from dominiq at lps dot ens dot fr  2008-01-02 20:27 ---
> MATMUL is one distinctly possible one

Paul,
If you are interested, I have a variant of induct.f90 in which I have replaced
three dot-products by the matrix-vector product for a total disaster on all
compilers I have tried (for gfortran from 66s to 196s).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2008-01-02 Thread pault at gcc dot gnu dot org


--- Comment #20 from pault at gcc dot gnu dot org  2008-01-02 19:30 ---
(In reply to comment #19)
> gfortran does inline most array intrinsics, but only if the result is a 
> scalar.
> For most array intrinsics this isn't that much of a problem since usually one
> uses the variant that returns a scalar, but MINLOC is different in that 
> usually
> one wants to use the version that returns an array. If one implements this I
> guess it would be straightforward to replicate the solution to many other 
> array
> intrinsics as well.
> 
Janne,

In contemplating what to do with gfortran in the New Year, I have been mulling
over in-lining of array intrinsics; MATMUL is one distinctly possible one, as
are MINLOC and MAXLOC.

Cheers

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-06-27 Thread jb at gcc dot gnu dot org


--- Comment #19 from jb at gcc dot gnu dot org  2007-06-27 14:49 ---
gfortran does inline most array intrinsics, but only if the result is a scalar.
For most array intrinsics this isn't that much of a problem since usually one
uses the variant that returns a scalar, but MINLOC is different in that usually
one wants to use the version that returns an array. If one implements this I
guess it would be straightforward to replicate the solution to many other array
intrinsics as well.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-06-15 Thread tkoenig at gcc dot gnu dot org


--- Comment #18 from tkoenig at gcc dot gnu dot org  2007-06-15 20:35 
---
Too little time right now.

Unassigning myself.


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 AssignedTo|tkoenig at gcc dot gnu dot  |unassigned at gcc dot gnu
   |org |dot org
 Status|ASSIGNED|NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-05-18 Thread jb at gcc dot gnu dot org


--- Comment #17 from jb at gcc dot gnu dot org  2007-05-18 21:20 ---
Or even better (duh):

REAL :: DTEMP

DT = HUGE(1.0d0)
DO I = 1, NODES
DTEMP = DX(I)/(ABS(VEL(I)+SOUND(I))
IF (DTEMP < DT) THEN
DT = DTEMP
END IF
END DO


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-05-18 Thread jb at gcc dot gnu dot org


--- Comment #16 from jb at gcc dot gnu dot org  2007-05-18 21:15 ---
The critical thing with inlining array intrinsics, IMHO is to give the
optimizer more data to work with allowing it to get rid of temp arrays, perform
loop fusion or fission etc. So with a trivial benchmark like #15, you don't see
any difference, except with potentially higher optimization for user code than
libgfortran. For example in gas_dyn/chozdt:

  REAL, DIMENSION (NODES) :: DTEMP
!---
! Profile for gfortran 4.3:
! CPU_CLK_UNHALTED L2_CACHE_MISS
! samp  %runtim  samp %tot
! 59887 22.4783  1484 10.9828   :  DTEMP = DX/(ABS(VEL) + SOUND)
! ifort 9.1 profile
! 40104 16.2034  1198  8.8166   :  DTEMP = DX/(ABS(VEL) + SOUND)

  DTEMP = DX/(ABS(VEL) + SOUND)
  ISET = MINLOC (DTEMP)
  DT = DTEMP(ISET(1))

If MINLOC were inlined, perhaps a sufficiently optimizer could convert this
into the equivalent

REAL :: DTEMPMIN
INTEGER :: i

DTEMPMIN = HUGE(1.0d0)
DO I = 1, NODES
DT = DX(I)/(ABS(VEL(I)+SOUND(I))
IF (DT < DTEMPMIN) THEN
DTEMPMIN = DT
END IF
END DO
DT = DTEMPMIN

i.e. avoid the temporary array entirely. Yes, I guess this is quite a lot to
ask.


-- 

jb at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||jb at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-04-02 Thread tkoenig at gcc dot gnu dot org


--- Comment #15 from tkoenig at gcc dot gnu dot org  2007-04-02 21:00 
---
The library version doesn't do too badly compared
to the inline version:

$ cat benchmark-inline.f90
program main
  implicit none
  integer, dimension(1) :: n
  real, allocatable :: a(:)
  integer :: i

  allocate (a(100))
  call random_number(a)
  do i=1, 1000
n = minloc(a, dim=1)
 end do
end program main
$ cat benchmark-library.f90
program main
  implicit none
  integer, dimension(1) :: n
  real, allocatable :: a(:)
  integer :: i

  allocate (a(100))
  call random_number(a)
  do i=1, 1000
n = minloc(a)
 end do
end program main
$ gfortran -O3 -static benchmark-inline.f90 && ./a.out && time ./a.out

real0m6.941s
user0m5.232s
sys 0m0.016s
$ gfortran -O3 -static benchmark-library.f90 && ./a.out && time ./a.out

real0m5.720s
user0m5.472s
sys 0m0.004s


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-04-02 Thread tkoenig at gcc dot gnu dot org


--- Comment #14 from tkoenig at gcc dot gnu dot org  2007-04-02 17:44 
---
I'll give this another shot.

Maybe inlining isn't even necessary for good performance...


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 AssignedTo|unassigned at gcc dot gnu   |tkoenig at gcc dot gnu dot
   |dot org |org
 Status|NEW |ASSIGNED
   Last reconfirmed|2007-03-07 21:09:53 |2007-04-02 17:44:28
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-26 Thread pault at gcc dot gnu dot org


--- Comment #13 from pault at gcc dot gnu dot org  2007-03-26 12:43 ---
(In reply to comment #11)
> (In reply to comment #10)
Thomas,

It does not look too bad:

Look at the tail end of array_transfer -

  gfc_trans_create_temp_array (&se->pre, &se->post, se->loop,
   info, mold_type, false, true, false);

  /* Cast the pointer to the result.  */
  tmp = gfc_conv_descriptor_data_get (info->descriptor);

Will produce a temporary array of the right dimension, together with its
descriptor.

For minmaxloc, mold_type will have to be replaced by the TREE_TYPE of the
result (gfc_array_index_type, I suppose?  or else, you will have to use
gfc_typenode_for_spec (&expr->ts);, I think). tmp will now be a pointer to your
result array.  This needs to appear fairly early on in minmaxloc, so that the
array can be set to zero to initialize it and so that the location can be
tranferred to it.  The standard checks will have to be made that ss->loop
exists etc.. - just check some of the other array valued in-line intrinsics.

Having done this, you will need to replace the line (~2049), folowing "remember
where we are", with a loop over the n dimensions (note, we do not have to
restrict ourselves to one dimension:) ).

Something like:

  /* Remember where we are.  */
  for (n = 0; n < loop.dimen; n++)
{
  pos = build_fold_indirect_ref (gfc_conv_array_data info->descriptor));
  pos = gfc_build_array_ref (pos, build_int_cst (gfc_array_index_type, n))
  gfc_add_modify_expr (&ifblock, pos, loop.loopvar[n]);
}
should bang the position into the result array, which is transferred at the end
with
  se->expr = info->descriptor;

Good luck

Cheers

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-26 Thread pault at gcc dot gnu dot org


--- Comment #12 from pault at gcc dot gnu dot org  2007-03-26 11:37 ---
(In reply to comment #11)
> (In reply to comment #10)

> Do you have any idea what I cold do to turn this into an array?
> All the "interesting" gfc_conv_intrinsic_* functions have the
> "if (se->ss)" statement on top.

I'll put my thinking cap on.  To first order, you have to follow the same route
as array_transfer. I'll come back to you.

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-13 Thread tkoenig at gcc dot gnu dot org


--- Comment #11 from tkoenig at gcc dot gnu dot org  2007-03-13 20:12 
---
(In reply to comment #10)

> Thomas, it's a bit kludgy, but why not add a constant expression = 1, if dim 
> is
> not present?

Hi Paul,

unless I'm mistaken, this would also change the rank of the
function to 0, which FX explained is wrong.

Do you have any idea what I cold do to turn this into an array?
All the "interesting" gfc_conv_intrinsic_* functions have the
"if (se->ss)" statement on top.

Cheers,
  Thomas


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-12 Thread pault at gcc dot gnu dot org


--- Comment #10 from pault at gcc dot gnu dot org  2007-03-12 19:04 ---
(In reply to comment #9)

> 
> As a workaround, one could always use "minloc(...,dim=1)", then
> we get the inline version.
> 
Thomas, it's a bit kludgy, but why not add a constant expression = 1, if dim is
not present?

Cheers

Paul


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-11 Thread tkoenig at gcc dot gnu dot org


--- Comment #9 from tkoenig at gcc dot gnu dot org  2007-03-11 19:43 ---
I have looked at this some more.  Channging gfc_conv_intrinsic_function so that
we call gfc_conv_intrinsic_minmaxloc is easy enough:

@@ -3481,7 +3481,9 @@ gfc_conv_intrinsic_function (gfc_se * se

   name = &expr->value.function.name[2];

-  if (expr->rank > 0 && !expr->inline_noncopying_intrinsic)
+  if (expr->rank > 0 && !expr->inline_noncopying_intrinsic
+  && ! (expr->rank == 1 && (isym->generic_id ==  GFC_ISYM_MINLOC
+   || isym->generic_id == GFC_ISYM_MAXLOC)))
 {
   lib = gfc_is_intrinsic_libcall (expr);
   if (lib != 0)

If we do that, we hit the  "if (se->ss)" contition on top of that function, and
we would have to handle scalarization of that one-trip loop.  I have currently
no idea how to go about that.  Simply removing the condition doesn't work :-)

As a workaround, one could always use "minloc(...,dim=1)", then
we get the inline version.

Unassigning myself.


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 AssignedTo|tkoenig at gcc dot gnu dot  |unassigned at gcc dot gnu
   |org |dot org
 Status|ASSIGNED|NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-10 Thread tkoenig at gcc dot gnu dot org


--- Comment #8 from tkoenig at gcc dot gnu dot org  2007-03-10 12:34 ---
(In reply to comment #7)
> (In reply to comment #6)
> > This makes minloc have rank 0, and allows for
> > inlining.
> 
> No, it's wrong. See F95 13.14.71: "Result Characteristics. The result is of
> type default integer.  If DIM is absent, the result is an array of rank one 
> and
> of size equal to the rank of ARRAY; otherwise, the result is of rank n-1 and
> shape ..."

You're right.  Back to the drawing board.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-07 Thread fxcoudert at gcc dot gnu dot org


--- Comment #7 from fxcoudert at gcc dot gnu dot org  2007-03-08 05:50 
---
(In reply to comment #6)
> This makes minloc have rank 0, and allows for
> inlining.

No, it's wrong. See F95 13.14.71: "Result Characteristics. The result is of
type default integer.  If DIM is absent, the result is an array of rank one and
of size equal to the rank of ARRAY; otherwise, the result is of rank n-1 and
shape ..."

Note in particular the example for case (i), on the next page: "The value of
MINLOC((/4,3,6,3/)) is [2]". If DIM is absent, the result is always an array of
rank 1. Only if DIM is present, and the ARRAY is of rank 1, then MINLOC is a
scalar: MINLOC((/4,3,6,3/),1) == 2


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-07 Thread tkoenig at gcc dot gnu dot org


--- Comment #6 from tkoenig at gcc dot gnu dot org  2007-03-07 21:29 ---
Created an attachment (id=13165)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13165&action=view)
Setting the correct rank in minloc

This makes minloc have rank 0, and allows for
inlining.  I guess we'll find out now wether the
inline code works :-)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-07 Thread tkoenig at gcc dot gnu dot org


--- Comment #5 from tkoenig at gcc dot gnu dot org  2007-03-07 21:09 ---
(In reply to comment #3)

> In  gfc_conv_intrinsic_function, expr->rank is 0 for minval
> and 1 for minloc (which is bogus).  I wonder where this is
> set...

To answer my own question:  This is set in gfc_resolve_minloc.
I'll try to give it a shot.


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||tkoenig at gcc dot gnu dot
   ||org
 AssignedTo|unassigned at gcc dot gnu   |tkoenig at gcc dot gnu dot
   |dot org |org
 Status|NEW |ASSIGNED
   Last reconfirmed|2007-03-07 12:18:10 |2007-03-07 21:09:53
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-07 Thread fxcoudert at gmail dot com


--- Comment #4 from fxcoudert at gmail dot com  2007-03-07 21:09 ---
Subject: Re:  MINLOC should sometimes be inlined (gas_dyn is so slw)

> In  gfc_conv_intrinsic_function, expr->rank is 0 for minval
> and 1 for minloc (which is bogus).

It's not bogus. The MINLOC is an array of rank 1, that's correct.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-07 Thread tkoenig at gcc dot gnu dot org


--- Comment #3 from tkoenig at gcc dot gnu dot org  2007-03-07 21:00 ---
(In reply to comment #2)

> No, because we never get into gfc_conv_intrinsic_minmaxloc. We translate the
> expression directly into a function call by calling
> gfc_conv_intrinsic_funcall() at the head of gfc_conv_intrinsic_function(),
> instead of going down the list. I wonder how SUM and PRODUCT are inlined...

In  gfc_conv_intrinsic_function, expr->rank is 0 for minval
and 1 for minloc (which is bogus).  I wonder where this is
set...


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-07 Thread fxcoudert at gcc dot gnu dot org


--- Comment #2 from fxcoudert at gcc dot gnu dot org  2007-03-07 12:18 
---
(In reply to comment #1)
> We do this for minval, and from glancing at
> gfc_conv_intrinsic_minmaxval and gfc_conv_intrinsic_minmaxloc,
> it should happen already.

No, because we never get into gfc_conv_intrinsic_minmaxloc. We translate the
expression directly into a function call by calling
gfc_conv_intrinsic_funcall() at the head of gfc_conv_intrinsic_function(),
instead of going down the list. I wonder how SUM and PRODUCT are inlined...


-- 

fxcoudert at gcc dot gnu dot org changed:

   What|Removed |Added

   Last reconfirmed|2007-03-07 11:27:52 |2007-03-07 12:18:10
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067



[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)

2007-03-07 Thread tkoenig at gcc dot gnu dot org


--- Comment #1 from tkoenig at gcc dot gnu dot org  2007-03-07 11:27 ---
(In reply to comment #0)

> Maybe we should have MINLOC inlined when there's no mask, stride 1 and
> one-dimensional?

Definitely.  We do this for minval, and from glancing at
gfc_conv_intrinsic_minmaxval and gfc_conv_intrinsic_minmaxloc,
it should happen already.


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2007-03-07 11:27:52
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067