Re: [patch] Make vector::at() assertion message more useful (try #2)

2013-09-21 Thread Paul Pluzhnikov
On Wed, Sep 18, 2013 at 3:00 PM, Paolo Carlini  wrote:

> In the meanwhile, since you are also touching debug-mode and
> profile-mode, make sure to run check-debug and check-profile too.

Thanks for mentioning that. Several more tests needed line number
adjustments.

There are also tests that are failing there independently of my patch.

Committed as r202818.

Thanks,
-- 
Paul Pluzhnikov


Re: Revisit Core tunning flags

2013-09-21 Thread Xinliang David Li
On Sat, Sep 21, 2013 at 3:51 PM, Xinliang David Li  wrote:
> On Sat, Sep 21, 2013 at 12:54 PM, Jan Hubicka  wrote:
>> Hi,
>> this is upated version of patch discussed at
>> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00841.html
>>
>> It makes CORE tuning to more follow the optimization guidelines.
>> In particular it removes some tuning flags for features I implemented years
>> back specifically for K7/K8 chips that ended up in Core tunning becuase
>> it was based on generic. Incrementally I plan to drop some of these from
>> generic, too.
>>
>> Compared to previous version of patch I left out INC_DEC change, even
>> though Core I7+ should resolve dependencies on partial flags correctly.
>> Optimization manual still seems to suggest to not use this:
>>
>> Assembly/Compiler Coding Rule 33. (M impact, H generality)
>> INC and DEC instructions should be replaced with ADD or SUB instructions,
>> because ADD and SUB overwrite all flags, whereas INC and DEC do not, 
>> therefore
>> creating false dependencies on earlier instructions that set the flags.
>>
>> Other change dropped is use_vector_fp_converts that seems to improve
>> Core perofrmance.
>
> I did not see this in your patch, but Wei has this tuning in this patch:
>

Sorry, I meant to ask why dropping this part?

David

> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html
>
> thanks,
>
> David
>
>
>>
>> I benchmarked the patch on SPEC2k and earlier it was benchmarked on 2k6
>> and the performance difference seems in noise.  It causes about 0.3% code
>> size reduction.  Main motivation for the patch is to drop some codegen
>> oddities that do not make sense on modern chips.
>>
>> Bootstrapped/regtested x86_64-linux, will commit it shortly.
>> Honza
>>
>> * x86-tune.def (partial_reg_stall): Disable for CoreI7 and newer.
>> (sse_typeless_stores): Enable for core
>> (sse_load0_by_pxor): Likewise.
>> (four_jump_limit): Disable for core.
>> (pad_returns): Likewise.
>> (avoid_vector_decode): Likewise.
>> (fuse_cmp_and_branch): Enable for cores.
>> * i386.c (x86_accumulate_outgoing_args): Disable for cores.
>> Index: x86-tune.def
>> ===
>> *** x86-tune.def(revision 202812)
>> --- x86-tune.def(working copy)
>> *** DEF_TUNE (X86_TUNE_MOVX, "movx",
>> *** 52,58 
>>  and can happen in caller/callee saving sequences.  */
>>   DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
>>   DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
>> !   m_CORE_ALL | m_GENERIC)
>>   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
>>* on 16-bit immediate moves into memory on Core2 and Corei7.  */
>>   DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
>> --- 52,58 
>>  and can happen in caller/callee saving sequences.  */
>>   DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
>>   DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
>> !   m_CORE2 | m_GENERIC)
>>   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
>>* on 16-bit immediate moves into memory on Core2 and Corei7.  */
>>   DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
>> *** DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INS
>> *** 125,132 
>>  maintain just lower part of scalar values in proper format leaving the
>>  upper part undefined.  */
>>   DEF_TUNE (X86_TUNE_SSE_SPLIT_REGS, "sse_split_regs", m_ATHLON_K8)
>> ! DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores", 
>> m_AMD_MULTIPLE)
>> ! DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor", m_PPRO | 
>> m_P4_NOCONA)
>>   DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
>> m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMD_MULTIPLE | 
>> m_GENERIC)
>>   DEF_TUNE (X86_TUNE_PROLOGUE_USING_MOVE, "prologue_using_move",
>> --- 125,134 
>>  maintain just lower part of scalar values in proper format leaving the
>>  upper part undefined.  */
>>   DEF_TUNE (X86_TUNE_SSE_SPLIT_REGS, "sse_split_regs", m_ATHLON_K8)
>> ! DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
>> ! m_AMD_MULTIPLE | m_CORE_ALL)
>> ! DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor",
>> ! m_PPRO | m_P4_NOCONA | m_CORE_ALL)
>>   DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
>> m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMD_MULTIPLE | 
>> m_GENERIC)
>>   DEF_TUNE (X86_TUNE_PROLOGUE_USING_MOVE, "prologue_using_move",
>> *** DEF_TUNE (X86_TUNE_INTER_UNIT_CONVERSION
>> *** 144,150 
>>   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
>>  than 4 branch instructions in the 16 byte window.  */
>>   DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
>> !   m_PPRO | m_P4_NOCO

Re: [c++-concepts] Constrained friends

2013-09-21 Thread Jason Merrill

On 09/21/2013 08:52 AM, Andrew Sutton wrote:

It is wrong, but not for the reasons I gave. This only happens when
you try to constrain a friend function that declares a specialization,
which happens to be completely separate from the previously declared
template.



I'm going to disallow the ability to declare constrained friend
specializations. They don't really make sense.


Hmm, it seems to me the constraints on the specialization help to select 
the template the specialization belongs to.  I'm not sure that would be 
useful, but I don't see a need to specifically prevent it.


Jason



Re: Revisit Core tunning flags

2013-09-21 Thread Xinliang David Li
On Sat, Sep 21, 2013 at 12:54 PM, Jan Hubicka  wrote:
> Hi,
> this is upated version of patch discussed at
> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00841.html
>
> It makes CORE tuning to more follow the optimization guidelines.
> In particular it removes some tuning flags for features I implemented years
> back specifically for K7/K8 chips that ended up in Core tunning becuase
> it was based on generic. Incrementally I plan to drop some of these from
> generic, too.
>
> Compared to previous version of patch I left out INC_DEC change, even
> though Core I7+ should resolve dependencies on partial flags correctly.
> Optimization manual still seems to suggest to not use this:
>
> Assembly/Compiler Coding Rule 33. (M impact, H generality)
> INC and DEC instructions should be replaced with ADD or SUB instructions,
> because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore
> creating false dependencies on earlier instructions that set the flags.
>
> Other change dropped is use_vector_fp_converts that seems to improve
> Core perofrmance.

I did not see this in your patch, but Wei has this tuning in this patch:

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00884.html

thanks,

David


>
> I benchmarked the patch on SPEC2k and earlier it was benchmarked on 2k6
> and the performance difference seems in noise.  It causes about 0.3% code
> size reduction.  Main motivation for the patch is to drop some codegen
> oddities that do not make sense on modern chips.
>
> Bootstrapped/regtested x86_64-linux, will commit it shortly.
> Honza
>
> * x86-tune.def (partial_reg_stall): Disable for CoreI7 and newer.
> (sse_typeless_stores): Enable for core
> (sse_load0_by_pxor): Likewise.
> (four_jump_limit): Disable for core.
> (pad_returns): Likewise.
> (avoid_vector_decode): Likewise.
> (fuse_cmp_and_branch): Enable for cores.
> * i386.c (x86_accumulate_outgoing_args): Disable for cores.
> Index: x86-tune.def
> ===
> *** x86-tune.def(revision 202812)
> --- x86-tune.def(working copy)
> *** DEF_TUNE (X86_TUNE_MOVX, "movx",
> *** 52,58 
>  and can happen in caller/callee saving sequences.  */
>   DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
>   DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
> !   m_CORE_ALL | m_GENERIC)
>   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
>* on 16-bit immediate moves into memory on Core2 and Corei7.  */
>   DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
> --- 52,58 
>  and can happen in caller/callee saving sequences.  */
>   DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
>   DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
> !   m_CORE2 | m_GENERIC)
>   /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
>* on 16-bit immediate moves into memory on Core2 and Corei7.  */
>   DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
> *** DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INS
> *** 125,132 
>  maintain just lower part of scalar values in proper format leaving the
>  upper part undefined.  */
>   DEF_TUNE (X86_TUNE_SSE_SPLIT_REGS, "sse_split_regs", m_ATHLON_K8)
> ! DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores", 
> m_AMD_MULTIPLE)
> ! DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor", m_PPRO | 
> m_P4_NOCONA)
>   DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
> m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMD_MULTIPLE | 
> m_GENERIC)
>   DEF_TUNE (X86_TUNE_PROLOGUE_USING_MOVE, "prologue_using_move",
> --- 125,134 
>  maintain just lower part of scalar values in proper format leaving the
>  upper part undefined.  */
>   DEF_TUNE (X86_TUNE_SSE_SPLIT_REGS, "sse_split_regs", m_ATHLON_K8)
> ! DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
> ! m_AMD_MULTIPLE | m_CORE_ALL)
> ! DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor",
> ! m_PPRO | m_P4_NOCONA | m_CORE_ALL)
>   DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
> m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMD_MULTIPLE | 
> m_GENERIC)
>   DEF_TUNE (X86_TUNE_PROLOGUE_USING_MOVE, "prologue_using_move",
> *** DEF_TUNE (X86_TUNE_INTER_UNIT_CONVERSION
> *** 144,150 
>   /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
>  than 4 branch instructions in the 16 byte window.  */
>   DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
> !   m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM| m_AMD_MULTIPLE
> | m_GENERIC)
>   DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
> m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE
> --- 146,152 
>   /* X86_

Revisit Core tunning flags

2013-09-21 Thread Jan Hubicka
Hi,
this is upated version of patch discussed at 
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00841.html

It makes CORE tuning to more follow the optimization guidelines.
In particular it removes some tuning flags for features I implemented years
back specifically for K7/K8 chips that ended up in Core tunning becuase
it was based on generic. Incrementally I plan to drop some of these from
generic, too.

Compared to previous version of patch I left out INC_DEC change, even
though Core I7+ should resolve dependencies on partial flags correctly.
Optimization manual still seems to suggest to not use this:

Assembly/Compiler Coding Rule 33. (M impact, H generality)
INC and DEC instructions should be replaced with ADD or SUB instructions,
because ADD and SUB overwrite all flags, whereas INC and DEC do not, therefore
creating false dependencies on earlier instructions that set the flags. 

Other change dropped is use_vector_fp_converts that seems to improve
Core perofrmance.

I benchmarked the patch on SPEC2k and earlier it was benchmarked on 2k6
and the performance difference seems in noise.  It causes about 0.3% code
size reduction.  Main motivation for the patch is to drop some codegen
oddities that do not make sense on modern chips.

Bootstrapped/regtested x86_64-linux, will commit it shortly.
Honza

* x86-tune.def (partial_reg_stall): Disable for CoreI7 and newer.
(sse_typeless_stores): Enable for core
(sse_load0_by_pxor): Likewise.
(four_jump_limit): Disable for core.
(pad_returns): Likewise.
(avoid_vector_decode): Likewise.
(fuse_cmp_and_branch): Enable for cores.
* i386.c (x86_accumulate_outgoing_args): Disable for cores.
Index: x86-tune.def
===
*** x86-tune.def(revision 202812)
--- x86-tune.def(working copy)
*** DEF_TUNE (X86_TUNE_MOVX, "movx",
*** 52,58 
 and can happen in caller/callee saving sequences.  */
  DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
  DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
!   m_CORE_ALL | m_GENERIC)
  /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
   * on 16-bit immediate moves into memory on Core2 and Corei7.  */
  DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
--- 52,58 
 and can happen in caller/callee saving sequences.  */
  DEF_TUNE (X86_TUNE_PARTIAL_REG_STALL, "partial_reg_stall", m_PPRO)
  DEF_TUNE (X86_TUNE_PARTIAL_FLAG_REG_STALL, "partial_flag_reg_stall",
!   m_CORE2 | m_GENERIC)
  /* X86_TUNE_LCP_STALL: Avoid an expensive length-changing prefix stall
   * on 16-bit immediate moves into memory on Core2 and Corei7.  */
  DEF_TUNE (X86_TUNE_LCP_STALL, "lcp_stall", m_CORE_ALL | m_GENERIC)
*** DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INS
*** 125,132 
 maintain just lower part of scalar values in proper format leaving the
 upper part undefined.  */
  DEF_TUNE (X86_TUNE_SSE_SPLIT_REGS, "sse_split_regs", m_ATHLON_K8)
! DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores", m_AMD_MULTIPLE)
! DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor", m_PPRO | 
m_P4_NOCONA)
  DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMD_MULTIPLE | 
m_GENERIC)
  DEF_TUNE (X86_TUNE_PROLOGUE_USING_MOVE, "prologue_using_move", 
--- 125,134 
 maintain just lower part of scalar values in proper format leaving the
 upper part undefined.  */
  DEF_TUNE (X86_TUNE_SSE_SPLIT_REGS, "sse_split_regs", m_ATHLON_K8)
! DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
! m_AMD_MULTIPLE | m_CORE_ALL)
! DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor",
! m_PPRO | m_P4_NOCONA | m_CORE_ALL)
  DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM | m_AMD_MULTIPLE | 
m_GENERIC)
  DEF_TUNE (X86_TUNE_PROLOGUE_USING_MOVE, "prologue_using_move", 
*** DEF_TUNE (X86_TUNE_INTER_UNIT_CONVERSION
*** 144,150 
  /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
 than 4 branch instructions in the 16 byte window.  */
  DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
!   m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_ATOM | m_SLM| m_AMD_MULTIPLE 
| m_GENERIC)
  DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_SLM | m_K6_GEODE 
--- 146,152 
  /* X86_TUNE_FOUR_JUMP_LIMIT: Some CPU cores are not able to predict more
 than 4 branch instructions in the 16 byte window.  */
  DEF_TUNE (X86_TUNE_FOUR_JUMP_LIMIT, "four_jump_limit",
!   m_PPRO | m_P4_NOCONA | m_ATOM | m_SLM | m_AMD_MULTIPLE 
| m_GENERIC)
  DEF_TUNE (X86_TUNE_SCHEDULE, "schedule",
m_PENT | m_PPRO | m_CORE_ALL | m_ATOM | m_SLM | m_K

[Patch, Fortran] PR 58355: [4.7/4.8/4.9 Regression] [F03] ICE with TYPE, EXTENDS before parent TYPE defined

2013-09-21 Thread Janus Weil
Hi all,

the straightforward patch in the attachment does two things:

1) It prevents a segfault, which is a regression on 4.7/4.8/trunk (by
simply switching the order of two statements).
2) It modifies an error message, which was not perfectly correct in my opinion.

The patch was regtested successfully on x86_64-unknown-linux-gnu. Ok for trunk?

Part #1, as a regression fix, should also be backported to 4.7 and 4.8. Ok?

Cheers,
Janus



2013-09-21  Janus Weil  

PR fortran/58355
* decl.c (check_extended_derived_type): Prevent segfault, modify error
message.

2013-09-21  Janus Weil  

PR fortran/58355
* gfortran.dg/extends_15.f90: New.


pr58355.diff
Description: Binary data


extends_15.f90
Description: Binary data


Re: [Patch, Fortran] PR 58099: [4.8/4.9 Regression] [F03] over-zealous procedure-pointer error checking

2013-09-21 Thread Janus Weil
> Regtested on x86_64-unknown-linux-gnu. Ok for trunk?
>
 This looks good to me
 but I let Tobias have the final word as he
 expressed some concerns in the PR audit trail.
>>
>> Sorry for the very belated replay. I played with the patch and it looks
>> okay.
>
> thanks. Committed as r202766.

Btw, since this guy fixes a regression which also exists on the 4.8
branch, I forgot to ask: Ok also for 4.8?

Cheers,
Janus


Re: [c++-concepts] Constrained friends

2013-09-21 Thread Andrew Sutton
I'm going to rewrite this patch tomorrow morning. The semantics aren't
quite right --- they should be simpler.

>> Previously, if constraints were not
>> satisfied, we would not record the template as a candidate. However,
>> this causes errors in class template instantiation if there are
>> constrained friend declarations whose constraints are not satisfied
>> ("no matching template declaration").
>
> Is that wrong?  We normally give errors about friend declarations that don't
> match any template.  Why don't we want this error when the friend
> declaration is looking for a constrained template?

It is wrong, but not for the reasons I gave. This only happens when
you try to constrain a friend function that declares a specialization,
which happens to be completely separate from the previously declared
template.

I'm going to disallow the ability to declare constrained friend
specializations. They don't really make sense.

>> +  if (is_non_template_member_fn (fn) || is_non_template_friend (fn))
>
>
> Let's have one predicate instead of two; the condition here is a temploid
> that is not a specialization of a primary template.

Agreed.

>
>> +  if (!check_template_constraints (tmpl, args) && (complain &
>> tf_error))
>>  {
>>reason = template_constraint_failure (tmpl, args);
>>viable = false;
>
>
> Why add the complain check?  A constraint failure should make a candidate
> non-viable even in SFINAE context.

What have I done? That's awful...


>> +  // If we're not instantiating a friend function, then we need to
>> +  // ensure the specialization of the best template satisfies its
>> +  // constraints.
>
>
> Surely we need to check constraints in the earlier loop, so that we don't
> treat as a candidate a template that doesn't satisfy the constraints;
> otherwise if we have two templates
>
> template  T f(T) requires Thing;
> template  T* f(T*);
>
> and our specialization requires Thing, we would select the second (because
> it is otherwise more specialized) and then give an error about constraint
> mismatch; I would think we want to select the first.

I believe that's what the previous version did, and we'll go back to
that. This change was part of the semantics that I didn't get right.

>> +// If there is an overload with the same type and
>> +// constraints, then this is a good declaration.
>> +if (same_type_p (TREE_TYPE (fn), TREE_TYPE (f)))
>> +  if (equivalent_constraints (constr, get_constraints (f)))
>> +return;
>
>
> It seems that this will only allow friend declarations that match the
> template exactly, not friend declarations that are more specialized than the
> matching template.  It looks like you're trying to implement a subset of
> determine_specialization here, which I think is a mistake.

I agree. It's a mistake. This is also related to the semantics that I got wrong.

Effectively, the only changes needed for constrained friends are that:

a) you can't constrain non-dependent friends
b) you can't constraint non-template friend functions that declare a
specialization
and c) that we check non-template friends the same way as non-template
member fns

There should be no changes to any of the rules for determining specializations.

Andrew


Re: [c++-concepts] Constrained friends

2013-09-21 Thread Jason Merrill

On 09/13/2013 12:21 PM, Andrew Sutton wrote:

Previously, if constraints were not
satisfied, we would not record the template as a candidate. However,
this causes errors in class template instantiation if there are
constrained friend declarations whose constraints are not satisfied
("no matching template declaration").


Is that wrong?  We normally give errors about friend declarations that 
don't match any template.  Why don't we want this error when the friend 
declaration is looking for a constrained template?



+  if (is_non_template_member_fn (fn) || is_non_template_friend (fn))


Let's have one predicate instead of two; the condition here is a 
temploid that is not a specialization of a primary template.



+  if (!check_template_constraints (tmpl, args) && (complain & tf_error))
 {
   reason = template_constraint_failure (tmpl, args);
   viable = false;


Why add the complain check?  A constraint failure should make a 
candidate non-viable even in SFINAE context.



+  // If we're not instantiating a friend function, then we need to
+  // ensure the specialization of the best template satisfies its
+  // constraints.


Surely we need to check constraints in the earlier loop, so that we 
don't treat as a candidate a template that doesn't satisfy the 
constraints; otherwise if we have two templates


template  T f(T) requires Thing;
template  T* f(T*);

and our specialization requires Thing, we would select the second 
(because it is otherwise more specialized) and then give an error about 
constraint mismatch; I would think we want to select the first.



+// If there is an overload with the same type and
+// constraints, then this is a good declaration.
+if (same_type_p (TREE_TYPE (fn), TREE_TYPE (f)))
+  if (equivalent_constraints (constr, get_constraints (f)))
+return;


It seems that this will only allow friend declarations that match the 
template exactly, not friend declarations that are more specialized than 
the matching template.  It looks like you're trying to implement a 
subset of determine_specialization here, which I think is a mistake.


Jason



Re: [gomp4] C++ OpenMP user defined reductions (take 2)

2013-09-21 Thread Jason Merrill

On 09/20/2013 12:25 PM, Jakub Jelinek wrote:

In templates the UDRs are always FUNCTION_DECLs in classes or
at function block scope, the above one liner was I believe for the latter,
where without it duplicate_decls was returning incorrectly 0; the UDRs
from mismatching templates would actually never be seen by duplicate_decls,
but t1 was different from t2.  That was before the changes to
add the mangled names to the UDR DECL_NAMEs though, I can try to remove it
and see if the whole testsuite still passes.


Please.


+ && ! (DECL_OMP_DECLARE_REDUCTION_P (newdecl)
+   && DECL_CONTEXT (newdecl) == NULL_TREE
+   && DECL_CONTEXT (olddecl) == current_function_decl))


And this looks like you need to set DECL_CONTEXT sooner on reductions.


I know it is ugly, but setting FUNCTION_DECL DECL_CONTEXT too early resulted
in all kinds of problems, when the C++ frontend doesn't support nested
functions.  So the patch doesn't set DECL_CONTEXT until it is pushdecled
into the block scope.


What is calling decls_match before pushdecl?


+   if (TREE_CODE (argtype) == REFERENCE_TYPE)
+ error_at (DECL_SOURCE_LOCATION (t),
+   "function, array or reference type in "
+   "%<#pragma omp declare reduction%>");


Let's just say "reference type", since we know that's what it is.


That is true, but I wanted to match the same error message elsewhere,
otherwise the error will be different (more specific) for instantiation
vs. in non-template code.


It's more important to be specific during instantiation because we can't 
always tell what the types involved actually are.  So let's also add the 
type in question to the diagnostic.



Though, I could of course in that second spot
just special case REFERENCE_TYPE with a separate error message
and just have one about function or array type.


That would be fine too, of course.


+  for (ix = 0; BINFO_BASE_ITERATE (binfo, ix, base_binfo); ix++)
+   {
+ id = omp_reduction_lookup (loc, orig_id, BINFO_TYPE (base_binfo));
+ if (id != NULL_TREE)
+   return id;


This should check for ambiguity rather than returning the first match.


I believe I need to discuss that on omp-lang, what exactly is the intended
behavior (and get the standard clarified).  Returning the first base class
is an option (and, depth-first vs. breadth-first?), or erroring out if more
than one base class has an UDR is another.


Normal C++ lookup behavior is to check for ambiguity, so I think that's 
the best bet for what the eventual defined semantics will be.



+   if (DECL_TEMPLATE_INFO (id))
+ id = instantiate_decl (id, /*defer_ok*/0, true);


Let's use mark_used instead.


Will that always instantiate it?


In contexts where that makes sense, which I would expect to be all 
contexts where you can see a reduction.


Jason



Re: [PATCH][AARCH64]Replace gen_rtx_PLUS with plus_constant

2013-09-21 Thread James Greenhalgh
On Fri, Sep 20, 2013 at 03:40:59PM +0100, Renlin Li wrote:
> Thank you, can you please commit it for me?
> 
> Kind regards,
> Renlin Li
> 
> On 09/20/13 15:26, Marcus Shawcroft wrote:
> > On 20 September 2013 15:18, Renlin Li  wrote:
> >
> >> 2013-09-20  Renlin Li  
> >>
> >>  * config/aarch64/aarch64.c (aarch64_expand_prologue): Use 
> >> plus_constant.
> >>  (aarch64_expand_epilogue): Likewise.
> >>  (aarch64_legitimize_reload_address): Likewise.
 
Hi Renlin,

This patch appears to have caused a number of regressions on
an aarch64-none-elf test run.

I see Internal Compiler Errors along these lines:

../src/gcc/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c: In function 'main1':
../src/gcc/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c:68:1: error: insn does 
not satisfy its constraints:
 }
 ^
(insn 182 472 183 (set (reg:QI 2 x2 [277])
(mem/c:QI (plus:DI (reg:DI 3 x3)
(const_int 6264 [0x1878])) [0 b1+0 S1 A64])) 
../src/gcc/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c:39 30 {*movqi_aarch64}
 (nil))
../src/gcc/gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c:68:1: internal compiler 
error: in final_scan_insn, at final.c:2886
0x8b8135 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
/work/gcc-clean/src/gcc/gcc/rtl-error.c:109
0x8b815f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
/work/gcc-clean/src/gcc/gcc/rtl-error.c:120
0x6e1822 final_scan_insn(rtx_def*, _IO_FILE*, int, int, int*)
/work/gcc-clean/src/gcc/gcc/final.c:2886
0x6e1b01 final(rtx_def*, _IO_FILE*, int)
/work/gcc-clean/src/gcc/gcc/final.c:2017
0x6e1d49 rest_of_handle_final
/work/gcc-clean/src/gcc/gcc/final.c:4422
0x6e1d49 execute
/work/gcc-clean/src/gcc/gcc/final.c:4497
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

Thanks,
James