Re: Question about builtin_free doesn't read memory

2021-11-28 Thread Jan Hubicka via Gcc
> Hi,
> In function ref_maybe_used_by_call_p_1, there is below code snippet
>  /* The following builtins do not read from memory.  */
>  case BUILT_IN_FREE:
>  ...
>return false;
> 
> I am confused because free function does read from (and even write to)
> memory pointed to by passed argument?

Free is a black box and makes the memory pointed to disappear without
actually worrying what values it holds. We rely on fact that we do not
see free imlementation and does not worry about the details of its
implementation (whcih probably has sort of linked list before address
ptr points to)
> I am thinking DSE optimizations like:
>   *ptr = value;
>   free(ptr);
>   *ptr = undef;
> Does GCC take advantage of UB to eliminate the first store to ptr if
> free is considered not reading memory?

The aim here is to optimize out *ptr = value.

Honza
> 
> Thanks,
> bin


Re: distinguishing gcc compilation valgrind false positives

2021-11-24 Thread Jan Hubicka via Gcc
> ==5404== Conditional jump or move depends on uninitialised value(s)
> ==5404==    at 0x25DAAD7: incorporate_penalties (ipa-cp.c:3282)
> ==5404==    by 0x25DAAD7: good_cloning_opportunity_p(cgraph_node*, sreal,
> sreal, profile_count, int) (ipa-cp.c:3340)

I looked at this one (since it is in code I am familiar with) and I
think it may be real bug.  There are flags node_is_self_scc which does
not seem to be initialized in the constructor.

diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 42842d9466a..1d0c115465c 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -623,8 +623,8 @@ ipa_node_params::ipa_node_params ()
 : descriptors (NULL), lattices (NULL), ipcp_orig_node (NULL),
   known_csts (vNULL), known_contexts (vNULL), analysis_done (0),
   node_enqueued (0), do_clone_for_all_contexts (0), is_all_contexts_clone (0),
-  node_dead (0), node_within_scc (0), node_calling_single_call (0),
-  versionable (0)
+  node_dead (0), node_within_scc (0), node_is_self_scc (0),
+  node_calling_single_call (0), versionable (0)
 {
 }
 
Honza


Re: distinguishing gcc compilation valgrind false positives

2021-11-25 Thread Jan Hubicka via Gcc
> >
> > diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> > index 42842d9466a..1d0c115465c 100644
> > --- a/gcc/ipa-prop.h
> > +++ b/gcc/ipa-prop.h
> > @@ -623,8 +623,8 @@ ipa_node_params::ipa_node_params ()
> >  : descriptors (NULL), lattices (NULL), ipcp_orig_node (NULL),
> >known_csts (vNULL), known_contexts (vNULL), analysis_done (0),
> >node_enqueued (0), do_clone_for_all_contexts (0), is_all_contexts_clone 
> > (0),
> > -  node_dead (0), node_within_scc (0), node_calling_single_call (0),
> > -  versionable (0)
> > +  node_dead (0), node_within_scc (0), node_is_self_scc (0),
> > +  node_calling_single_call (0), versionable (0)
> >  {
> >  }
> 
> Oops, can you please commit the change to master and all active
> branches?
OK.  To be honest I am not 100% sure if the flag is not always
initialized later.  

Zdenek, can you, please, check if the undefined warning in ipcp goes
away with this patch and if so, post what are the remaining ones?
> 
> Thanks,
> 
> Martin
> 


Re: distinguishing gcc compilation valgrind false positives

2021-11-25 Thread Jan Hubicka via Gcc
> 
> I can confirm that zero-initializing node_is_self_scc prevents the 
> uninitialised use warnings in incorporate_penalties (ipa-cp.c:3282)
Great, I will commit the patch.  But I also wonder if there are any
remaining unitialized warnings in ipa code?

Honza
> 
> Thanks,
> Zdenek
> 
> 


Re: Question on calling possible_polymorphic_call_targets, side effects, and virtual tables.

2021-10-27 Thread Jan Hubicka via Gcc
> Hello,
> 
> I have a SIMPLE_IPA_PASS that parses the program multiple times. As it
> parses gimple, it builds a data structure with the information
> collected that will provide some invariants to future iterations over
> the program.
> 
> I was looking into adding a new feature that would take advantage of
> devirtualization by calling possible_polymorphic_call_targets. All
> looks good, a large program that I use for verifying changes seems to
> compile nicely. However, as I generate a test file in which I discard
> most optimizations (except the one I am currently working on) I
> realize that an assertion on my pass is triggered.
> 
> I dig in and it looks like I am ignoring error_mark_nodes in varpools'
> DECL_INITIAL. The first passes essentially encode the information that
> these are error_mark_nodes. On the pass in which I call
> possible_polymorphic_call_targets I find that suddenly, these
> error_mark_nodes are gone and replaced with a virtual table, thus
> triggering the assertion.
> 
> Is there a way I can get rid of the error_mark_nodes from the earlier
> passes to match the changes brought by
> possible_polymorphic_call_targets? I suppose that finding
> polymorphic_call_targets at an earlier stage is a possibility, but I
> was wondering exactly which function/statement is able to fix these
> error_mark_nodes so that I can also learn about this.

Like function bodies the initializers are read on demand.
If you call get_constructor (varpool_node) you will trigger the lazy
stream in.

Honza
> 
> Thanks!


Re: Question on ipa_ref->referring and ipa_ref->stmt on all_late_ipa_passes

2022-02-14 Thread Jan Hubicka via Gcc
> Hi,
> 
> I would like to use ipa_ref in the PASS_LIST all_late_ipa_passes to query
> the statement (ref->stmt) of where a global variable is used. However, I am
> having some problems achieving this.
> 
> What I do is:
> 
> 1. Check that ipa_ref->referring has a body and is not inlined.
> 2. get_body
> 3. try to print out the gimple statement using print_gimple_stmt
> (dump_file, ref->stmt, 0, TDF_NONE).
> 
> This all seems correct to me, but I have been receiving errors that print
> is trying to print a tree with an incorrect TREE_CODE. I am assuming here
> that ref->stmt is not updated after all_regular_ipa_passes, much like how
> when looking at cgraph_edge the call statement is also not updated. Can
> someone please tell me if this is indeed the case or what is happening here?

Yes, while body materialization we keep cgraph edges up to date but we
do not keep references.  We probably should remove them earlier.
Keeping them up to date would need relatively some work. We do so for
calls since they hold important information (i.e. where to redirect the
call). For references we don't have such machinery in place (even though
I was thinking implementing it). Main difficulty is that inlining also
performs statement folding that moves referneces around statements quite
freely
> 
> Also, while I think that the gimple statements might not be maintained, I
> see that ipa_ref is still used in the ipa_pta pass during
> all_late_ipa_passes. I see that ipa_ref->referring and ipa_ref->stmt are
> not used. Instead the tree of the referred is obtained in the following
> way: ref->referred->decl. I am assuming that it would be possible to use
> ref->referred->decl and search for this tree everywhere in referring to
> find the uses. Can someone confirm this?

I will check what ipa-pta uses here.  I suppose it works since you still
have all references from the pre-IPA-transform stage, so it is
consistent...

Honza
> 
> Thanks!
> -Erick


Re: [GSoC]Bypass assembler when generating LTO object files

2022-04-12 Thread Jan Hubicka via Gcc
Hi,
> 
> 
> > On 08-Apr-2022, at 6:32 PM, Jan Hubicka  wrote:
> > 
> > Ankur,
> >> I was browsing the list of submitted GSoC projects this year and the
> >> project regarding bypassing assembler when generating LTO object files
> >> caught my eye.
> > I apologize for late reply.  I would be very happy to mentor this
> > project.
> 
> Thanks for the reply, but unfortunately, due to some reasons, I would not
> be able to take part in GSoC this year.  
> But the project seems interesting and would be amazing opportunity to
> learn a lot more things for me, so would it be okay if I try to give it a
> go outside GSoC if no-one else picks it as their GSoC project this year ?

I would be still very happy to help with that! However it would be also
pity to not take part of GSoC, so if there is something I can help with
on that let me know.
> 
> >> 
> >> I already have a gcc built from source (sync-ed with trunk/master) and
> >> launched the test-suite on it.
> >> 
> >> I am currently in process of understanding the primilary patch
> >> (https://gcc.gnu.org/legacy-ml/gcc/2014-09/msg00340.html), and
> >> experimenting with it.
> >> 
> >> are there any other things I should be aware of (useful Doc/blog or a
> >> bug tracking the project) before proceeding further ?
> > 
> > I think it is pretty much all that exists.  Basically we will need to
> > implement everything that is necessary to stream out valid object file
> > directly from GCC rather than going through gas.  The experimental
> > prototype sort of worked but it was lacking few things.
> 
> When I try to apply that patch on my local branch ( branched from trunk ),
> it seem to be incompatible with the current working tree. Is there a
> specific branch that I have to apply it to ? or is it due to the recent
> file rename patch ( changing extensions from .c to .cc ) ? 
> 
> ```
> $ git apply --check bypass_asm_patch
> 
> error: patch failed: Makefile.in:1300
> error: Makefile.in: patch does not apply
> error: common.opt: No such file or directory
> error: langhooks.c: No such file or directory
> error: lto/Make-lang.in: No such file or directory
> error: lto/lto-object.c: No such file or directory
> error: lto/lto.c: No such file or directory
> error: lto/lto.h: No such file or directory
> error: lto-streamer.h: No such file or directory
> error: toplev.c: No such file or directory
> ```

I can try to update the patch, or it probably should apply to trunk
checked out around the date I sent the patch.  Indeed we need to change
c to cc but there are likely more changes since then - most importnatly
the early debug info.
At I will see how easy/hard is to make the patch build with current
trunk.
Honza
> 
> Thanks 
> - Ankur 
> 


Re: Incremental LTO Project

2023-09-10 Thread Jan Hubicka via Gcc
> Hi!
> 
> On 2023-09-07T19:00:49-0400, James Hu via Gcc  wrote:
> > I noticed that adding incremental LTO was a GSoC project that was not
> > claimed this cycle (
> > https://summerofcode.withgoogle.com/programs/2023/organizations/gnu-compiler-collection-gcc).
> > I was curious about working on this project, but wanted to check on the
> > state of the project.
> 
> Thanks for your interest!  (... as a potential contributor, I presume?)
> 
> > Has it already been completed? Is someone actively
> > working on it?
> 
> Yesterday, when browsing the schedule of the GNU Tools Cauldron 2023,
> , I noticed there's going to be a
> presentation on "Incremental LTO in GCC" (Michal Jireš),
> .

Indeed Michal Jires (who is my student at Charles University) did a lot
of work on incremental LTO.  He is finishing his second bachelor in
physics and then we plan to start working towards contributing it to the
mainline.
His imlementation is described in thesis
https://dspace.cuni.cz/bitstream/handle/20.500.11956/183051/130360194.pdf?sequence=1

This is just a start of the project, further improvemnts will be welcome

Honza
> 
> > If not, what would be the appropriate method to contact the
> > mentor (Jan Hubička)?
> 
> He's reading this list, but I've now also put Honza in CC.
> 
> 
> Grüße
>  Thomas
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955


Re: Clarification regarding various classes DIE's attribute value class

2023-10-11 Thread Jan Hubicka via Gcc
> Hello,
> I am working on a project to produce the LTO object file from the compiler
> directly. So far, we have
> correctly outputted .symtab along with various .debug sections. The only
> thing remaining is to
> correctly output attribute values and their corresponding values in the
> .debug_info section. This is done by the output_die function in
> dwarf2out.cc based on the value's class. However, the same
> function is used in dwarf2out_finish  as well as dwarf2out_early_finish, so
> I suspect that not every value class is being used in dwarfout_early_finish
> (mainly I am interested in -flto mode). As there is little documentation on
> the same, I experimented by commenting out the various cases of value class
> in output die. I found that the classes such as dw_val_class_addr,
> dw_val_class_high_pc, and dw_val_class_vms_delta aren't being used during
> the early_finish of LTO mode. I might be wrong, as my observation is based
> on commenting out and testing a few pieces of code that might need to be
> completed. So, can anyone please tell out of these 30 classes that are
> relevant to dwarf2out_early_finish in LTO mode or at least point out some
> documentation if it exists?

You can probably do gcc_assert (!in_lto_p && flag_lto && !flag_fat_lto_objects)
on parts of output_die which you think are unused and then do make check
and if it passes also make bootstrap.  That should probably catch all
relevant cases.

Honza
> enum dw_val_class
> {
>   dw_val_class_none,
>   dw_val_class_addr,
>   dw_val_class_offset,
>   dw_val_class_loc,
>   dw_val_class_loc_list,
>   dw_val_class_range_list,
>   dw_val_class_const,
>   dw_val_class_unsigned_const,
>   dw_val_class_const_double,
>   dw_val_class_wide_int,
>   dw_val_class_vec,
>   dw_val_class_flag,
>   dw_val_class_die_ref,
>   dw_val_class_fde_ref,
>   dw_val_class_lbl_id,
>   dw_val_class_lineptr,
>   dw_val_class_str,
>   dw_val_class_macptr,
>   dw_val_class_loclistsptr,
>   dw_val_class_file,
>   dw_val_class_data8,
>   dw_val_class_decl_ref,
>   dw_val_class_vms_delta,
>   dw_val_class_high_pc,
>   dw_val_class_discr_value,
>   dw_val_class_discr_list,
>   dw_val_class_const_implicit,
>   dw_val_class_unsigned_const_implicit,
>   dw_val_class_file_implicit,
>   dw_val_class_view_list,
>   dw_val_class_symview
> };
> 
> --
> Rishi


Re: Help needed in output relocations

2023-10-18 Thread Jan Hubicka via Gcc
> Hello,
Hi,
> I have almost completed the output of relocation entries. The only thing
> that remains is to output the corresponding symbols in .symtab. In my
> current design, I store the info about relocation entry and the symbol
> name. However, the problem I am facing with this approach is that many
> relocation entries will have the same name, so we will need a hash table
> with the key as symbol name and value as symbol index in symtab. It would
> be really helpful if you could point out the relevant docs or help me
> figure out how to use the hash table in GCC.
> Another approach is, as we have only 4-5 unique relocation symbol names. We
> can output just those and store their .symtab index somewhere.

I am not 100% sure what precisely you need, but if you need just
hashtable translating strings to integers, you can probably just
follow section_name_hash in symtab.cc which is doing pretty much
the same.

In case you can already rely on the fact that the strings are unified
(which is the case for GCC's identifiers) you can also simply use
hash_map which has bit easier to set up.

Honza
> 
> --
> Rishi


Re: [GSoC]Bypass assembler when generating LTO object files

2022-04-08 Thread Jan Hubicka via Gcc
Ankur,
> I was browsing the list of submitted GSoC projects this year and the
> project regarding bypassing assembler when generating LTO object files
> caught my eye.
I apologize for late reply.  I would be very happy to mentor this
project.
> 
> I already have a gcc built from source (sync-ed with trunk/master) and
> launched the test-suite on it.
> 
> I am currently in process of understanding the primilary patch
> (https://gcc.gnu.org/legacy-ml/gcc/2014-09/msg00340.html), and
> experimenting with it.
> 
> are there any other things I should be aware of (useful Doc/blog or a
> bug tracking the project) before proceeding further ?

I think it is pretty much all that exists.  Basically we will need to
implement everything that is necessary to stream out valid object file
directly from GCC rather than going through gas.  The experimental
prototype sort of worked but it was lacking few things.  First is the
production of proper object file hearder (it encodes things like
architeture ELF is produced for), production of symbol table that is
necessary to mark the LTO object file and also we now need a way to
stream debug info (DWARF) directly to the object from dwarf2out.

So I think first step would be to produce object files w/o debug info
which can be consumed by unmodified linkers and then look into DWARF
bytecode streaming.

Honza
> 
> I am Ankur Saini, a B.Tech CSE 3rd year student at USICT, GGSIPU india
> and a former GSoC contributor at gcc ( worked on expanding gcc static
> analyzer's C++ support in GSoC 2021
> [https://gist.github.com/Arsenic-ATG/8f4ac194f460dd9b2c78cf51af39afef])
> 
> Thanks
> - Ankur


GNU Tools Cauldron 2022

2022-05-14 Thread Jan Hubicka via Gcc
Hello,

We are pleased to invite you all to the next GNU Tools Cauldron,
taking place in Paris on September 16-18, 2022.  We are looking forward
to meet you again after three years!

As for the previous instances, we have setup a wiki page for
details:

 https://gcc.gnu.org/wiki/cauldron2022  


In previous years we have been able to make the Cauldron free to all to attend
thanks to the generosity of our sponsors. However this year we are having to
charge for attendance. We are still working out what we will need to charge,
but it will be no more than £200.

Attendance will remain free for community volunteers and others who do not have
a commercial backer and we will be providing a small number of travel bursaries
for students to attend. 

We will also make it possible to attend remotely in the hybrid form and
remote attendance will be for free as well.

Please email to tools-cauldron-admin AT googlegroups.com if you would like to
attend, indicating your T-shirt size and any dietary requirements. We'll
contact you later in the year with details of how to pay.

To send abstracts or ask administrative questions, please email to
tools-cauldron-admin AT googlegroups.com  .

The Cauldron is organized by a group of volunteers. We are keen to add some
more people so others can stand down. If you'd like to be part of that
organizing committee, please email the same address.

This announcement is being sent to the main mailing list of the following
groups: GCC, GDB, binutils, CGEN, DejaGnu, newlib and glibc.

Please feel free to share with other groups as appropriate.

Honza


Re: GNU Tools Cauldron 2022

2022-05-14 Thread Jan Hubicka via Gcc
> On Sat, May 14, 2022 at 7:03 PM Jan Hubicka via Gcc  wrote:
> >
> > Hello,
> >
> > We are pleased to invite you all to the next GNU Tools Cauldron,
> > taking place in Paris on September 16-18, 2022.  We are looking forward
> > to meet you again after three years!
> 
> Did you intend to announce the location as Prague, Czechia, not Paris, France?
Thanks! I was checking it several times for similar mistakes :(
So here should be corrected announcement before I send it to ohter
mailing lists.

Hello,

We are pleased to invite you all to the next GNU Tools Cauldron,
taking place in Prague on September 16-18, 2022.  We are looking forward
to meet you again after three years!

As for the previous instances, we have setup a wiki page for
details:

 https://gcc.gnu.org/wiki/cauldron2022  
<https://gcc.gnu.org/wiki/cauldron2022>

In previous years we have been able to make the Cauldron free to all to attend
thanks to the generosity of our sponsors. However this year we are having to
charge for attendance. We are still working out what we will need to charge,
but it will be no more than £200.

Attendance will remain free for community volunteers and others who do not have
a commercial backer and we will be providing a small number of travel bursaries
for students to attend. 

We will also make it possible to attend remotely in the hybrid form and
remote attendance will be for free as well.

Please email to tools-cauldron-admin AT googlegroups.com if you would like to
attend, indicating your T-shirt size and any dietary requirements. We'll
contact you later in the year with details of how to pay.

To send abstracts or ask administrative questions, please email to
tools-cauldron-admin AT googlegroups.com  <http://googlegroups.com>.

The Cauldron is organized by a group of volunteers. We are keen to add some
more people so others can stand down. If you'd like to be part of that
organizing committee, please email the same address.

This announcement is being sent to the main mailing list of the following
groups: GCC, GDB, binutils, CGEN, DejaGnu, newlib and glibc.

Please feel free to share with other groups as appropriate.

Honza


Re: State of AutoFDO in GCC

2022-07-27 Thread Jan Hubicka via Gcc
> On Tue, Jul 26, 2022 at 4:13 PM Eugene Rozenfeld via Gcc
>  wrote:
> >
> > Hello GCC community.
> >
> > I started this thread on the state of AutoFDO in GCC more than a year ago. 
> > Here is the first message in the thread: 
> > https://gcc.gnu.org/pipermail/gcc/2021-April/235860.html
> >
> > Since then I committed a number of patches to revive AutoFDO in GCC:
> >
> > Fix a typo in an AutoFDO error 
> > string
> > Update gen_autofdo_event.py and 
> > gcc-auto-profile.
> > Fixes for AutoFDO 
> > tests
> > Fix indir-call-prof-2.c with 
> > AutoFDO
> > Fixes for AutoFDO 
> > testing
> > Fix indirect call inlining with 
> > AutoFDO
> > Improve AutoFDO count propagation 
> > algorithm
> > AutoFDO: don't set param_early_inliner_max_iterations to 
> > 10.
> > AutoFDO: Don't try to promote indirect calls that result in recursive 
> > direct 
> > calls
> > Fix profile count maintenance in vectorizer 
> > peeling.
> >
> > I also made a number of fixes and improvements to create_gcov tool in 
> > https://github.com/google/autofdo .
> >
> > AutoFDO in GCC is in a much better shape now.
> >
> > I have a further set of patches that improve DWARF discriminator support in 
> > GCC and enable AutoFDO to use discriminators. It's based on commits in an 
> > old Google vendor branch as described in Andi's mail below
> > but uses a different approach for keeping track of per-instruction 
> > discriminators.
> >
> > I submitted the first (and the biggest) of these patches almost 2 months 
> > ago on June 2: 
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5af22024f62f1f596a35d3c138d41d47d5697ca0
> > but only got a review from Andi 
> > (https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596549.html) who is 
> > not allowed to approve patches for commit. I pinged gcc-patches twice with 
> > no success.
> >
> > I would appreciate help in getting a review on this patch so that I can get 
> > it committed and submit patches that depend on it.
> 
> Hi, Eugene
> 
> Thanks for your efforts to fix and improve AutoFDO in GCC.  I believe
> that part of the difficulty with obtaining a review of the patches is
> that the original authors have dispersed and no one in the GCC
> community officially is the maintainer for the feature.  Because you
> seem to be one of the primary users and developers, would you be
> interested to take on the responsibility of maintaining the
> AutoFDO-specific portions of the code, with guidance and mentorship
> from other GCC maintainers, especially the ones responsible for gcov
> and PDO?

I missed the patches (it would help to add me to CC :) and will review
the FDO/profile facing parts.  Since it also extends debug info
generation and front-ends I think we also need reviewer for that part.

Having auto-FDO co-maintainer would be welcome.

Honza
> 
> Thanks, David
> 
> >
> > Thank you,
> >
> > Eugene
> >
> > -Original Message-
> > From: Andi Kleen 
> > Sent: Monday, May 10, 2021 10:21 AM
> > To: Joseph Myers 
> > Cc: Jan Hubicka ; gcc ; Eugene Rozenfeld 
> > 
> > Subject: [EXTERNAL] Re: State of AutoFDO in GCC
> >
> > On Mon, May 10, 2021 at 04:55:50PM +, Joseph Myers wrote:
> > > On Mon, 10 May 2021, Andi Kleen via Gcc wrote:
> > >
> > > > It's difficult to find now because it was a branch in the old SVN
> > > > that wasn't converted. Sadly the great git conversion was quite lossy.
> > >
> > > All branches and tags, including deleted ones, were converted (under
> > > not-fetched-by-default refs in some cases); the git repository has
> > > everything that might plausibly be useful, omitting only a few things
> > > that would have been meaningless to convert, such as mistaken branch
> > > creations in the root of the repository and the SVN hooks directory.
> > > Use "git ls-remote git://gcc.gnu.org/git/gcc.git" to see the full list
> > > of over 5000 refs available in the repository (or do a clone with
> > > --mirror to fetch them all).
> >
> > Okay thanks. I don't see them in any of the web interfaces, neither on
> > 

Re: GNU Tools Cauldron 2022

2022-09-16 Thread Jan Hubicka via Gcc
Hello,
> Hello,
> 
> We are pleased to invite you all to the next GNU Tools Cauldron,
> taking place in Paris on September 16-18, 2022.  We are looking forward
> to meet you again after three years!
> 
> As for the previous instances, we have setup a wiki page for
> details:
> 
>  https://gcc.gnu.org/wiki/cauldron2022  
> 

The Cauldron is going to start in few minutes.  We also set up youtube
channel https://www.youtube.com/channel/UCCJUY8xtvLa05oODFj6FHcQ where
you can watch us live.

Honza
> 
> In previous years we have been able to make the Cauldron free to all to attend
> thanks to the generosity of our sponsors. However this year we are having to
> charge for attendance. We are still working out what we will need to charge,
> but it will be no more than £200.
> 
> Attendance will remain free for community volunteers and others who do not 
> have
> a commercial backer and we will be providing a small number of travel 
> bursaries
> for students to attend. 
> 
> We will also make it possible to attend remotely in the hybrid form and
> remote attendance will be for free as well.
> 
> Please email to tools-cauldron-admin AT googlegroups.com if you would like to
> attend, indicating your T-shirt size and any dietary requirements. We'll
> contact you later in the year with details of how to pay.
> 
> To send abstracts or ask administrative questions, please email to
> tools-cauldron-admin AT googlegroups.com  .
> 
> The Cauldron is organized by a group of volunteers. We are keen to add some
> more people so others can stand down. If you'd like to be part of that
> organizing committee, please email the same address.
> 
> This announcement is being sent to the main mailing list of the following
> groups: GCC, GDB, binutils, CGEN, DejaGnu, newlib and glibc.
> 
> Please feel free to share with other groups as appropriate.
> 
> Honza


Re: cgraph: does node->inlined_to imply node->clones is non-empty?

2023-03-24 Thread Jan Hubicka via Gcc
Hi,
> 
> It seems to me that the most correct fix is to add to
> walk_polymorphic_call_targets a check that the obtained possible target
> is still referenced_from_vtable_p() - because the alias that was
> originally a virtual function is referenced from a vtable that at this
> point is also known to be gone.  But the check looks like it is possibly
> expensive, so I wanted to discuss this with Honza first (hopefully next
> week).

External vtables are bit special since they can refer to things that are
not accessible in current unit (static functions from other translation
units etc).
We already have and test can_refer_decl_in_current_unit_p but we test it
before alias resolution (which is there just to avoid artifically
bumping up the number of possible targets)

So perhaps following untested patch would work?

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 14cf132c767..b33ec708d47 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -2420,7 +2420,8 @@ maybe_record_node (vec  ,
   alias_target = target_node->ultimate_alias_target ();
   if (target_node != alias_target
  && avail >= AVAIL_AVAILABLE
- && target_node->get_availability ())
+ && target_node->get_availability ()
+ && can_refer_decl_in_current_unit_p (target_node->decl, NULL))
target_node = alias_target;
 }
 
I am at conference and will be able to test it only during weekend.
Honza
> 
> >
> > I had already figured that an error could've likely been in
> > reach-ability analysis, but my time ran low, and I had not confirmed
> > anything, or as little as formalized a theory, so I just wrote the
> > original email instead of following this trail of thought fully.
> >
> > Thank you for your guidance!  Have a lovely night :)
> 
> It is good thing that you asked, I also learned something new (that
> virtual and non-virtual functions can be ICFed together).
> 
> Martin


Re: [GSoC] Introduction and query on LTO object emmission project

2023-03-03 Thread Jan Hubicka via Gcc
Hello,
> Hi! I've been interested in compiler development for a while, and would love 
> to
> work with any of you as part of GSoC, or even just as a side-project on my 
> own.
> 
> I'm an 18 year-old student going into university next year with a passion for 
> all
> things open source and low level. I consider myself fluent in c, and 
> proficient
> with c++, rust, and x86 assembly, but unfamiliar with practical compiler 
> design.
> I have done some reading on the theoretical aspects of compilers, however.
> 
> While I haven't worked with the GCC community before, I have worked with the 
> linux
> community and have made several small patches there, so I am familiar with 
> both
> email-based workflows and the principles of open-source development. 
> 
> This summer, I'm looking for more experience working on larger projects, as 
> well
> as getting into real compilers.
> 
> Of particular interest to me is the project idea labelled "Bypass assembler 
> when
> generating LTO object files." I see that the project was taken last year, but
> I can find no sign of any changes committed to trunk 
> (`git shortlog --after=2022-01-01 | grep -i -E "lto|assembl(er|y)"` shows 
> nothing
> related to this project) and no sign of any needed change made in the code.
> Is this project still available?

yes, the project is available and Maritn Jambor  and me
would be happy to mentor it. Please add me and Martin to CC of any
emails on the proejt.

I think it would be good to start by looking at the original work in
progress patch (linked from the task description, an updated version is
here https://gcc.gnu.org/pipermail/gcc/2022-May/238670.html).   Overall
struture of the LTO optimization is described in paper
https://arxiv.org/abs/1010.2196

One problem is that the object files produced by libiberty/simple-object.c
(which is the low-level API used by the LTO code)
are missing some information (such as the architecture info and symbol
table) and API of the simple object will need to be extended to handle
that.  So I think getting familiar with simple-object.c and the
elf/mach/coff handlers of it would be a good initital step.  It will be
necessary to add API to specify the architecture and symbols in the
symbol table so the resulting object files are recognized by linker.

Second part would be handling of output of dwarf debug information, so
it may make sense to also take a quick look on its documentation.  The
on-disk representation is actually not too hard and it should be
possible to make a direct writer.

Jan
> 
> I'm also willing to work on other projects, ideally in the middle/backend, but
> currently I have only been experimenting with the gcc/[lto,data]-streamer* 
> files.
> If anyone has a small or medium sized project idea, please feel free to let 
> me know.
> 
> 
> I look forward to working with all of you in the future,
> 
> Peter Lafreniere
> 
> 


Re: Stepping down as gcov maintainer and callgraph reviewer

2023-02-16 Thread Jan Hubicka via Gcc
Martin,
> Hello GCC community.
> 
> After spending last decade (including my diploma thesis even more)
> of my life working on GCC, I decided to leave the project and try
> a different job. I would like to thank all the community members I had
> change to interact with, I learned so much and enjoyed the journey!
> I'll be leaving somewhen at the beginning of May.
> 
> That said, I'm stepping down from my 2 positions as I won't have a time
> for proper patch review and bugs in the area I'm responsible for.

I am sad to hear this news and will definitely miss you as coleague
and co-maintaner.  Thank you for all the work on GCC!

Honza
> 
> I wish the project all the best!
> 
> Cheers,
> Martin

> From bb3aee20cdeeb6399ca77ac05cd8093d66256df3 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Thu, 16 Feb 2023 16:50:38 +0100
> Subject: [PATCH] MAINTAINERS: stepping down from my positions
> 
> ChangeLog:
> 
>   * MAINTAINERS: I'm stepping down from my positions.
> ---
>  MAINTAINERS | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 18edc86df67..a61d3ae06df 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -230,7 +230,6 @@ docstring relicensing Gerald Pfeifer  
> 
>  docstring relicensingJoseph Myers
> 
>  predict.def  Jan Hubicka 
>  gcov Jan Hubicka 
> -gcov Martin Liska
>  gcov Nathan Sidwell  
>  option handling  Joseph Myers
> 
>  middle-end   Jeff Law
> @@ -268,7 +267,6 @@ check in changes outside of the parts of the compiler 
> they maintain.
>   Reviewers
>  
>  arc port Claudiu Zissulescu  
> -callgraphMartin Liska
>  callgraphMartin Jambor   
>  C front end  Marek Polacek   
>  CTF, BTF David Faust 
> @@ -519,6 +517,7 @@ Kriang Lerdsuwanakij  
> 
>  Renlin Li
>  Xinliang David Li
>  Chen Liqin   
> +Martin Liska 
>  Jiangning Liu
>  Sa Liu   
>  Ralph Loader 
> -- 
> 2.39.1
> 



Re: [GSOC] few question about Bypass assembler when generating LTO object files

2023-04-03 Thread Jan Hubicka via Gcc
Hello,
> While going through the patch and simple-object.c I understood that the
> file simple-object.c is used to handle the object file format. However,
> this file does not contain all the architecture information required for
> LTO object files, so the workaround used in the patch is to read the
> crtbegin.o file and merge the missing attributes. While this workaround is
> functional, it is not optimal, and the ideal solution would be to extend
> simple-object.c to include the missing information.

Yes, simple-object.c simply uses architecture settings it read earlier
which is problem since at compile time we do not read any object files,
just parse sources). In my original patch the architecture flags were
simply left blank.  I am not sure if there is a version reading
crtbeing.o which would probably not a be that bad workaround, at least
for the start.  Having a way to specify this from the machine descriptions
would be better.

Besides the architecture bits, for simple-object files to work we need
to add the symbol table. For practically useful information we also need
to stream the debug info.
> 
> Regarding the phrase "Support in the driver to properly execute *1 binary",
> it is not entirely clear what it refers to. My interpretation is that the
> compiler driver (the program that coordinates the compilation process)
> needs to be modified to correctly output LTO object files instead of
> assembler files (the current approach involves passing the -S and -o
> .o options) and also skip the assembler option while using
> -fbypass-asm option but I am not sure. Can Jan or Martin please shed some
> light on this?
Yes, compiler drivers decides what to do and it needs to know that with
-flto it does not need to produce assembly file and then invoke gas.  If
we go the way of reading in crtbegin.o it will also need to pass correct
crtbegin to *1 binary.  This is generally not that hard to do, just
needs to be done :)

Honza
> 
> Thanks & Regards
> 
> Rishi Raj
> 
> On Sun, 2 Apr 2023 at 03:05, Rishi Raj  wrote:
> 
> > Hii Everyone,
> > I had already expressed my interest in the " Bypass assembler when
> > generating LTO object files" project and making a proposal for the same. I
> > know I should have done it earlier but I was admitted to the hospital for
> > past few days :(.
> > I have a few doubts.
> > 1)
> >
> > "One problem is that the object files produced by libiberty/simple-object.c
> > (which is the low-level API used by the LTO code)
> > are missing some information (such as the architecture info and symbol
> > table) and API of the simple object will need to be extended to handle
> > that" I found this in the previous mailing list discussion. So who output 
> > this information currently in the object file, is it assembler?
> >
> > Also in the current patch for this project by Jan Hubica, from where are we 
> > getting these information from? Is it from crtbegin.o?
> >
> > 2)
> > "Support in driver to properly execute *1 binary." I found this on Jan 
> > original patch's email. what does it mean
> >
> > exactly?
> >
> > Regards
> >
> > Rishi Raj
> >
> >
> >
> >


Re: [GSOC] few question about Bypass assembler when generating LTO object files

2023-04-04 Thread Jan Hubicka via Gcc
Hello,
> Thanks, Jan for the Reply! I have completed a draft proposal for this
> project. I will appreciate your's, Martin's, or anybody else feedback on
> the same.
> Here is the link to my proposal
> https://docs.google.com/document/d/1r9kzsU96kOYfIhWZx62jx4ALG-J_aJs5U0sDpwFUtts/edit?usp=sharing

Here are few comments on the proposal:

> The current Implementation of GCC first write the IL representation along 
> with other section in an assembly file and then the assembler is used to 
> convert it into LTO object files. Sections containing different IL 
> representation is created and data is appended in lto-streamer-out.cc.I

The .o generated withh -flto file contains the IL (in different
sections), debug info, symbol table, etc.
"along with other section" sounds odd to me. Perhaps sections.

Second sentence seems bit odd too. Perhaps "Streaming of individual
sections is implemented in lto-streamer-out.cc which can either be used
to produce assembly code containing the section data (dumped
hexadecimally) or simple-object API provided by libiberty to produce
object files directly"

> In the slim object file (Default when using -flto, fat lto can be obtained 
> using -ffat-lto-object) some section contains the IL and other contains the 
> info related to architecture, command line options, symbol table, etc. 

Technically the architecture is part of ELF header and not section
itself (I think).

There are some other grammar errors, but I am not too good on fixing
these, so perhaps Martin can help.

The timeline looks reasonable.  It certianly makes sense to look into
non-ELF object files to understand what API we need, but implementation
wise I would suggest implementing ELF path first to get a working
implementation. Adding support for other object formats can be done
incrementally.

Honza
> 
> On Tue, 4 Apr 2023 at 04:35, Jan Hubicka  wrote:
> 
> > Hello,
> > > While going through the patch and simple-object.c I understood that the
> > > file simple-object.c is used to handle the object file format. However,
> > > this file does not contain all the architecture information required for
> > > LTO object files, so the workaround used in the patch is to read the
> > > crtbegin.o file and merge the missing attributes. While this workaround
> > is
> > > functional, it is not optimal, and the ideal solution would be to extend
> > > simple-object.c to include the missing information.
> >
> > Yes, simple-object.c simply uses architecture settings it read earlier
> > which is problem since at compile time we do not read any object files,
> > just parse sources). In my original patch the architecture flags were
> > simply left blank.  I am not sure if there is a version reading
> > crtbeing.o which would probably not a be that bad workaround, at least
> > for the start.  Having a way to specify this from the machine descriptions
> > would be better.
> >
> 
> 
> >
> > Besides the architecture bits, for simple-object files to work we need
> > to add the symbol table. For practically useful information we also need
> > to stream the debug info.
> >
> >
> > > Regarding the phrase "Support in the driver to properly execute *1
> > binary",
> > > it is not entirely clear what it refers to. My interpretation is that the
> > > compiler driver (the program that coordinates the compilation process)
> > > needs to be modified to correctly output LTO object files instead of
> > > assembler files (the current approach involves passing the -S and -o
> > > .o options) and also skip the assembler option while using
> > > -fbypass-asm option but I am not sure. Can Jan or Martin please shed some
> > > light on this?
> > Yes, compiler drivers decides what to do and it needs to know that with
> > -flto it does not need to produce assembly file and then invoke gas.  If
> > we go the way of reading in crtbegin.o it will also need to pass correct
> > crtbegin to *1 binary.  This is generally not that hard to do, just
> > needs to be done :)
> >
> Honza
> > >
> > > Thanks & Regards
> > >
> > > Rishi Raj
> > >
> > > On Sun, 2 Apr 2023 at 03:05, Rishi Raj  wrote:
> > >
> > > > Hii Everyone,
> > > > I had already expressed my interest in the " Bypass assembler when
> > > > generating LTO object files" project and making a proposal for the
> > same. I
> > > > know I should have done it earlier but I was admitted to the hospital
> > for
> > > > past few days :(.
> > > > I have a few doubts.
> > > > 1)
> > > >
> > > > "One problem is that the object files produced by
> > libiberty/simple-object.c
> > > > (which is the low-level API used by the LTO code)
> > > > are missing some information (such as the architecture info and symbol
> > > > table) and API of the simple object will need to be extended to handle
> > > > that" I found this in the previous mailing list discussion. So who
> > output this information currently in the object file, is it assembler?
> > > >
> > > > Also in the current patch for this project by Jan Hubica, from where
> > 

Re: [GSOC] few question about Bypass assembler when generating LTO object files

2023-04-04 Thread Jan Hubicka via Gcc
> Thanks to Martin, Honza, and Théo for your feedback. I have incorporated
> almost all of it, updated my proposal accordingly, and submitted it.
> Regarding grammar errors, I have fixed many, but there may still be some
> left (I could be better at grammar, to be honest :( ).

I could be better too, I think grammar is not critical here.
Thanks a lot for making and submitting the proposal.

Honza
> 
> On Tue, 4 Apr 2023 at 15:55, Jan Hubicka  wrote:
> 
> > Hello,
> > > Thanks, Jan for the Reply! I have completed a draft proposal for this
> > > project. I will appreciate your's, Martin's, or anybody else feedback on
> > > the same.
> > > Here is the link to my proposal
> > >
> > https://docs.google.com/document/d/1r9kzsU96kOYfIhWZx62jx4ALG-J_aJs5U0sDpwFUtts/edit?usp=sharing
> >
> > Here are few comments on the proposal:
> >
> > > The current Implementation of GCC first write the IL representation
> > along with other section in an assembly file and then the assembler is used
> > to convert it into LTO object files. Sections containing different IL
> > representation is created and data is appended in lto-streamer-out.cc.I
> >
> > The .o generated withh -flto file contains the IL (in different
> > sections), debug info, symbol table, etc.
> > "along with other section" sounds odd to me. Perhaps sections.
> >
> > Second sentence seems bit odd too. Perhaps "Streaming of individual
> > sections is implemented in lto-streamer-out.cc which can either be used
> > to produce assembly code containing the section data (dumped
> > hexadecimally) or simple-object API provided by libiberty to produce
> > object files directly"
> >
> > > In the slim object file (Default when using -flto, fat lto can be
> > obtained using -ffat-lto-object) some section contains the IL and other
> > contains the info related to architecture, command line options, symbol
> > table, etc.
> >
> > Technically the architecture is part of ELF header and not section
> > itself (I think).
> >
> > There are some other grammar errors, but I am not too good on fixing
> > these, so perhaps Martin can help.
> >
> > The timeline looks reasonable.  It certianly makes sense to look into
> > non-ELF object files to understand what API we need, but implementation
> > wise I would suggest implementing ELF path first to get a working
> > implementation. Adding support for other object formats can be done
> > incrementally.
> >
> > Honza
> > >
> > > On Tue, 4 Apr 2023 at 04:35, Jan Hubicka  wrote:
> > >
> > > > Hello,
> > > > > While going through the patch and simple-object.c I understood that
> > the
> > > > > file simple-object.c is used to handle the object file format.
> > However,
> > > > > this file does not contain all the architecture information required
> > for
> > > > > LTO object files, so the workaround used in the patch is to read the
> > > > > crtbegin.o file and merge the missing attributes. While this
> > workaround
> > > > is
> > > > > functional, it is not optimal, and the ideal solution would be to
> > extend
> > > > > simple-object.c to include the missing information.
> > > >
> > > > Yes, simple-object.c simply uses architecture settings it read earlier
> > > > which is problem since at compile time we do not read any object files,
> > > > just parse sources). In my original patch the architecture flags were
> > > > simply left blank.  I am not sure if there is a version reading
> > > > crtbeing.o which would probably not a be that bad workaround, at least
> > > > for the start.  Having a way to specify this from the machine
> > descriptions
> > > > would be better.
> > > >
> > >
> > >
> > > >
> > > > Besides the architecture bits, for simple-object files to work we need
> > > > to add the symbol table. For practically useful information we also
> > need
> > > > to stream the debug info.
> > > >
> > > >
> > > > > Regarding the phrase "Support in the driver to properly execute *1
> > > > binary",
> > > > > it is not entirely clear what it refers to. My interpretation is
> > that the
> > > > > compiler driver (the program that coordinates the compilation
> > process)
> > > > > needs to be modified to correctly output LTO object files instead of
> > > > > assembler files (the current approach involves passing the -S and -o
> > > > > .o options) and also skip the assembler option while
> > using
> > > > > -fbypass-asm option but I am not sure. Can Jan or Martin please shed
> > some
> > > > > light on this?
> > > > Yes, compiler drivers decides what to do and it needs to know that with
> > > > -flto it does not need to produce assembly file and then invoke gas.
> > If
> > > > we go the way of reading in crtbegin.o it will also need to pass
> > correct
> > > > crtbegin to *1 binary.  This is generally not that hard to do, just
> > > > needs to be done :)
> > > >
> > > Honza
> > > > >
> > > > > Thanks & Regards
> > > > >
> > > > > Rishi Raj
> > > > >
> > > > > On Sun, 2 Apr 2023 at 03:05, Rishi Raj 
> > wrote:
> > > > >
> > > > > > Hii 

Re: Regarding bypass-asm patch and help in an error during bootstrapped build

2023-07-06 Thread Jan Hubicka via Gcc
> Hi,
> 
> I have added the patch (
> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623379.html ) on the
> devel/bypass-asm branch.
> Although I am able to build using the --disable-bootstrap option but while
> doing a bootstrapped build, I am getting these errors ( as warnings while
> doing the non-bootstrapped build.)
> 
> In file included from ../../gcc/gcc/lto-object.cc:23:0:
> ../../gcc/gcc/is-a.h:196:22: error: inline function ‘static bool
> is_a_helper::test(U*) [with U = symtab_node; T = cgraph_node*]’ used but
> never defined [enabled by default]
>static inline bool test (U *p);
> 
>   ^
> ../../gcc/gcc/is-a.h:196:22: error: inline function ‘static bool
> is_a_helper::test(U*) [with U = symtab_node; T = varpool_node*]’ used
> but never defined [enabled by default]
> 
> 
> In file included from ../../gcc/gcc/coretypes.h:489:0,
>  from ../../gcc/gcc/lto/lto-lang.cc:23:
> ../../gcc/gcc/is-a.h:196:22: error inline function ‘static bool
> is_a_helper::test(U*) [with U = symtab_node; T = cgraph_node*]’ used but
> never defined [enabled by default]
>static inline bool test (U *p);
>   ^
> ../../gcc/gcc/is-a.h:196:22: error inline function ‘static bool
> is_a_helper::test(U*) [with U = symtab_node; T = varpool_node*]’ used
> but never defined [enabled by default]
> 
> I have tested the .symtab and dummy symbols addition on the
> non-bootstrapped build, and they are working fine. I also ran the lto test
> suite, and it passed as expected. I am looking into the error produced
> during bootstrapped build currently. I would appreciate any help/guidance
> regarding this.
This is because you miss some #inline or you do these in wrong order
(at some point it was decided to drop most #inlines inside header files,
so one needs to always do the transitive closure by  hand which is quite
anoying).

This is a conversion helper from cgraph_node to symtab_node, so pehraps
you need to include cgraph.h?

Honza
> 
> --
> Regards
> Rishi


Re: About addition of .symtab and .strtab sections in simple-object-elf.c

2023-06-11 Thread Jan Hubicka via Gcc
> Hi Everyone,
Hello,
> I am working on the GSOC project "Bypass Assembler when generating LTO
> object files." My mentors and I have decided to work on the ELF files
> first, so I will add .symtab along with the symbol __gnu_lto_slim to
> the ELF file as a first step.
> When I was going through the simple-object-elf.c:
> simple_object_elf_write_to_file() I found out that it writes the
> following:
> /* Write out a complete ELF file.
>Ehdr
>initial dummy Shdr
>user-created Shdrs
>.shstrtab Shdr
>user-created section data
>.shstrtab data  */
> and .symtab is missing here. To add the missing symtab I have thought
> of these two possible implementations.
> 1) Add it in simple-object-elf.c (based on -fbypass-asm flag).
> 2) We can add .symtab section in lto-object.cc along with other LTO sections.
> I am a bit skeptical about the second one as .symtab with other lto
> sections might be confusing. Any comments regarding which one should I
> proceed with will be helpful.

I think we need to take into consideration that eventually we want to
support also coff/xcoff and macho so writting symtab section directly
out of lto-object.cc (which is supposed to be file format agnostic) is
hard.  On the other hand libiberty knows nothing about internals of LTO
implementatoin.
I think we want to extend simple-object.c by (simple at first at
least) API to add symbols to the symbol table and interface it to
simple-object-elf.c via simple_object_functions. That way
lto-object.cc (which knows the symbol names but no particular file
format) can call generic simple-object.cc API which will in turn call
simple-object-elf.c to create the symtab section.

Honza
> --
> Thanks & Regards
> Rishi Raj


Re: [Predicated Ins vs Branches] O3 and PGO result in 2x performance drop relative to O2

2023-08-01 Thread Jan Hubicka via Gcc
> > If I comment it out as above patch, then O3/PGO can get 16% and 12% 
> > performance
> > improvement compared to O2 on x86.
> >
> > O2  O3  PGO
> > cycles  2,497,674,824   2,104,993,224   2,199,753,593
> > instructions10,457,508,646  9,723,056,131   10,457,216,225
> > branches2,303,029,380   2,250,522,323   2,302,994,942
> > branch-misses   0.00%   0.01%   0.01%
> >
> > The main difference in the compilation output about code around the 
> > miss-prediction
> > branch is:
> >   o In O2: predicated instruction (cmov here) is selected to eliminate above
> > branch. cmov is true better than branch here.
> >   o In O3/PGO: bitout() is inlined into encode_file(), and branch 
> > instruction
> > is selected. But this branch is obviously *unpredictable* and the 
> > compiler
> > doesn't know it. This why O3/PGO are are so bad for this program.
> >
> > Gcc doesn't support __builtin_unpredictable() which has been introduced by 
> > llvm.
> > Then I tried to see if __builtin_expect_with_probability(e,x, 0.5) can 
> > serve the
> > same purpose. The result is negative.
> 
> But does it appear to be predictable with your profiling data?

Also one thing is that __builtin_expect and
__builtin_expect_with_probability only affects the static branch
prediciton algorithm, so with profile feedback they are ignored on every
branch executed at least once during the train run.

setting probability 0.5 is really not exactly the same as hint that the
branch will be mispredicted, since modern CPUs handle well regularly
behaving branchs (such as a branch firing every even iteration of loop).

So I think having the builting is not a bad idea.  I was thinking if it
makes sense to represent it withing profile_probability type and I am
not convinced, since "unpredictable probability" sounds counceptually
odd and we would need to keep the flag intact over all probability
updates we do.  For things like loop exits we recompute probabilities
from frequencies after unrolling/vectorizaiton and other things and we
would need to invent new API to propagate the flag from previous
probability (which is not even part of the computation right now)

So I guess the challenge is how to pass this info down through the
optimization pipeline, since we would need to annotate gimple
conds/switches and manage it to RTL level.  On gimple we have flags and
on rtl level notes so there is space for it, but we would need to
maintain the info through CFG changes.

Auto-FDO may be interesting way to detect such branches.

Honza
> 
> > I think we could come to a conclusion that there must be something can 
> > improve in
> > Gcc's heuristic strategy about Predicated Instructions and branches, at 
> > least
> > for O3 and PGO.
> >
> > And can we add __builtin_unpredictable() support for Gcc? As usually it's 
> > hard
> > for the compiler to detect unpredictable branches.
> >
> > --
> > Cheers,
> > Changbin Du


Re: Patch regarding addition of .symtab while generating object file from libiberty [WIP]

2023-06-24 Thread Jan Hubicka via Gcc
> Hi,
Hi,
I am sorry for late reaction.
> I am working on the GSOC project "Bypass Assembler when generating LTO
> object files." So as a first step, I am adding .symtab along with
> __gnu_lto_slim symbol into it so that at a later stage, it can be
> recognized that this object file has been produced using -flto enabled.
> This patch is regarding the same. Although I am still testing this patch, I
> want general feedback on my code and design choice.
> I have extended simple_object_wrtie_struct to hold a list of symbols (
> similar to sections ). A function in simple-object.c to add symbols. I am
> calling this function in lto-object.cc to add __gnu_lto_v1.
> Right now, as we are only working on ELF support first, I am adding .symtab
> in elf object files only.
> 
> ---
>  gcc/lto-object.cc|   4 +-
>  include/simple-object.h  |  10 +++
>  libiberty/simple-object-common.h |  18 +
>  libiberty/simple-object-elf.c| 130 +--
>  libiberty/simple-object.c|  32 
>  5 files changed, 187 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/lto-object.cc b/gcc/lto-object.cc
> index cb1c3a6cfb3..680977cb327 100644
> --- a/gcc/lto-object.cc
> +++ b/gcc/lto-object.cc
> @@ -187,7 +187,9 @@ lto_obj_file_close (lto_file *file)
>int err;
> 
>gcc_assert (lo->base.offset == 0);
> -
> +  /*Add __gnu_lto_slim symbol*/
> +  if(flag_bypass_asm)
> +simple_object_write_add_symbol (lo->sobj_w, "__gnu_lto_slim",1,1);

You can probably do this unconditionally.  The ltrans files we produce
are kind of wrong by missing the symbol table currently.
> +simple_object_write_add_symbol(simple_object_write *sobj, const char *name,
> +size_t size, unsigned int align);

Symbols has much more properties in addition to sizes and alignments.
We will eventually need to get dwarf writting, so we will need to
support them. However right now we only do these fake lto object
symbols, so perhaps for start we could kep things simple and assume that
size is always 0 and align always 1 or so.

Overall this looks like really good start to me (both API and
imllementation looks reasonable to me and it is good that you follow the
coding convention).  I guess you can create a branch (see git info on
the gcc homepage) and put the patch there?

I am also adding Ian to CC as he is maintainer of the simple-object and
he may have some ideas.

Honza
> 
>  /* Release all resources associated with SIMPLE_OBJECT, including any
> simple_object_write_section's that may have been created.  */
> diff --git a/libiberty/simple-object-common.h
> b/libiberty/simple-object-common.h
> index b9d10550d88..df99c9d85ac 100644
> --- a/libiberty/simple-object-common.h
> +++ b/libiberty/simple-object-common.h
> @@ -58,6 +58,24 @@ struct simple_object_write_struct
>simple_object_write_section *last_section;
>/* Private data for the object file format.  */
>void *data;
> +  /*The start of the list of symbols.*/
> +  simple_object_symbol *symbols;
> +  /*The last entry in the list of symbols*/
> +  simple_object_symbol *last_symbol;
> +};
> +
> +/*A symbol in object file being created*/
> +struct simple_object_symbol_struct
> +{
> +  /*Next in the list of symbols attached to an
> +  simple_object_write*/
> +  simple_object_symbol *next;
> +  /*The name of this symbol. */
> +  char *name;
> +  /* Symbol value */
> +  unsigned int align;
> +  /* Symbol size */
> +  size_t size;
>  };
> 
>  /* A section in an object file being created.  */
> diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
> index eee07039984..cbba88186bd 100644
> --- a/libiberty/simple-object-elf.c
> +++ b/libiberty/simple-object-elf.c
> @@ -787,9 +787,9 @@ simple_object_elf_write_ehdr (simple_object_write
> *sobj, int descriptor,
>  ++shnum;
>if (shnum > 0)
>  {
> -  /* Add a section header for the dummy section and one for
> - .shstrtab.  */
> -  shnum += 2;
> +  /* Add a section header for the dummy section,
> + .shstrtab, .symtab and .strtab.  */
> +  shnum += 4;
>  }
> 
>ehdr_size = (cl == ELFCLASS32
> @@ -882,6 +882,51 @@ simple_object_elf_write_shdr (simple_object_write
> *sobj, int descriptor,
> errmsg, err);
>  }
> 
> +/* Write out an ELF Symbol*/
> +
> +static int
> +simple_object_elf_write_symbol(simple_object_write *sobj, int descriptor,
> +off_t offset, unsigned int st_name, unsigned int st_value,
> size_t st_size,
> +unsigned char st_info, unsigned char st_other, unsigned int
> st_shndx,
> +const char **errmsg, int *err)
> +{
> +  struct simple_object_elf_attributes *attrs =
> +(struct simple_object_elf_attributes *) sobj->data;
> +  const struct elf_type_functions* fns;
> +  unsigned char cl;
> +  size_t sym_size;
> +  unsigned char buf[sizeof (Elf64_External_Shdr)];
> +
> +  fns = attrs->type_functions;
> +  cl = attrs->ei_class;
> +
> +  sym_size = (cl == 

Re: Patch regarding addition of .symtab while generating object file from libiberty [WIP]

2023-06-26 Thread Jan Hubicka via Gcc
> > > +simple_object_write_add_symbol(simple_object_write *sobj, const char
> > *name,
> > > +size_t size, unsigned int align);
> >
> > Symbols has much more properties in addition to sizes and alignments.
> > We will eventually need to get dwarf writting, so we will need to
> > support them. However right now we only do these fake lto object
> > symbols, so perhaps for start we could kep things simple and assume that
> > size is always 0 and align always 1 or so.
> >
> > Overall this looks like really good start to me (both API and
> > imllementation looks reasonable to me and it is good that you follow the
> > coding convention).  I guess you can create a branch (see git info on
> > the gcc homepage) and put the patch there?
> >
> I can, but I don't have write access to git.  Also, for the next part, I am
This is easy to fix. Follow instructions here
https://gcc.gnu.org/gitwrite.html
and add me as sponsor.  There are also instructions how to produce a
branch on git correctly.
> thinking of properly configuring the driver
> to directly produce an executable or adding debug info. What do you think?

I think we should try to aim to something that can be tested.  If we add
symbol table and get the ELF header right, linker should accept the
object files and properly recognize them as LTO.

What exactly you mean by configuring the driver?

Honza
> --
> Regards
> Rishi
> 
> >
> > I am also adding Ian to CC as he is maintainer of the simple-object and
> > he may have some ideas.
> >
> > Honza
> > >
> > >  /* Release all resources associated with SIMPLE_OBJECT, including any
> > > simple_object_write_section's that may have been created.  */
> > > diff --git a/libiberty/simple-object-common.h
> > > b/libiberty/simple-object-common.h
> > > index b9d10550d88..df99c9d85ac 100644
> > > --- a/libiberty/simple-object-common.h
> > > +++ b/libiberty/simple-object-common.h
> > > @@ -58,6 +58,24 @@ struct simple_object_write_struct
> > >simple_object_write_section *last_section;
> > >/* Private data for the object file format.  */
> > >void *data;
> > > +  /*The start of the list of symbols.*/
> > > +  simple_object_symbol *symbols;
> > > +  /*The last entry in the list of symbols*/
> > > +  simple_object_symbol *last_symbol;
> > > +};
> > > +
> > > +/*A symbol in object file being created*/
> > > +struct simple_object_symbol_struct
> > > +{
> > > +  /*Next in the list of symbols attached to an
> > > +  simple_object_write*/
> > > +  simple_object_symbol *next;
> > > +  /*The name of this symbol. */
> > > +  char *name;
> > > +  /* Symbol value */
> > > +  unsigned int align;
> > > +  /* Symbol size */
> > > +  size_t size;
> > >  };
> > >
> > >  /* A section in an object file being created.  */
> > > diff --git a/libiberty/simple-object-elf.c
> > b/libiberty/simple-object-elf.c
> > > index eee07039984..cbba88186bd 100644
> > > --- a/libiberty/simple-object-elf.c
> > > +++ b/libiberty/simple-object-elf.c
> > > @@ -787,9 +787,9 @@ simple_object_elf_write_ehdr (simple_object_write
> > > *sobj, int descriptor,
> > >  ++shnum;
> > >if (shnum > 0)
> > >  {
> > > -  /* Add a section header for the dummy section and one for
> > > - .shstrtab.  */
> > > -  shnum += 2;
> > > +  /* Add a section header for the dummy section,
> > > + .shstrtab, .symtab and .strtab.  */
> > > +  shnum += 4;
> > >  }
> > >
> > >ehdr_size = (cl == ELFCLASS32
> > > @@ -882,6 +882,51 @@ simple_object_elf_write_shdr (simple_object_write
> > > *sobj, int descriptor,
> > > errmsg, err);
> > >  }
> > >
> > > +/* Write out an ELF Symbol*/
> > > +
> > > +static int
> > > +simple_object_elf_write_symbol(simple_object_write *sobj, int
> > descriptor,
> > > +off_t offset, unsigned int st_name, unsigned int st_value,
> > > size_t st_size,
> > > +unsigned char st_info, unsigned char st_other, unsigned int
> > > st_shndx,
> > > +const char **errmsg, int *err)
> > > +{
> > > +  struct simple_object_elf_attributes *attrs =
> > > +(struct simple_object_elf_attributes *) sobj->data;
> > > +  const struct elf_type_functions* fns;
> > > +  unsigned char cl;
> > > +  size_t sym_size;
> > > +  unsigned char buf[sizeof (Elf64_External_Shdr)];
> > > +
> > > +  fns = attrs->type_functions;
> > > +  cl = attrs->ei_class;
> > > +
> > > +  sym_size = (cl == ELFCLASS32
> > > +   ? sizeof (Elf32_External_Shdr)
> > > +   : sizeof (Elf64_External_Shdr));
> > > +  memset (buf, 0, sizeof (Elf64_External_Shdr));
> > > +
> > > +  if(cl==ELFCLASS32)
> > > +  {
> > > +ELF_SET_FIELD(fns, cl, Sym, buf, st_name, Elf_Word, st_name);
> > > +ELF_SET_FIELD(fns, cl, Sym, buf, st_value, Elf_Addr, st_value);
> > > +ELF_SET_FIELD(fns, cl, Sym, buf, st_size, Elf_Addr, st_size);
> > > +buf[4]=st_info;
> > > +buf[5]=st_other;
> > > +ELF_SET_FIELD(fns, cl, Sym, buf, st_shndx, Elf_Half, st_shndx);
> > > +  }
> > > +  else
> > > +  {
> > > +

Re: libgcov, fork, and mingw (and other targets without the full POSIX set)

2023-12-01 Thread Jan Hubicka via Gcc
> On Dez 01 2023, Richard Biener via Gcc wrote:
> 
> > Hmm, so why's it then referenced and not "GCed"?
> 
> This has nothing to do with garbage collection.  It's just the way
> libgcc avoids having too many source files.  It would be exactly the
> same if every function were in its own file.

THe ifdef machinery makes every function to go insto its own .o file
which are then archived.  So if user code never calls to fork, the .o
file with fork wrapper should not be picked by linker and we should not
have link error.

If user code calls fork, then the .o file with wrapper should be picked
and we will get linker error on missing fork.  So I think it ought to
work as it is now.  Does mingw linker behave somehow differently with
archives?  Or is there problem with a libgcov being DLL or something?

Honza
> 
> -- 
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Re: How stable is the CFG and basic block IDs?

2024-04-30 Thread Jan Hubicka via Gcc
> > The problem is testing. If gcc would re-number the basic blocks then
> > tests comparing hard-coded test paths would break, even though the path
> > coverage itself would be just fine (and presumably the change to the
> > basic block indices), which would add an unreasonable maintenance
> > burden. If anything, it feels very fragile to write tests against the
> > basic block indices.
> 
> Problematic is usually when early canonicalization changes the
> direction of a branch which affects the block IDs of the true/false
> destinations (and their downstream blocks).

I think we can renumber Bbs sequentially before profile generation (or
maybe we do already).  But indeed the CfG may change with time.
> 
> > On the other hand, if it can be expected that the same code should
> > always yield the same CFG, the same BBs, and the same BB indices then I
> > would happily test against them. I suppose this makes the basic blocks
> > basically a public interface, which granted feels odd.
> >
> > If you have any good idea on how to test paths in a robust way please
> > let me know.
> 
> Is there enough info to print a path like the following?
> 
> path not covered: t.c:2(true) t.c:4(false) t.c:11(true) ...
> 
> instead of printing (condition destination?) block IDs?

This was my first idea too.  Thre can be multiple BBs per line but you
can print both BB ID and source file nameand ignore the BB ID for
testsuite pruposes.

Honza
> 
> Richard.
> 
> >
> > Thanks,
> > Jørgen


GNU Tools Cauldron 2024

2024-05-09 Thread Jan Hubicka via Gcc
Hello,
we are pleased to invite you all to the next GNU Tools Cauldron,
taking place in Prague, Czech Republic, on September 14-16, 2024.

As for the previous instances, we have setup a wiki page for
details:

  https://gcc.gnu.org/wiki/cauldron2024


Like last year, we are having to charge for attendance.  We are still
working out what we will need to charge.

Attendance will remain free for community volunteers and others who do
not have a commercial backer and we will be providing a small number of
travel bursaries for students to attend.

For all details of how to register, and how to submit a proposal for a  
track session, please see the wiki page.

The Cauldron is organized by a group of volunteers. We are keen to add
some more people so others can stand down. If you'd like to be part of
that organizing committee, please email the same address.

This announcement is being sent to the main mailing list of the
following groups: GCC, GDB, binutils, CGEN, DejaGnu, newlib and glibc.

Please feel free to share with other groups as appropriate.

Honza (on behalf of the GNU Tools Cauldron organizing committee).


Revert accidental change in ipa-modref-tree.h

2021-10-11 Thread Jan Hubicka via Gcc-patches
Hi,
I managed to commit an unrelatd change that was sitting my tree that
breaks bootstrap.  I have reverted it now and checked bootstrap gets
past the failing point (still waiting for full bootstrap to
finish at x86_64-linux).

Honza

gcc/ChangeLog:

* ipa-modref-tree.h (struct modref_access_node): Revert
accidental change.
(struct modref_ref_node): Likewise.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 52f225b1aae..9795e2b8405 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -148,8 +148,7 @@ struct GTY(()) modref_access_node
   poly_int64 offset1, poly_int64 size1, poly_int64 max_size1,
   bool record_adjustments)
 {
-  if (known_eq (parm_offset, parm_offset1)
- && known_eq (offset, offset1)
+  if (known_eq (offset, offset1)
  && known_eq (size, size1)
  && known_eq (max_size, max_size1))
return;
@@ -578,10 +577,6 @@ struct GTY((user)) modref_ref_node
  }
(*accesses)[best1].forced_merge (best2 < 0 ? a : (*accesses)[best2],
 record_adjustments);
-   /* CHeck that merging indeed merged ranges.  */
-   gcc_checking_assert ((*accesses)[best1].contains (best2 < 0 ? a : 
(*accesses)[best2]));
-   /*if (best2 >= 0)
- accesses->unordered_remove (best2);*/
if (!(*accesses)[best1].useful_p ())
  {
collapse ();


Re: [Patch] (was: Re: [r12-4457 Regression] FAIL: gfortran.dg/deferred_type_param_6.f90 -Os execution test on Linux/x86_64)

2021-10-16 Thread Jan Hubicka via Gcc-patches
> 
> Fortran has for a long time 'character(len=5), allocatable" or
> "character(len=*)". In the first case, the "5" can be ignored as both
> caller and callee know the length. In the second case, the length is
> determined by the argument, but it cannot be changed.
> 
> Since a not-that-short while, 'len=:' together with allocatable/pointer
> is supported.
> 
> In the latter case, the value can be change when the array
> association/allocation is changed.
> 
> I attached a patch, which was not tested. I am not quite sure whether
> the pointer address can actually escape or not - I think cannot but I
> played safe.
> 
> Tobias
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
> München, HRB 106955

> Fortran: Fix fn spec for character-returning functions
> 
> gcc/fortran/ChangeLog
>   * trans-types.c (create_fn_spec): For character-returning functions,
>   set the hidden string-length argument to 'R' only when the "len=:",
>   i.e. deferred length which goes alongside with allocatable/pointer.
> 
> diff --git a/gcc/fortran/trans-types.c b/gcc/fortran/trans-types.c
> index 220976babb8..637d2c71d01 100644
> --- a/gcc/fortran/trans-types.c
> +++ b/gcc/fortran/trans-types.c
> @@ -3008,7 +3008,14 @@ create_fn_spec (gfc_symbol *sym, tree fntype)
>   }
>if (sym->ts.type == BT_CHARACTER)
>   {
> -   spec[spec_len++] = 'R';
> +   if (!sym->ts.u.cl->length
> +   && ((sym->attr.allocatable && sym->attr.target)
> +   || sym->attr.pointer))
> + spec[spec_len++] = '.';
> +   if (!sym->ts.u.cl->length && sym->attr.allocatable)
> + spec[spec_len++] = 'w';
> +   else
> + spec[spec_len++] = 'R';
> spec[spec_len++] = ' ';

Thanks a lot! I was just looking into that function and was quite
confused on what is going there. Are you going to commit the patch?
Also escaping is quite important bit of information so it would be
good to figure out if it really can escape rather than playing safe.

Honza
>   }
>  }



Re: [r12-4457 Regression] FAIL: gfortran.dg/deferred_type_param_6.f90 -Os execution test on Linux/x86_64

2021-10-16 Thread Jan Hubicka via Gcc-patches
Hi,
> 
> FAIL: gfortran.dg/deferred_type_param_6.f90   -O1  execution test
> FAIL: gfortran.dg/deferred_type_param_6.f90   -Os  execution test
Sorry for the breakage.  This time it seems like bug in Fortran FE
which was previously latent:

__attribute__((fn spec (". . R ")))
void subfunc (character(kind=1)[1:..__result] * & __result, integer(kind=8) * 
.__result)
{
  # PT = nonlocal 
  character(kind=1)[1:..__result] * & __result_3(D) = __result;
  # PT = nonlocal null 
  integer(kind=8) * .__result_5(D) = .__result;
  integer(kind=4) _1;

   [local count: 1073741824]:
  *__result_3(D) = 
  # USE = nonlocal escaped { D.4230 } (nonlocal, escaped)
  _1 = _gfortran_compare_string (5, , 5, &"FIVEC"[1]{lb: 1 sz: 1});
  if (_1 != 0)
goto ; [0.04%]
  else
goto ; [99.96%]

   [local count: 429496]:
  # USE = nonlocal escaped null 
  # CLB = nonlocal escaped null 
  _gfortran_stop_numeric (10, 0);

   [local count: 1073312329]:
  *.__result_5(D) = 5;
  return;
}

The fnspec ". . R " specifies that .__result is readonly however we
have:
  *.__result_5(D) = 5;

I am not sure I understand fortran FE well enough to figure out why
it is set so.  The function is declared as:

  function subfunc() result(res)
character(len=:), pointer :: res
res => fifec
if (len(res) /= 5) STOP 9
if (res /= "FIVEC") STOP 10
  end function subfunc

and we indeed optimize load of the result:
-  # USE = nonlocal escaped { D.4252 D.4254 } (nonlocal, escaped)
-  # CLB = nonlocal escaped { D.4254 } (escaped)
+  # USE = nonlocal escaped 
+  # CLB = { D.4254 }
   subfunc (, );
-  .s2_34 = slen.4;
-  # PT = nonlocal escaped null { D.4254 } (escaped)
-  s2_35 = pstr.5;
   pstr.5 ={v} {CLOBBER};

and I think tat is what breaks the testcase (I also verified that
ignoring the fnspec 'R' fixes it).

Honza
> 
> with GCC configured with
> 
> ../../gcc/configure 
> --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4457/usr
>  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="dg.exp=gfortran.dg/deferred_type_param_6.f90 
> --target_board='unix{-m64\ -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at skpgkp2 at gmail dot com)


Cleanup compute_points_to_sets

2021-10-19 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes two issues I noticed while proofreading the code.
First is that I have added conditional around setting of nonlocal and
escaped flags (since they may be set from solver) while keeping the
variable in assignment that is confusing.

Second is that we still do not set pt in the case function has no memory
side effects.  In this case the call use is not going to be used since
uses_global_memory is false only if either function is const or modref
determined that all loads are from memory pointed to by parameters.  In
both cases we will disambiguate earlier before asking PTA oracle, but it
is better to avoid stale PTA sets (which shows in -alias dumps etc.)

Most of builtins are not modifying global memory, one option would be to
stick another flag into the fnspecs strings for this property.

Bootstrapped/regtested x86_64-linux, OK?

Honza

gcc/ChangeLog:

* tree-ssa-structalias.c (compute_points_to_sets): Cleanup.

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 2e6513bb72a..35971a54e02 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -7550,8 +7550,8 @@ compute_points_to_sets (void)
 always escaped.  */
  if (uses_global_memory)
{
- pt->nonlocal = uses_global_memory;
- pt->escaped = uses_global_memory;
+ pt->nonlocal = 1;
+ pt->escaped = 1;
}
}
  else if (uses_global_memory)
@@ -7561,6 +7561,8 @@ compute_points_to_sets (void)
  *pt = cfun->gimple_df->escaped;
  pt->nonlocal = 1;
}
+ else
+   memset (pt, 0, sizeof (struct pt_solution));
}
 
  pt = gimple_call_clobber_set (stmt);
@@ -7582,8 +7584,8 @@ compute_points_to_sets (void)
 always escaped.  */
  if (writes_global_memory)
{
- pt->nonlocal = writes_global_memory;
- pt->escaped = writes_global_memory;
+ pt->nonlocal = 1;
+ pt->escaped = 1;
}
}
  else if (writes_global_memory)
@@ -7593,6 +7595,8 @@ compute_points_to_sets (void)
  *pt = cfun->gimple_df->escaped;
  pt->nonlocal = 1;
}
+ else
+   memset (pt, 0, sizeof (struct pt_solution));
}
}
 }


Fix wrong code in ldist-strlen-1.c

2021-10-16 Thread Jan Hubicka via Gcc-patches
Hi,
while updating compute_points_to_sets I missed that the code not only
sets the nonlocal/escaped flags but also initializes pt.  With my
previous change if uses_global_memory is false pt is not updated
correctly which may lead to wrong code.

This is fixed by the following patch I comitted to avoid strange
misoptimizations.

Bootstrapped/regtested x86_64-linux and also tested with LTO.

Honza

gcc/ChangeLog:

PR tree-optimization/102720
* tree-ssa-structalias.c (compute_points_to_sets): Fix producing
of call used and clobbered sets.

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 6f12a66ee0d..2e6513bb72a 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -7541,17 +7541,18 @@ compute_points_to_sets (void)
  determine_global_memory_access (stmt, NULL,
  _global_memory,
  _global_memory);
- if (!uses_global_memory)
-   ;
- else if ((vi = lookup_call_use_vi (stmt)) != NULL)
+ if ((vi = lookup_call_use_vi (stmt)) != NULL)
{
  *pt = find_what_var_points_to (cfun->decl, vi);
  /* Escaped (and thus nonlocal) variables are always
 implicitly used by calls.  */
  /* ???  ESCAPED can be empty even though NONLOCAL
 always escaped.  */
- pt->nonlocal = uses_global_memory;
- pt->escaped = uses_global_memory;
+ if (uses_global_memory)
+   {
+ pt->nonlocal = uses_global_memory;
+ pt->escaped = uses_global_memory;
+   }
}
  else if (uses_global_memory)
{
@@ -7572,17 +7573,18 @@ compute_points_to_sets (void)
  determine_global_memory_access (stmt, _global_memory,
  NULL, NULL);
 
- if (!writes_global_memory)
-   ;
- else if ((vi = lookup_call_clobber_vi (stmt)) != NULL)
+ if ((vi = lookup_call_clobber_vi (stmt)) != NULL)
{
  *pt = find_what_var_points_to (cfun->decl, vi);
  /* Escaped (and thus nonlocal) variables are always
 implicitly clobbered by calls.  */
  /* ???  ESCAPED can be empty even though NONLOCAL
 always escaped.  */
- pt->nonlocal = writes_global_memory;
- pt->escaped = writes_global_memory;
+ if (writes_global_memory)
+   {
+ pt->nonlocal = writes_global_memory;
+ pt->escaped = writes_global_memory;
+   }
}
  else if (writes_global_memory)
{


Re: [PATCH] ipa-param-manip: Be careful about a reallocating hash_map (PR 103449)

2021-11-29 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> PR 103449 revealed that when I was storing result of one hash_map
> lookup into another entry in the hash_map, I was still accessing the
> entry in the table, which meanwhile could get reallocated, making the
> accesses invalid-after-free.
> 
> Fixed with the following, which also simplifies the return statement
> which must have been true even now.
> 
> Bootstrapped and tested on x86_64-linux.  OK for master?
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2021-11-29  Martin Liska  
>   Martin Jambor  
> 
>   PR ipa/103449
>   * ipa-param-manipulation.c
>   (ipa_param_body_adjustments::prepare_debug_expressions): Be
>   careful about hash_map reallocating itself.  Simpify a return
>   which always returns true.
OK, thanks!
Honza


Re: [PATCH] alias: Optimise call_may_clobber_ref_p

2021-12-06 Thread Jan Hubicka via Gcc-patches
> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
>  wrote:
> >
> > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
> >  wrote:
> > >
> > > Richard Biener  writes:
> > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > > >  wrote:
> > > >>
> > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > >> first things the function tries is the get_modref_function_summary
> > > >> approach, but that also seems to be the most expensive check.

In general it should not be that expensive since the trees are generally
small (despite the 3 nested loops making them look dangerously big) and
most of time we resort to simple pointer/array lookups.

>From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
(which are quite useful when trying to understand typical behaviour of
the alias oracle)

In call_may_clobber_ref_p we get

2980 :  256141790 :   callee = gimple_call_fndecl (call);
2981 :: 
2982 :  256141790 :   if (callee != NULL_TREE && 
!ref->volatile_p)
2983 :: {
2984 :  240446150 :   struct cgraph_node *node = 
cgraph_node::get (callee);
this is direct pointer from node
2985 :  240446150 :   if (node)
2986 :: {
2987 :  240234208 :   modref_summary *summary = 
get_modref_function_summary (node);
This is fast summary so it is array lookup
2988 :  240234208 :   if (summary)
2989 :: {
2990 :   19732392 :   if (!modref_may_conflict 
(call, summary->stores, ref, tbaa_p)

And we get here in 8% of invocations

And in modref_may_conflict we have:


   21147836 : modref_may_conflict (const gcall *stmt,
2552 ::  modref_tree 
 *tt, ao_ref *ref, bool tbaa_p)
2553 :: {
2554 :   21147836 :   alias_set_type base_set, ref_set;
2555 :: 
2556 :   21147836 :   if (tt->every_base)
2557 :: return true;
2558 :: 
2559 :3407876 :   if (!dbg_cnt (ipa_mod_ref))
2560 :: return true;
2561 :: 
2562 :3407876 :   base_set = ao_ref_base_alias_set 
(ref);
2563 :: 
2564 :3407876 :   ref_set = ao_ref_alias_set (ref);
All direct lookups in most cases
2565 :: 
2566 :3407876 :   int num_tests = 0, max_tests = 
param_modref_max_tests;
2567 :   14479077 :   for (auto base_node : tt->bases)
2568 :: {
2569 :6171739 :   if (tbaa_p && 
flag_strict_aliasing)
At average there are 2 bases travelled by the loop.
2570 :: {
2571 :5253690 :   if (num_tests >= max_tests)
2572 :: return true;
2573 :5253690 :   alias_stats.modref_tests++;
2574 :5253690 :   if (!alias_sets_conflict_p 
(base_set, base_node->base))
alias set checks cheecks are also reasonably cheap and 50% of time avoids 
walking rest of the tree
2575 :2238398 : continue;
2576 :3015292 :   num_tests++;
2577 :: }
2578 :: 
2579 :3933341 :   if (base_node->every_ref)

We hit the loop only 0.15 times per invocation of the function

2580 :: return true;
2581 :: 
2582 :   14943624 :   for (auto ref_node : 
base_node->refs)
2583 :: {
2584 ::   /* Do not repeat same test as 
before.  */
2585 :4381325 :   if ((ref_set != base_set || 
base_node->base != ref_node->ref)

and usually visit only bit more htan one refs
2586 :2548127 :   && tbaa_p && 
flag_strict_aliasing)
2587 :: {
2588 :1840866 :   if (num_tests >= 
max_tests)
2589 :: return true;
2590 :1809594 :   
alias_stats.modref_tests++;
2591 :1809594 :   if 
(!alias_sets_conflict_p (ref_set, ref_node->ref))

Fix early exit in modref_merge_call_site_flags

2021-12-19 Thread Jan Hubicka via Gcc-patches
Hi,
when adding support for static chain and return slot flags I forgot to update
early exit condition in modref_merge_call_site_flags.  This yields to wrong
code as demonstrated by the Fortran testcase attached to PR (which I hope
someone will help me to turn into testuite one).

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

2021-12-19  Jan Hubicka  

PR ipa/103766
* ipa-modref.c (modref_merge_call_site_flags): Fix early exit condition

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index d3590f0b62b..9c411a6297a 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -5019,9 +5019,15 @@ modref_merge_call_site_flags (escape_summary *sum,
   bool changed = false;
   bool ignore_stores = ignore_stores_p (caller, callee_ecf_flags);
 
-  /* If we have no useful info to propagate.  */
-  if ((!cur_summary || !cur_summary->arg_flags.length ())
-  && (!cur_summary_lto || !cur_summary_lto->arg_flags.length ()))
+  /* Return early if we have no useful info to propagate.  */
+  if ((!cur_summary
+   || (!cur_summary->arg_flags.length ()
+  && !cur_summary->static_chain_flags
+  && !cur_summary->retslot_flags))
+  && (!cur_summary_lto
+ || (!cur_summary_lto->arg_flags.length ()
+ && !cur_summary_lto->static_chain_flags
+ && !cur_summary_lto->retslot_flags)))
 return false;
 
   FOR_EACH_VEC_ELT (sum->esc, i, ee)


Fix handling of deferred SSA names in modref

2021-12-19 Thread Jan Hubicka via Gcc-patches
Hi,
in the testcase we fail to analyze SSA name because flag do_dataflow is set
and thus triggers early exist in analyze_ssa_name.  Fixed by disabling
early exits when handling deferred names.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

2021-12-20  Jan Hubicka  

PR ipa/103669
* ipa-modref.c (modref_eaf_analysis::analyze_ssa_name): Add deferred
parameter.
(modref_eaf_analysis::propagate): Use it.

gcc/testsuite/ChangeLog:

2021-12-20  Jan Hubicka  

PR ipa/103669
* g++.dg/torture/pr103669.C: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 9c411a6297a..733fc212fcc 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -2232,7 +2232,7 @@ class modref_eaf_analysis
 {
 public:
   /* Mark NAME as relevant for analysis.  */
-  void analyze_ssa_name (tree name);
+  void analyze_ssa_name (tree name, bool deferred = false);
   /* Dataflow slover.  */
   void propagate ();
   /* Return flags computed earlier for NAME.  */
@@ -2373,33 +2373,36 @@ callee_to_caller_flags (int call_flags, bool 
ignore_stores,
are processed later)  */
 
 void
-modref_eaf_analysis::analyze_ssa_name (tree name)
+modref_eaf_analysis::analyze_ssa_name (tree name, bool deferred)
 {
   imm_use_iterator ui;
   gimple *use_stmt;
   int index = SSA_NAME_VERSION (name);
 
-  /* See if value is already computed.  */
-  if (m_lattice[index].known || m_lattice[index].do_dataflow)
-   return;
-  if (m_lattice[index].open)
+  if (!deferred)
 {
-  if (dump_file)
-   fprintf (dump_file,
-"%*sCycle in SSA graph\n",
-m_depth * 4, "");
-  return;
-}
-  /* Recursion guard.  */
-  m_lattice[index].init ();
-  if (m_depth == param_modref_max_depth)
-{
-  if (dump_file)
-   fprintf (dump_file,
-"%*sMax recursion depth reached; postponing\n",
-m_depth * 4, "");
-  m_deferred_names.safe_push (name);
-  return;
+  /* See if value is already computed.  */
+  if (m_lattice[index].known || m_lattice[index].do_dataflow)
+   return;
+  if (m_lattice[index].open)
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"%*sCycle in SSA graph\n",
+m_depth * 4, "");
+ return;
+   }
+  /* Recursion guard.  */
+  m_lattice[index].init ();
+  if (m_depth == param_modref_max_depth)
+   {
+ if (dump_file)
+   fprintf (dump_file,
+"%*sMax recursion depth reached; postponing\n",
+m_depth * 4, "");
+ m_deferred_names.safe_push (name);
+ return;
+   }
 }
 
   if (dump_file)
@@ -2742,10 +2745,9 @@ modref_eaf_analysis::propagate ()
   while (m_deferred_names.length ())
 {
   tree name = m_deferred_names.pop ();
-  m_lattice[SSA_NAME_VERSION (name)].open = false;
   if (dump_file)
fprintf (dump_file, "Analyzing deferred SSA name\n");
-  analyze_ssa_name (name);
+  analyze_ssa_name (name, true);
 }
 
   if (!m_names_to_propagate.length ())
diff --git a/gcc/testsuite/g++.dg/torture/pr103669.C 
b/gcc/testsuite/g++.dg/torture/pr103669.C
new file mode 100644
index 000..a9509c354f1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr103669.C
@@ -0,0 +1,22 @@
+// { dg-do run }
+/* { dg-additional-options "--param=modref-max-depth=1" } */
+#include 
+
+typedef std::list PtrList;
+
+void
+SlList (PtrList *l)
+{
+  PtrList temp = *l;
+  PtrList::iterator iter;
+  for (iter = temp.begin (); iter != temp.end (); ++iter)
+__builtin_abort ();
+}
+
+int
+main (void)
+{
+  PtrList list;
+  SlList ();
+  return 0;
+}


Re: Patch ping

2022-01-03 Thread Jan Hubicka via Gcc-patches
> Hi!
> 
> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586553.html
>   symtab: Fold  ==  to 0 if folding_initializer [PR94716]
> 
> patch.  Thanks.
OK.
Note that with LTO partitioning it may happen that alias is defined in
one partition but used in another.  We do take care to bring the symtab
info with it so it should be safe.

Honza
> 
>   Jakub
> 


Re: [PATCH] Fix ICE in lsplit when built with -O3 -fno-guess-branch-probability [PR103793]

2021-12-28 Thread Jan Hubicka via Gcc-patches
> - /* Proportion second loop's bb counts except those dominated by false
> -branch to avoid drop 1s down.  */
> - basic_block bbi_copy = get_bb_copy (false_edge->dest);
> - bbs2 = get_loop_body (loop2);
> - for (j = 0; j < loop2->num_nodes; j++)
> -   if (bbs2[j] == loop2->latch
> -   || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
> - bbs2[j]->count = bbs2[j]->count.apply_probability (
> -   true_edge->probability.invert ());
> - free (bbs2);
> + if (true_edge->probability.initialized_p ())
> +   {
> + edge exit_to_latch1 = single_pred_edge (loop1->latch);
> + exit_to_latch1->probability
> +   = exit_to_latch1->probability.apply_scale (
> + true_edge->probability.to_reg_br_prob_base (),
> + REG_BR_PROB_BASE);
This should be
  exit_to_latch1->probability *= true_edge->probability;
whici will do the right thing to undefined probabilities and will not
cause unnecesary roundoff errors and precision info loss.

Can you please update that in the patch (and drop the initialized_p
check)?

Honza


Re: [PATCH 1/3] loop-invariant: Don't move cold bb instructions to preheader in RTL

2021-12-29 Thread Jan Hubicka via Gcc-patches
> > 
> > From: Xiong Hu Luo 
> > 
> > gcc/ChangeLog:
> > 
> > * loop-invariant.c (find_invariants_bb): Check profile count
> > before motion.
> > (find_invariants_body): Add argument.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.dg/loop-invariant-2.c: New.
OK,
thanks!
Honza


Re: [Bug tree-optimization/103168] Value numbering for PRE of pure functions can be improved

2021-11-22 Thread Jan Hubicka via Gcc-bugs
This is bit modified patch I am testing.  I added pre-computation of the
number of accesses, enabled the path for const functions (in case they
have memory operand), initialized alias sets and clarified the logic
around every_* and global_memory_accesses

PR tree-optimization/103168
(modref_summary::finalize): Initialize load_accesses.
* ipa-modref.h (struct modref_summary): Add load_accesses.
* tree-ssa-sccvn.c (visit_reference_op_call): Use modref
info to walk the virtual use->def chain to CSE pure
function calls.

* g++.dg/tree-ssa/pr103168.C: New testcase.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 4f9323165ea..595eb6e0d8f 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -725,6 +727,23 @@ modref_summary::finalize (tree fun)
break;
}
 }
+  if (loads->every_base)
+load_accesses = 1;
+  else
+{
+  load_accesses = 0;
+  for (auto base_node : loads->bases)
+   {
+ if (base_node->every_ref)
+   load_accesses++;
+ else
+   for (auto ref_node : base_node->refs)
+ if (ref_node->every_access)
+   load_accesses++;
+ else
+   load_accesses += ref_node->accesses->length ();
+   }
+}
 }
 
 /* Get function summary for FUNC if it exists, return NULL otherwise.  */
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index f868eb6de07..a7937d74945 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -53,6 +53,8 @@ struct GTY(()) modref_summary
 
   /* Flags coputed by finalize method.  */
 
+  /* Total number of accesses in loads tree.  */
+  unsigned int load_accesses;
   /* global_memory_read is not set for functions calling functions
  with !binds_to_current_def which, after interposition, may read global
  memory but do nothing useful with it (except for crashing if some
diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr103168.C 
b/gcc/testsuite/g++.dg/tree-ssa/pr103168.C
new file mode 100644
index 000..82924a3e3ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr103168.C
@@ -0,0 +1,24 @@
+// { dg-do compile }
+// { dg-options "-O2 -fdump-tree-fre1-details" }
+
+struct a
+{
+  int a;
+  static __attribute__ ((noinline))
+  int ret (int v) {return v;}
+
+  __attribute__ ((noinline))
+  int inca () {return a++;}
+};
+
+int
+test()
+{
+  struct a av;
+  av.a=1;
+  int val = av.ret (0) + av.inca();
+  av.a=2;
+  return val + av.ret(0) + av.inca();
+}
+
+/* { dg-final { scan-tree-dump-times "Replaced a::ret" 1 "fre1" } } */
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index 149674e6a16..719f5184654 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -71,6 +71,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-loop-niter.h"
 #include "builtins.h"
 #include "fold-const-call.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 #include "tree-ssa-sccvn.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
@@ -5084,12 +5086,118 @@ visit_reference_op_call (tree lhs, gcall *stmt)
   struct vn_reference_s vr1;
   vn_reference_t vnresult = NULL;
   tree vdef = gimple_vdef (stmt);
+  modref_summary *summary;
 
   /* Non-ssa lhs is handled in copy_reference_ops_from_call.  */
   if (lhs && TREE_CODE (lhs) != SSA_NAME)
 lhs = NULL_TREE;
 
   vn_reference_lookup_call (stmt, , );
+
+  /* If the lookup did not succeed for pure functions try to use
+ modref info to find a candidate to CSE to.  */
+  const int accesses_limit = 8;
+  if (!vnresult
+  && !vdef
+  && lhs
+  && gimple_vuse (stmt)
+  && (((summary = get_modref_function_summary (stmt, NULL))
+  && !summary->global_memory_read
+  && summary->load_accesses < accesses_limit)
+ || gimple_call_flags (stmt) & ECF_CONST))
+{
+  /* First search if we can do someting useful and build a
+vector of all loads we have to check.  */
+  bool unknown_memory_access = false;
+  auto_vec accesses;
+
+  if (summary)
+   {
+ for (auto base_node : summary->loads->bases)
+   if (unknown_memory_access)
+ break;
+   else for (auto ref_node : base_node->refs)
+ if (unknown_memory_access)
+   break;
+ else for (auto access_node : ref_node->accesses)
+   {
+ accesses.quick_grow (accesses.length () + 1);
+ if (!access_node.get_ao_ref (stmt,  ()))
+   {
+ /* We could use get_call_arg (...) and initialize
+a ref based on the argument and unknown offset in
+some cases, but we have to get a ao_ref to
+disambiguate against other stmts.  */
+ unknown_memory_access = true;
+ break;
+   }
+ else
+   {
+ 

Re: [Bug driver/100937] configure: Add --enable-default-semantic-interposition

2021-11-22 Thread Jan Hubicka via Gcc-bugs
> (The -fno-semantic-interposition thing is probably the biggest performance gap
> between gcc -fpic and clang -fpic.)
Yep, it is often confusing to users (who do not understand what ELF
interposition is) that clang and gcc disagree on default flags here.
Recently -Ofast was extended to imply -fno-semantic-interposition that
will hopefully make more people notice this.

While doing that I have added per-symbol flag about interposition to the
symbol table, so we can also support 

__atttribute__ ((semantic_interposition))

and

__attribute__((no_semantic_interpoition))

if that would be useful for something.


Improve tracking of bases in modref

2021-11-21 Thread Jan Hubicka via Gcc-patches
Hi,
on exchange2 benchamrk we miss some useful propagation because modref gives
up very early on analyzing accesses through pointers.  For example in
int test (int *a)
{
  int i;
  for (i=0; a[i];i++);
  return i+a[i];
}

We are not able to determine that a[i] accesses are relative to a.
This is because get_access requires the SSA name that is in MEM_REF to be
PARM_DECL while on other places we use ipa-prop helper to work out the proper
base pointers.

This patch commonizes the code in get_access and parm_map_for_arg so both
use the check properly and extends it to also figure out that newly allocated
memory is not a side effect to caller.

It improves disambiguation rates:

Alias oracle query stats:
  refs_may_alias_p: 77359588 disambiguations, 102170294 queries
  ref_maybe_used_by_call_p: 645390 disambiguations, 78392252 queries
  call_may_clobber_ref_p: 386653 disambiguations, 389576 queries
  stmt_kills_ref_p: 106470 kills, 5685744 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 8923 queries
  nonoverlapping_refs_since_match_p: 30581 disambiguations, 65481 must 
overlaps, 97009 queries
  aliasing_component_refs_p: 56854 disambiguations, 15459249 queries
  TBAA oracle: 28236957 disambiguations 104812620 queries
   15360807 are in alias set 0
   8863925 queries asked about the same object
   99 queries asked about the same alias set
   0 access volatile
   50367859 are dependent in the DAG
   1982973 are aritificially in conflict with void *

Modref stats:
  modref kill: 71 kills, 8151 queries
  modref use: 25273 disambiguations, 704264 queries
  modref clobber: 1676006 disambiguations, 21805867 queries
  5264985 tbaa queries (0.241448 per modref query)
  762265 base compares (0.034957 per modref query)

PTA query stats:
  pt_solution_includes: 13460623 disambiguations, 40881373 queries
  pt_solutions_intersect: 1668037 disambiguations, 13958255 queries

to:

Alias oracle query stats:
  refs_may_alias_p: 77575173 disambiguations, 102390852 queries
  ref_maybe_used_by_call_p: 645932 disambiguations, 78607413 queries
  call_may_clobber_ref_p: 386813 disambiguations, 389693 queries
  stmt_kills_ref_p: 106551 kills, 5688432 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 8936 queries
  nonoverlapping_refs_since_match_p: 30583 disambiguations, 65514 must 
overlaps, 97044 queries
  aliasing_component_refs_p: 56847 disambiguations, 15459371 queries
  TBAA oracle: 28238952 disambiguations 104938558 queries
   15435200 are in alias set 0
   8876784 queries asked about the same object
   89 queries asked about the same alias set
   0 access volatile
   50400613 are dependent in the DAG
   1986920 are aritificially in conflict with void *

Modref stats:
  modref kill: 71 kills, 8130 queries
  modref use: 30684 disambiguations, 704287 queries
  modref clobber: 1694295 disambiguations, 21697882 queries
  5233712 tbaa queries (0.241208 per modref query)
  902240 base compares (0.041582 per modref query)

PTA query stats:
  pt_solution_includes: 13495059 disambiguations, 40917961 queries
  pt_solutions_intersect: 1667032 disambiguations, 13951159 queries

So 20% more modref use disambiguations which accounts to 0.3% overal
disambiguation and alo improves a bit situation with exchange2 benchmark,
while the real problem is still present (as dicussed in the pr)

gcc/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103227
* ipa-modref.c (parm_map_for_arg): Rename to ...
(parm_map_for_ptr): .. this one; handle static chain and calls to
malloc functions.
(modref_access_analysis::get_access): Use parm_map_for_ptr.
(modref_access_analysis::process_fnspec): Update.
(modref_access_analysis::analyze_load): Update.
(modref_access_analysis::analyze_store): Update.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103227
* gcc.dg/tree-ssa/modref-15.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index a04e5855a9a..4f9323165ea 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -812,14 +812,15 @@ ignore_stores_p (tree caller, int flags)
   return false;
 }
 
-/* Determine parm_map for argument OP.  */
+/* Determine parm_map for PTR which is supposed to be a pointer.  */
 
 modref_parm_map
-parm_map_for_arg (tree op)
+parm_map_for_ptr (tree op)
 {
   bool offset_known;
   poly_int64 offset;
   struct modref_parm_map parm_map;
+  gcall *call;
 
   parm_map.parm_offset_known = false;
   parm_map.parm_offset = 0;
@@ -830,22 +831,26 @@ parm_map_for_arg (tree op)
   && TREE_CODE (SSA_NAME_VAR (op)) == PARM_DECL)
 {
   int index = 0;
-  for (tree t = DECL_ARGUMENTS (current_function_decl);
-  t != SSA_NAME_VAR (op); t = DECL_CHAIN (t))
-   {
- if (!t)
-   {
- index = MODREF_UNKNOWN_PARM;
- break;
-  

Fix failure in merge_block.c testcase

2021-11-21 Thread Jan Hubicka via Gcc-patches
Hi,
this testcase needs -fno-ipa-modref becuase otherwise it hits the issue
that complete loop unrolling leaves somewhat mismatched profile.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103264
* gcc.dg/tree-prof/merge_block.c: Add -fno-ipa-modref

diff --git a/gcc/testsuite/gcc.dg/tree-prof/merge_block.c 
b/gcc/testsuite/gcc.dg/tree-prof/merge_block.c
index 5da5ddff6a0..e8a8873f152 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/merge_block.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/merge_block.c
@@ -1,5 +1,5 @@
 
-/* { dg-options "-O2 -fno-ipa-pure-const -fdump-tree-optimized-blocks-details 
-fno-early-inlining" } */
+/* { dg-options "-O2 -fno-ipa-pure-const -fdump-tree-optimized-blocks-details 
-fno-early-inlining -fno-ipa-modref" } */
 int a[8];
 int t()
 {


Improve byte-wise DSE (modref-dse-[45].c failures)

2021-11-22 Thread Jan Hubicka via Gcc-patches
Hi,
testcase modref-dse-4.c and modref-dse-5.c fails on some targets because they
depend on store merging.  What really happens is that without store merging
we produce for kill_me combined write that is ao_ref with offset=0, size=32
and max_size=96.  We have size != max_size becaue we do ont track the info that
all 3 writes must happen in a group and conider case only some of them are done.

This disables byte-wise DSE which checks that size == max_size.  This is
completely unnecesary for store being proved to be dead or load being checked
to not read live bytes.  It is only necessary for kill store that is used to
prove that given store is dead.

While looking into this I also noticed that we check that everything is byte
aligned.  This is also unnecessary and with access merging in modref may more
commonly fire on accesses that we could otherwise handle.

This patch fixes both also also changes interface to normalize_ref that I found
confusing since it modifies the ref. Instead of that we have get_byte_range
that is computing range in bytes (since that is what we need to maintain the
bitmap) and has additional parameter specifying if the store in question should
be turned into sub-range or super-range depending whether we compute range
for kill or load.

Bootstrapped/regtested x86_64-linux OK?
gcc/ChangeLog:

2021-11-23  Jan Hubicka  

* tree-ssa-dse.c (valid_ao_ref_for_dse): Rename to ...
(valid_ao_ref_kill_for_dse): ... this; do not check that boundaries
are divisible by BITS_PER_UNIT.
(get_byte_aligned_range_containing_ref): New function.
(get_byte_aligned_range_contained_in_ref): New function.
(normalize_ref): Rename to ...
(get_byte_range): ... this one; handle accesses not aligned to byte
boundary; return range in bytes rater than updating ao_ref.
(clear_live_bytes_for_ref): Take write ref by reference; simplify using
get_byte_access.
(setup_live_bytes_from_ref): Likewise.
(clear_bytes_written_by): Update.
(live_bytes_read): Update.
(dse_classify_store): Simplify tech before live_bytes_read checks.

gcc/testsuite/ChangeLog:

2021-11-23  Jan Hubicka  

* gcc.dg/tree-ssa/modref-dse-4.c: Update template.
* gcc.dg/tree-ssa/modref-dse-5.c: Update template.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c 
b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c
index 81aa7dc587c..19e91b00f15 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dse2-details"  } */
+/* { dg-options "-O2 -fdump-tree-dse1-details"  } */
 struct a {int a,b,c;};
 __attribute__ ((noinline))
 void
@@ -23,4 +23,4 @@ set (struct a *a)
   my_pleasure (a);
   a->b=1;
 }
-/* { dg-final { scan-tree-dump "Deleted dead store: kill_me" "dse2" } } */
+/* { dg-final { scan-tree-dump "Deleted dead store: kill_me" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
index ad35b70136f..dc2c2892615 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-dse2-details"  } */
+/* { dg-options "-O2 -fdump-tree-dse1-details"  } */
 struct a {int a,b,c;};
 __attribute__ ((noinline))
 void
@@ -36,8 +36,7 @@ set (struct a *a)
 {
   wrap (0, a);
   int ret = wrap2 (0, a);
-  //int ret = my_pleasure (a);
   a->b=1;
   return ret;
 }
-/* { dg-final { scan-tree-dump "Deleted dead store: wrap" "dse2" } } */
+/* { dg-final { scan-tree-dump "Deleted dead store: wrap" "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 9531d892f76..8717d654e5a 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -156,57 +156,137 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
 }
 
 /* Given REF from the alias oracle, return TRUE if it is a valid
-   memory reference for dead store elimination, false otherwise.
+   kill memory reference for dead store elimination, false otherwise.
 
In particular, the reference must have a known base, known maximum
size, start at a byte offset and have a size that is one or more
bytes.  */
 
 static bool
-valid_ao_ref_for_dse (ao_ref *ref)
+valid_ao_ref_kill_for_dse (ao_ref *ref)
 {
   return (ao_ref_base (ref)
  && known_size_p (ref->max_size)
  && maybe_ne (ref->size, 0)
  && known_eq (ref->max_size, ref->size)
- && known_ge (ref->offset, 0)
- && multiple_p (ref->offset, BITS_PER_UNIT)
- && multiple_p (ref->size, BITS_PER_UNIT));
+ && known_ge (ref->offset, 0));
+}
+
+/* Given REF from the alias oracle, return TRUE if it is a valid
+   load or store memory reference for dead store elimination, false otherwise.
+
+   Unlike for valid_ao_ref_kill_for_dse we can accept writes where max_size

Re: [PATCH] Fix incorrect loop exit edge probability [PR103270]

2021-11-23 Thread Jan Hubicka via Gcc-patches
> On Tue, Nov 23, 2021 at 6:52 AM Xionghu Luo  wrote:
> >
> > r12-4526 cancelled jump thread path rotates loop. It exposes a issue in
> > profile-estimate when predict_extra_loop_exits, outer loop's exit edge
> > is marked as inner loop's extra loop exit and set with incorrect
> > prediction, then a hot inner loop will become cold loop finally through
> > optimizations, this patch ignores the EDGE_DFS_BACK edge when searching
> > extra exit edges to avoid unexpected predict_edge.
> 
> Not sure how outer vs. inner loop exit correlates with EDGE_DFS_BACK,
> I have expected a check based on which loop is exited by the edge instead?
> A backedge should never be an exit, no?
> 
> Note that the profile pass does not yet mark backedges so EDGE_DFS_BACK
> settings are unreliable.

So we have two nested loops and an exit which goes from inner loop and
exists both loops.  While processing outer loop we set pretty high exit
probability that is not good for inner loop?

I guess we could just check if exit edge source basic block has same
loop depth as the loop we ar eprocesing?

Honza
> 
> Richard.
> 
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/103270
> > * predict.c (predict_extra_loop_exits): Ignore EDGE_DFS_BACK edge.
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/103270
> > * predict.c (predict_extra_loop_exits): New.
> > ---
> >  gcc/predict.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/gcc/predict.c b/gcc/predict.c
> > index 68b11135680..1ae8ccff72c 100644
> > --- a/gcc/predict.c
> > +++ b/gcc/predict.c
> > @@ -1910,6 +1910,10 @@ predict_extra_loop_exits (edge exit_edge)
> > continue;
> >if ((check_value_one ^ integer_onep (val)) == 1)
> > continue;
> > +#if 0
> > +  if (e->flags & EDGE_DFS_BACK)
> > +   continue;
> > +#endif
> >if (EDGE_COUNT (e->src->succs) != 1)
> > {
> >   predict_paths_leading_to_edge (e, PRED_LOOP_EXTRA_EXIT, 
> > NOT_TAKEN);
> > --
> > 2.25.1
> >


Re: [Bug tree-optimization/103168] Value numbering for PRE of pure functions can be improved

2021-11-22 Thread Jan Hubicka via Gcc-bugs
The patch passed testing on x86_64-linux.


Small tweak to modref pure/const discoverys

2021-11-20 Thread Jan Hubicka via Gcc-patches
Hi,
while looking into the PR I also improved debug output in ipa-modref and
fixed ignore_nondeterminism predicate: looping pures and cont are still
deterministic.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103052
* ipa-modref.c (ignore_nondeterminism_p): Allow looping pure/cont.
(merge_call_side_effects): Improve debug output.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 57e2aa5d868..20810c74da5 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -923,8 +923,7 @@ record_access_p (tree expr)
 static bool
 ignore_nondeterminism_p (tree caller, int flags)
 {
-  if ((flags & (ECF_CONST | ECF_PURE))
-  && !(flags & ECF_LOOPING_CONST_OR_PURE))
+  if (flags & (ECF_CONST | ECF_PURE))
 return true;
   if ((flags & (ECF_NORETURN | ECF_NOTHROW)) == (ECF_NORETURN | ECF_NOTHROW)
   || (!opt_for_fn (caller, flag_exceptions) && (flags & ECF_NORETURN)))
@@ -1016,6 +1015,10 @@ merge_call_side_effects (modref_summary *cur_summary,
   && !(flags & ECF_LOOPING_CONST_OR_PURE))
 return changed;
 
+  if (dump_file)
+fprintf (dump_file, " - Merging side effects of %s\n",
+callee_node->dump_name ());
+
   if (!(flags & (ECF_CONST | ECF_NOVOPS | ECF_PURE))
   || (flags & ECF_LOOPING_CONST_OR_PURE))
 {
@@ -1061,8 +1064,7 @@ merge_call_side_effects (modref_summary *cur_summary,
 }
 
   if (dump_file)
-fprintf (dump_file, " - Merging side effects of %s with parm map:",
-callee_node->dump_name ());
+fprintf (dump_file, "   Parm map:");
 
   parm_map.safe_grow_cleared (gimple_call_num_args (stmt), true);
   for (unsigned i = 0; i < gimple_call_num_args (stmt); i++)


Fix looping flag discovery in ipa-pure-const

2021-11-20 Thread Jan Hubicka via Gcc-patches
Hi,
The testcase shows situation where there is non-trivial cycle in the callgraph
involving a noreturn call.  This cycle is important for const function discovery
but not important for pure.  IPA pure const uses same strongly connected
components for both propagations which makes it to get suboptimal result
(does not detect the pure flag). However local pure const gets the situation
right becaue it processes functions in right order.  This hits rarely
executed code in propagate_pure_const that merge results with previously
known state that has long standing bug in it that makes it to throw away
the looping flag.

Bootstrapped/regtested x86_64-linux. Comitted.

gcc/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103052
* ipa-pure-const.c (propagate_pure_const): Fix merging of loping flag.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103052
* gcc.c-torture/execute/pr103052.c: New test.

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index a332940b55d..fea8b08c4eb 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1782,9 +1782,9 @@ propagate_pure_const (void)
  if (w_l->state_previously_known != IPA_NEITHER
  && this_state > w_l->state_previously_known)
{
-  this_state = w_l->state_previously_known;
  if (this_state == IPA_NEITHER)
-   this_looping = w_l->looping_previously_known;
+   this_looping = w_l->looping_previously_known;
+ this_state = w_l->state_previously_known;
}
  if (!this_looping && self_recursive_p (w))
this_looping = true;
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr103052.c 
b/gcc/testsuite/gcc.c-torture/execute/pr103052.c
new file mode 100644
index 000..bef8674a43c
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr103052.c
@@ -0,0 +1,35 @@
+static void js_error(void);
+static int top;
+static void js_throw(void)
+{
+   __builtin_exit(0);
+}
+
+// LOCATION A -- if js_pop is here, the bug is present
+static void js_pop(void)
+{
+   if (++top > 100)
+   js_error();
+}
+
+static void jsC_error(const char *v)
+{
+   if (v[0] == 0)
+   js_error();
+   js_throw();
+}
+static void checkfutureword(const char *exp)
+{
+   if (!__builtin_strcmp(exp, "const"))
+   jsC_error("boom");
+}
+static void js_error(void) {
+   checkfutureword("foo");
+   checkfutureword("bar");
+   js_pop();
+}
+int main(void)
+{
+   checkfutureword("const");
+   __builtin_abort ();
+}


Reduce size of modref_access_tree

2021-11-23 Thread Jan Hubicka via Gcc-patches
Hi,

Modref tree template stores its own copy of param_moderf_max_bases, *_max_refs
and *_max_accesses values.  This was done before we had per-function limits and
even back then it was bit dubious, so this patch removes it.

Bootstrapped/regtested x86_64-linux, will commit it shortly.
Honza

gcc/ChangeLog:

* ipa-modref-tree.h (struct modref_tree): Remove max_bases, max_refs
and max_accesses.
(modref_tree::modref_tree): Remove parametr.
(modref_tree::insert_base): Add max_bases parameter.
(modref_tree::insert): Add max_bases, max_refs, max_accesses
parameters.
(modref_tree::insert): New member function.
(modref_tree::merge): Add max_bases, max_refs, max_accesses
parameters.
(modref_tree::insert): New member function.
* ipa-modref-tree.c (test_insert_search_collapse): Update.
(test_merge): Update.
* ipa-modref.c (dump_records): Don't dump max_refs and max_bases.
(dump_lto_records): Likewise.
(modref_summary::finalize): Fix whitespace.
(get_modref_function_summary): Likewise.
(modref_access_analysis::record_access): Update.
(modref_access_analysis::record_access_lto): Update.
(modref_access_analysis::process_fnspec): Update.
(analyze_function): Update.
(modref_summaries::duplicate): Update.
(modref_summaries_lto::duplicate): Update.
(write_modref_records): Update.
(read_modref_records): Update.
(read_section): Update.
(propagate_unknown_call): Update.
(modref_propagate_in_scc): Update.
(ipa_merge_modref_summary_after_inlining): Update.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index e23d88d7fc0..0671fa76199 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -874,11 +874,11 @@ test_insert_search_collapse ()
   modref_ref_node *ref_node;
   modref_access_node a = unspecified_modref_access_node;
 
-  modref_tree *t = new modref_tree(1, 2, 2);
+  modref_tree *t = new modref_tree();
   ASSERT_FALSE (t->every_base);
 
   /* Insert into an empty tree.  */
-  t->insert (1, 2, a, false);
+  t->insert (1, 2, 2, 1, 2, a, false);
   ASSERT_NE (t->bases, NULL);
   ASSERT_EQ (t->bases->length (), 1);
   ASSERT_FALSE (t->every_base);
@@ -896,7 +896,7 @@ test_insert_search_collapse ()
   ASSERT_EQ (ref_node->ref, 2);
 
   /* Insert when base exists but ref does not.  */
-  t->insert (1, 3, a, false);
+  t->insert (1, 2, 2, 1, 3, a, false);
   ASSERT_NE (t->bases, NULL);
   ASSERT_EQ (t->bases->length (), 1);
   ASSERT_EQ (t->search (1), base_node);
@@ -909,7 +909,7 @@ test_insert_search_collapse ()
 
   /* Insert when base and ref exist, but access is not dominated by nor
  dominates other accesses.  */
-  t->insert (1, 2, a, false);
+  t->insert (1, 2, 2, 1, 2, a, false);
   ASSERT_EQ (t->bases->length (), 1);
   ASSERT_EQ (t->search (1), base_node);
 
@@ -917,12 +917,12 @@ test_insert_search_collapse ()
   ASSERT_NE (ref_node, NULL);
 
   /* Insert when base and ref exist and access is dominated.  */
-  t->insert (1, 2, a, false);
+  t->insert (1, 2, 2, 1, 2, a, false);
   ASSERT_EQ (t->search (1), base_node);
   ASSERT_EQ (base_node->search (2), ref_node);
 
   /* Insert ref to trigger ref list collapse for base 1.  */
-  t->insert (1, 4, a, false);
+  t->insert (1, 2, 2, 1, 4, a, false);
   ASSERT_EQ (t->search (1), base_node);
   ASSERT_EQ (base_node->refs, NULL);
   ASSERT_EQ (base_node->search (2), NULL);
@@ -930,7 +930,7 @@ test_insert_search_collapse ()
   ASSERT_TRUE (base_node->every_ref);
 
   /* Further inserts to collapsed ref list are ignored.  */
-  t->insert (1, 5, a, false);
+  t->insert (1, 2, 2, 1, 5, a, false);
   ASSERT_EQ (t->search (1), base_node);
   ASSERT_EQ (base_node->refs, NULL);
   ASSERT_EQ (base_node->search (2), NULL);
@@ -938,13 +938,13 @@ test_insert_search_collapse ()
   ASSERT_TRUE (base_node->every_ref);
 
   /* Insert base to trigger base list collapse.  */
-  t->insert (5, 0, a, false);
+  t->insert (1, 2, 2, 5, 0, a, false);
   ASSERT_TRUE (t->every_base);
   ASSERT_EQ (t->bases, NULL);
   ASSERT_EQ (t->search (1), NULL);
 
   /* Further inserts to collapsed base list are ignored.  */
-  t->insert (7, 8, a, false);
+  t->insert (1, 2, 2, 7, 8, a, false);
   ASSERT_TRUE (t->every_base);
   ASSERT_EQ (t->bases, NULL);
   ASSERT_EQ (t->search (1), NULL);
@@ -959,23 +959,23 @@ test_merge ()
   modref_base_node *base_node;
   modref_access_node a = unspecified_modref_access_node;
 
-  t1 = new modref_tree(3, 4, 1);
-  t1->insert (1, 1, a, false);
-  t1->insert (1, 2, a, false);
-  t1->insert (1, 3, a, false);
-  t1->insert (2, 1, a, false);
-  t1->insert (3, 1, a, false);
-
-  t2 = new modref_tree(10, 10, 10);
-  t2->insert (1, 2, a, false);
-  t2->insert (1, 3, a, false);
-  t2->insert (1, 4, a, false);
-  t2->insert (3, 2, a, false);
-  t2->insert (3, 3, a, false);
-  t2->insert (3, 4, a, false);
-  t2->insert (3, 5, a, false);

Re: [Bug tree-optimization/103300] New: wrong code at -O3 on x86_64-linux-gnu

2021-11-17 Thread Jan Hubicka via Gcc-bugs
Needs -O2  -floop-unroll-and-jam   --param early-inlining-insns=14
to fail, so I guess it may be issue with unrol-and-jam.


Re: [PATCH] Fix IPA modref ubsan.

2021-11-18 Thread Jan Hubicka via Gcc-patches
> > > I don't know what the guidance is on using vec in IPA passes
> > > but with respect to existing practice elsewhere, there are
> > > existing uses of vec and auto_vec with non-POD types and vec
> > > does work with them  (see the vec_default_construct and
> > > vec_copy_construct templates, for example, whose goal is
> > > to support nontrivial classes).
> > 
> > I see, since 2017 :).  The patch is OK then.
> > Nontrivial destructors also behave in a sane way these days?
> 
> Good question :)
> 
> At a minimum, element dtors should be automatically invoked by
> the auto_vec dtor (there is an auto-test in vec.c to verify that).
> 
> Beyond that, since (unlike auto_vec) a plain vec isn't a container
> its users are on their own when it comes to managing the memory of
> their elements (i.e., they need to explicitly destroy their elements).
> 
> Having said that, as with all retrofits, they could be incomplete.
> I see a few examples of where that seems to be the case here:
> 
> Calling truncate() on a vec with notrivial elements leaks, so
> clients needs to explicitly release those elements.  That
> should happen automatically.
> 
> Going through vec, I also see calls to memmove in functions
> like quick_insert, ordered_remove, and block_remove.  So calling
> those functions is not safe on a vec with nontrivial types.
> 
> Calling any of the sort functions also may not work correctly
> with nontrivial elements (gcc_sort() calls memcpy).  vec should
> either prevent that buy refusing to compile or use a safe
> (generic) template for that.
> 
> So while basic vec uses work with nontrivial types, there are
> plenty of bugs :(

OK, sounds bit dangerous but the use here is very simple.
I remember the non-POD rule mostly from the original David's
implementation of modref that did put non-pods to vector and he took
really long while to work out why it breaks.  And yep, it was before
2017 :)
Honza
> 
> Martin
> 
> > 
> > Honza
> > > 
> > > Martin
> > > 
> > > > The diagnostics should be from
> > > > a.parm_offset_known &= m.parm_offset_known;
> > > > Becasue both in the parm_map (which is variable m) and access_node
> > > > (which is variable a) the parm_offset_known has no meaning when
> > > > parm_index == MODREF_UNKNOWN_PARM.
> > > > 
> > > > If we want to avoid computing on these, perhaps this will work?
> > > > 
> > > > diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
> > > > index 0a097349ebd..97736d0d8a4 100644
> > > > --- a/gcc/ipa-modref-tree.h
> > > > +++ b/gcc/ipa-modref-tree.h
> > > > @@ -568,9 +568,13 @@ struct GTY((user)) modref_tree
> > > >   : (*parm_map) [a.parm_index];
> > > > if (m.parm_index == 
> > > > MODREF_LOCAL_MEMORY_PARM)
> > > >   continue;
> > > > -   a.parm_offset += m.parm_offset;
> > > > -   a.parm_offset_known &= m.parm_offset_known;
> > > > a.parm_index = m.parm_index;
> > > > +   if (a.parm_index != MODREF_UNKNOWN_PARM)
> > > > + {
> > > > +   a.parm_offset_known &= 
> > > > m.parm_offset_known;
> > > > +   if (a.parm_offset_known)
> > > > + a.parm_offset += m.parm_offset;
> > > > + }
> > > >   }
> > > >   }
> > > > changed |= insert (base_node->base, ref_node->ref, 
> > > > a,
> > > > >  /* Index of parameter we translate to.
> > > > > Values from special_params enum are permitted too.  */
> > > > >  int parm_index;
> > > > > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> > > > > index c94f0589d44..630d202d5cf 100644
> > > > > --- a/gcc/ipa-modref.c
> > > > > +++ b/gcc/ipa-modref.c
> > > > > @@ -5020,8 +5020,7 @@ ipa_merge_modref_summary_after_inlining 
> > > > > (cgraph_edge *edge)
> > > > >  auto_vec  parm_map;
> > > > >  modref_parm_map chain_map;
> > > > >  /* TODO: Once we get jump functions for static chains we 
> > > > > could
> > > > > -  compute this.  */
> > > > > -  chain_map.parm_index = MODREF_UNKNOWN_PARM;
> > > > > +  compute parm_index.  */
> > > > >  compute_parm_map (edge, _map);
> > > > > -- 
> > > > > 2.33.1
> > > > > 
> > > 
> 


Re: [PATCH] IPA: fix reproducibility in IPA MOD REF

2021-11-18 Thread Jan Hubicka via Gcc-patches
> > > 
> > > Isn't problem that the following code
> > > 
> > >past_flags.reserve_exact (summary->arg_flags.length ());
> > >past_flags.splice (summary->arg_flags);
> > >past_retslot_flags = summary->retslot_flags;
> > 
> > Aha, that makes sense. Sorry.  It used to be saved for dumping only
> > while we now use it to merge old summaries with new.  So it is indeed a
> > (quite stupid) bug.
> 
> Good :) Good. I thought I overlooked something.
Hehe, I overlooked a hunk while breaking the patches into more
independent changes.

I planed a cleanup of this code after pushing out the new features.
Pehraps a trivial parts of the cleanup can be done even in stage3.
Honza


Re: [PATCH] IPA: use cgraph_node instance

2021-11-18 Thread Jan Hubicka via Gcc-patches
> Hi.
> 
> This is a refactoring I noticed.
> 
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * ipa-modref.c (analyze_function): Use fnode instead of repeated
>   cgraph_node::get (current_function_decl).
OK, thanks!
Honza


Re: [PATCH] Fix IPA modref ubsan.

2021-11-18 Thread Jan Hubicka via Gcc-patches
> On 11/18/21 5:41 AM, Jan Hubicka via Gcc-patches wrote:
> > > modref_tree::merge(modref_tree*, 
> > > vec*, modref_parm_map*, bool)
> > > 
> > > is called with modref_parm_map chain_map;
> > > 
> > > The variable has uninitialized m.parm_offset_known and it is accessed
> > > here:
> > > 
> > > gcc/ipa-modref-tree.h:572 a.parm_offset_known &= m.parm_offset_known;
> > > 
> > > Ready to be installed after testing?
> > > Thanks,
> > > Martin
> > > 
> > >   PR ipa/103230
> > > 
> > > gcc/ChangeLog:
> > > 
> > >   * ipa-modref-tree.h (struct modref_parm_map): Add default
> > >   constructor.
> > >   * ipa-modref.c (ipa_merge_modref_summary_after_inlining): Use it.
> > > ---
> > >   gcc/ipa-modref-tree.h | 5 +
> > >   gcc/ipa-modref.c  | 3 +--
> > >   2 files changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
> > > index 0a097349ebd..6796e6ecc34 100644
> > > --- a/gcc/ipa-modref-tree.h
> > > +++ b/gcc/ipa-modref-tree.h
> > > @@ -287,6 +287,11 @@ struct GTY((user)) modref_base_node
> > >   struct modref_parm_map
> > >   {
> > > +  /* Default constructor.  */
> > > +  modref_parm_map ()
> > > +  : parm_index (MODREF_UNKNOWN_PARM), parm_offset_known (false), 
> > > parm_offset ()
> > > +  {}
> > > +
> > I think we are generally not supposed to put non-pods to vec<..>
> 
> I don't know what the guidance is on using vec in IPA passes
> but with respect to existing practice elsewhere, there are
> existing uses of vec and auto_vec with non-POD types and vec
> does work with them  (see the vec_default_construct and
> vec_copy_construct templates, for example, whose goal is
> to support nontrivial classes).

I see, since 2017 :).  The patch is OK then.
Nontrivial destructors also behave in a sane way these days?

Honza
> 
> Martin
> 
> > The diagnostics should be from
> > a.parm_offset_known &= m.parm_offset_known;
> > Becasue both in the parm_map (which is variable m) and access_node
> > (which is variable a) the parm_offset_known has no meaning when
> > parm_index == MODREF_UNKNOWN_PARM.
> > 
> > If we want to avoid computing on these, perhaps this will work?
> > 
> > diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
> > index 0a097349ebd..97736d0d8a4 100644
> > --- a/gcc/ipa-modref-tree.h
> > +++ b/gcc/ipa-modref-tree.h
> > @@ -568,9 +568,13 @@ struct GTY((user)) modref_tree
> >   : (*parm_map) [a.parm_index];
> > if (m.parm_index == MODREF_LOCAL_MEMORY_PARM)
> >   continue;
> > -   a.parm_offset += m.parm_offset;
> > -   a.parm_offset_known &= m.parm_offset_known;
> > a.parm_index = m.parm_index;
> > +   if (a.parm_index != MODREF_UNKNOWN_PARM)
> > + {
> > +   a.parm_offset_known &= m.parm_offset_known;
> > +   if (a.parm_offset_known)
> > + a.parm_offset += m.parm_offset;
> > + }
> >   }
> >   }
> > changed |= insert (base_node->base, ref_node->ref, a,
> > > /* Index of parameter we translate to.
> > >Values from special_params enum are permitted too.  */
> > > int parm_index;
> > > diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> > > index c94f0589d44..630d202d5cf 100644
> > > --- a/gcc/ipa-modref.c
> > > +++ b/gcc/ipa-modref.c
> > > @@ -5020,8 +5020,7 @@ ipa_merge_modref_summary_after_inlining 
> > > (cgraph_edge *edge)
> > > auto_vec  parm_map;
> > > modref_parm_map chain_map;
> > > /* TODO: Once we get jump functions for static chains we could
> > > -  compute this.  */
> > > -  chain_map.parm_index = MODREF_UNKNOWN_PARM;
> > > +  compute parm_index.  */
> > > compute_parm_map (edge, _map);
> > > -- 
> > > 2.33.1
> > > 
> 


Re: [PATCH] IPA: fix reproducibility in IPA MOD REF

2021-11-18 Thread Jan Hubicka via Gcc-patches
> On 11/18/21 19:22, Jan Hubicka wrote:
> > > Supported LTO compression algorithms: zlib zstd
> > > gcc version 12.0.0 2028 (experimental) (GCC)
> > > /usr/bin/ld: ./xxx.ltrans0.ltrans.o: warning: relocation against 
> > > `lm_read_ctl_dict_size_n_lmclass_used' in read-only section `.text'
> > > /usr/bin/ld: ./xxx.ltrans0.ltrans.o: relocation R_X86_64_PC32 against 
> > > symbol `__ckd_calloc___elem_size' can not be used when making a shared 
> > > object; recompile with -fPIC
> > > /usr/bin/ld: final link failed: bad value
> > > collect2: error: ld returned 1 exit status
> > > /usr/bin/ld: ./yyy.ltrans0.ltrans.o: warning: relocation against 
> > > `__ckd_calloc___n_elem' in read-only section `.text'
> > > /usr/bin/ld: ./yyy.ltrans0.ltrans.o: relocation R_X86_64_PC32 against 
> > > symbol `__ckd_calloc___elem_size' can not be used when making a shared 
> > > object; recompile with -fPIC
> > > /usr/bin/ld: final link failed: bad value
> > > collect2: error: ld returned 1 exit status
> > > diff: yyy.ltrans0.ltrans*optimized: No such file or directory
> > > 54,55d53
> > > < movslq  lm_read_ctl_dict_size_n_lmclass_used(%rip), %rax
> > > < movl$0, 0(%rbp,%rax,4)
> > > 
> > > I tracked that it differs in tree DSE dump.
> > > 
> > > May I install the patch?
> > 
> > No, I think it is bug of symbol_summary that get is causing a
> > difference.
> 
> Isn't problem that the following code
> 
>   past_flags.reserve_exact (summary->arg_flags.length ());
>   past_flags.splice (summary->arg_flags);
>   past_retslot_flags = summary->retslot_flags;

Aha, that makes sense. Sorry.  It used to be saved for dumping only
while we now use it to merge old summaries with new.  So it is indeed a
(quite stupid) bug.

The patch is OK. Thanks for finding it.
Honza


Re: [PATCH] Fix rs6000 predicates.md use of decl_replaceable_p

2021-11-18 Thread Jan Hubicka via Gcc-patches


> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -1086,7 +1086,9 @@ (define_predicate "current_file_function_operand"
> (match_test "(DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P (op))
> && (SYMBOL_REF_LOCAL_P (op)
> || (op == XEXP (DECL_RTL (current_function_decl), 0)
> -   && !decl_replaceable_p (current_function_decl)))
> +   && !decl_replaceable_p (current_function_decl,
> +   opt_for_fn
> (current_function_decl,
> +
> flag_semantic_interposition

Thanks, missed the use of the predicate here.
However it is not clear to me why one would care about semantic
interposition at this level.  It seems to me that one more cares whether
the symbol must be always resolved locally.  In this case
cgraph_node::get (current_function_decl)->binds_to_current_def_p 
(cgraph_node::get (current_function_decl))
would give right answer (and work for cases like functions in comdat groups)

Honza
> && !((DEFAULT_ABI == ABI_AIX
>   || DEFAULT_ABI == ABI_ELFv2)
>  && (SYMBOL_REF_EXTERNAL_P (op)


Re: [PATCH] IPA: fix reproducibility in IPA MOD REF

2021-11-18 Thread Jan Hubicka via Gcc-patches
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
> Ready to be installed?
> Thanks,
> Martin
> 
> gcc/ChangeLog:
> 
>   * ipa-modref.c (analyze_function): Do not execute the code
>   only if dump_file != NULL.
> ---
>  gcc/ipa-modref.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> index a3c7c6d6a1f..6cacf9c8ab1 100644
> --- a/gcc/ipa-modref.c
> +++ b/gcc/ipa-modref.c
> @@ -2868,15 +2868,15 @@ analyze_function (function *f, bool ipa)
>   optimization_summaries = modref_summaries::create_ggc (symtab);
>else /* Remove existing summary if we are re-running the pass.  */
>   {
> -   if (dump_file
> -   && (summary
> -   = optimization_summaries->get (fnode))
> -  != NULL
> +   summary = optimization_summaries->get (fnode);
> +   if (summary != NULL
How does this affect reproducibility?

Honza


Re: [PATCH] IPA: fix reproducibility in IPA MOD REF

2021-11-18 Thread Jan Hubicka via Gcc-patches
> Supported LTO compression algorithms: zlib zstd
> gcc version 12.0.0 2028 (experimental) (GCC)
> /usr/bin/ld: ./xxx.ltrans0.ltrans.o: warning: relocation against 
> `lm_read_ctl_dict_size_n_lmclass_used' in read-only section `.text'
> /usr/bin/ld: ./xxx.ltrans0.ltrans.o: relocation R_X86_64_PC32 against symbol 
> `__ckd_calloc___elem_size' can not be used when making a shared object; 
> recompile with -fPIC
> /usr/bin/ld: final link failed: bad value
> collect2: error: ld returned 1 exit status
> /usr/bin/ld: ./yyy.ltrans0.ltrans.o: warning: relocation against 
> `__ckd_calloc___n_elem' in read-only section `.text'
> /usr/bin/ld: ./yyy.ltrans0.ltrans.o: relocation R_X86_64_PC32 against symbol 
> `__ckd_calloc___elem_size' can not be used when making a shared object; 
> recompile with -fPIC
> /usr/bin/ld: final link failed: bad value
> collect2: error: ld returned 1 exit status
> diff: yyy.ltrans0.ltrans*optimized: No such file or directory
> 54,55d53
> < movslq  lm_read_ctl_dict_size_n_lmclass_used(%rip), %rax
> < movl$0, 0(%rbp,%rax,4)
> 
> I tracked that it differs in tree DSE dump.
> 
> May I install the patch?

No, I think it is bug of symbol_summary that get is causing a
difference.  It should be pure function. I think it is:
  /* Getter for summary callgraph node pointer.  */
  T* get (cgraph_node *node) ATTRIBUTE_PURE
{
   return exists (node) ? (*m_vector)[node->get_summary_id ()] : NULL;
}
It should not be using get_summary_id (which allocates it for no good
reaosn) and simply check that summary_id is non-negative.

Still I wonder how this can make code different - that looks like
another bug somewhere.

Honza


Re: [PATCH] Fix rs6000 predicates.md use of decl_replaceable_p

2021-11-18 Thread Jan Hubicka via Gcc-patches
> On Thu, Nov 18, 2021 at 2:07 PM Jan Hubicka  wrote:
> >
> > > --- a/gcc/config/rs6000/predicates.md
> > > +++ b/gcc/config/rs6000/predicates.md
> > > @@ -1086,7 +1086,9 @@ (define_predicate "current_file_function_operand"
> > > (match_test "(DEFAULT_ABI != ABI_AIX || SYMBOL_REF_FUNCTION_P 
> > > (op))
> > > && (SYMBOL_REF_LOCAL_P (op)
> > > || (op == XEXP (DECL_RTL (current_function_decl), 
> > > 0)
> > > -   && !decl_replaceable_p 
> > > (current_function_decl)))
> > > +   && !decl_replaceable_p (current_function_decl,
> > > +   opt_for_fn
> > > (current_function_decl,
> > > +
> > > flag_semantic_interposition
> >
> > Thanks, missed the use of the predicate here.
> > However it is not clear to me why one would care about semantic
> > interposition at this level.  It seems to me that one more cares whether
> > the symbol must be always resolved locally.  In this case
> > cgraph_node::get (current_function_decl)->binds_to_current_def_p 
> > (cgraph_node::get (current_function_decl))
> > would give right answer (and work for cases like functions in comdat groups)
> 
> Hi, Honza
> 
> I was trying to fix bootstrap as quickly as possible and used the best
> example that I could find.  It definitely can be refined.

Sure, having bootstrap working is good - sorry for missing update here.
> 
> Thanks for suggesting a better design. I'll test it.

It really depends what current_file_function_operand means.  If it is
supposed to check that it is a function that is defined in current file
and always will bind to it, then node->binds_to_current_def_p is good
a right thing to test and you don't need to restrict it to current
function decl.  

 node->binds_to_current_def_p (node2)
returns true if refernces from node2 to node will always bind to the
deifnition you are seeing. So
 - node2 is cgraph_node::get (current_function_ecl)
 - node is the symbol OP refers to
   (accessible by cgraph_node::get (SYMBOL_REF_DECL (op)) I think but
you will likely need to watch for possible NULLs)

Note that with LTO this will rturn true even if node2 is in different
partition (and thus different object file). If you do not want that you
need to additionally check
 && !node->in_other_partition.

Honza
> 
> Thanks, David
> 
> >
> > Honza
> > > && !((DEFAULT_ABI == ABI_AIX
> > >   || DEFAULT_ABI == ABI_ELFv2)
> > >  && (SYMBOL_REF_EXTERNAL_P (op)


Re: [PATCH] options: Make -Ofast switch off -fsemantic-interposition

2021-11-18 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> On Fri, Nov 12 2021, Martin Jambor wrote:
> > Hi,
> >
> > using -fno-semantic-interposition has been reported by various people
> > to bring about considerable speed up at the cost of strict compliance
> > to the ELF symbol interposition rules  See for example
> > https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> >
> > As such I believe it should be implied by our -Ofast optimization
> > level, not only so that benchmarks that can benefit run faster, but
> > also so that people looking at -Ofast documentation for options that
> > could speed their programs find it.
> >
> > I have verified that with the following patch IPA-CP sees
> > flag_semantic_interposition set to zero at Ofast and that info and pdf
> > manual builds fine with the documentation change.  I am bootstrapping
> > and testing it now in order to comply with submission criteria but I
> > don't think an Ofast change gets much tested.
> >
> > Assuming it passes, is the patch OK?  (If it is, I will also add a note
> > about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> > after I commit the patch.)
> >
> 
> Unfortunately, I was wrong, there are testcases which use the optimize
> attribute to switch a function to Ofast and those ICE because
> -fsemantic-interposition is not an optimization flag and only
> optimization flags can change in an optimize attribute (AFAIK, I only
> had a quick glance at the results).
> 
> I am not sure what is the right way to tackle this, whether to set the
> flag at Ofast in some nonstandard way or make the flag an optimization
> flag - probably affecting function definitions, having it affect
> call-sites seems too fine-grained.  I will try to discuss this on IRC on
> Monday (and hope such change is still doable early stage3).
> 
> Sorry for posting this a bit prematurely,

Hi,

This patch turns flag_semantic_interposition to optimization option so
it can be enabled with per-function granuality.  This is done by adding
the flag among visibility flags into the symbol table.  This fixes the
behaviour on the testcase I added to testsuite.

There are bugs where get_availability misbehaves on partitioned program.
We can also use the new flag to fix those, but I will do that
incrementally.

The -Ofast change should be safe now.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

2021-11-18  Jan Hubicka  

* cgraph.c (cgraph_node::get_availability): Update call of
decl_replaceable_p.
(cgraph_node::verify_node): Verify that semantic_interposition flag
is set correclty.
* cgraph.h: (symtab_node): Add semantic_interposition flag.
* cgraphclones.c (set_new_clone_decl_and_node_flags): Clear
semantic_interposition flag.
* cgraphunit.c (cgraph_node::finalize_function): Set
semantic_interposition flag.
(cgraph_node::add_new_function): Likewise.
(varpool_node::finalize_decl): Likewise.
(cgraph_node::create_wrapper): Likewise.
* common.opt (fsemantic-interposition): Turn to optimization node.
* lto-cgraph.c (lto_output_node): Stream semantic_interposition.
(lto_output_varpool_node): Likewise.
(input_overwrite_node): Likewise.
(input_varpool_node): Likewise.
* symtab.c (symtab_node::dump_base):
* varasm.c (decl_replaceable_p): Add semantic_interposition_p
parameter.
* varasm.h (decl_replaceable_p): Update declaration.
* varpool.c (varpool_node::ctor_useable_for_folding_p):
Use semantic_interposition flag.
(varpool_node::get_availability): Likewise.
(varpool_node::create_alias): Copy semantic_interposition flag.

gcc/cp/ChangeLog:

2021-11-18  Jan Hubicka  

* decl.c (finish_function): Update use of decl_replaceable_p.

gcc/lto/ChangeLog:

2021-11-18  Jan Hubicka  

* lto-partition.c (promote_symbol): Clear semantic_interposition flag.

gcc/testsuite/ChangeLog:

2021-11-18  Jan Hubicka  

* gcc.dg/lto/semantic-interposition-1_0.c: New test.
* gcc.dg/lto/semantic-interposition-1_1.c: New test.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 466b66d5ba5..8e7c12642ad 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2390,7 +2390,8 @@ cgraph_node::get_availability (symtab_node *ref)
  to code cp_cannot_inline_tree_fn and probably shall be shared and
  the inlinability hooks completely eliminated).  */
 
-  else if (decl_replaceable_p (decl) && !DECL_EXTERNAL (decl))
+  else if (decl_replaceable_p (decl, semantic_interposition)
+  && !DECL_EXTERNAL (decl))
 avail = AVAIL_INTERPOSABLE;
   else avail = AVAIL_AVAILABLE;
 
@@ -3486,6 +3487,13 @@ cgraph_node::verify_node (void)
 "returns a pointer");
   error_found = true;
 }
+  if (definition && externally_visible
+  && semantic_interposition
+!= opt_for_fn (decl, flag_semantic_interposition))
+{
+  error ("semantic interposition 

Finish lto parts of kill analysis

2021-11-17 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds the IPA part of modref kill analysis.  It just copies of
what local code did alrady.  I did not manage to push out all patches
for modref I planned and I will wait for next stage1.  This one however
I would like to push since it is quite simple and it makes no sense to
leave the ipa bits being collected but unused.

Bootstrapped/regtested x86_64-linux. I will commit it tonight if there
are no complains.

Honza

gcc/ChangeLog:

2021-11-17  Jan Hubicka  

* ipa-modref-tree.c: Include cgraph.h and tree-streamer.h.
(modref_access_node::stream_out): New member function.
(modref_access_node::stream_in): New member function.
* ipa-modref-tree.h (modref_access_node::stream_out,
modref_access_node::stream_in): Declare.
* ipa-modref.c (modref_summary_lto::useful_p): Free useless kills.
(modref_summary_lto::dump): Dump kills.
(analyze_store): Record kills for LTO
(analyze_stmt): Likewise.
(modref_summaries_lto::duplicate): Duplicate kills.
(write_modref_records): Use new stream_out member function.
(read_modref_records): Likewise.
(modref_write): Stream out kills.
(read_section): Stream in kills
(remap_kills): New function.
(update_signature): Use it.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index bbe23a5a211..ece42ade225 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -27,6 +27,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest.h"
 #include "tree-ssa-alias.h"
 #include "gimple.h"
+#include "cgraph.h"
+#include "tree-streamer.h"
 
 /* Return true if both accesses are the same.  */
 bool
@@ -458,6 +460,50 @@ modref_access_node::try_merge_with (vec 
 *,
   i++;
 }
 
+/* Stream out to OB.  */
+
+void
+modref_access_node::stream_out (struct output_block *ob) const
+{
+  streamer_write_hwi (ob, parm_index);
+  if (parm_index != -1)
+{
+  streamer_write_uhwi (ob, parm_offset_known);
+  if (parm_offset_known)
+   {
+ streamer_write_poly_int64 (ob, parm_offset);
+ streamer_write_poly_int64 (ob, offset);
+ streamer_write_poly_int64 (ob, size);
+ streamer_write_poly_int64 (ob, max_size);
+   }
+}
+}
+
+modref_access_node
+modref_access_node::stream_in (struct lto_input_block *ib)
+{
+  int parm_index = streamer_read_hwi (ib);
+  bool parm_offset_known = false;
+  poly_int64 parm_offset = 0;
+  poly_int64 offset = 0;
+  poly_int64 size = -1;
+  poly_int64 max_size = -1;
+
+  if (parm_index != -1)
+{
+  parm_offset_known = streamer_read_uhwi (ib);
+  if (parm_offset_known)
+   {
+ parm_offset = streamer_read_poly_int64 (ib);
+ offset = streamer_read_poly_int64 (ib);
+ size = streamer_read_poly_int64 (ib);
+ max_size = streamer_read_poly_int64 (ib);
+   }
+}
+  return {offset, size, max_size, parm_offset, parm_index,
+ parm_offset_known, false};
+}
+
 /* Insert access with OFFSET and SIZE.
Collapse tree if it has more than MAX_ACCESSES entries.
If RECORD_ADJUSTMENTs is true avoid too many interval extensions.
diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 1bf2aa8460e..0a097349ebd 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -99,6 +99,10 @@ struct GTY(()) modref_access_node
   tree get_call_arg (const gcall *stmt) const;
   /* Build ao_ref corresponding to the access and return true if succesful.  */
   bool get_ao_ref (const gcall *stmt, class ao_ref *ref) const;
+  /* Stream access to OB.  */
+  void stream_out (struct output_block *ob) const;
+  /* Stream access in from IB.  */
+  static modref_access_node stream_in (struct lto_input_block *ib);
   /* Insert A into vector ACCESSES.  Limit size of vector to MAX_ACCESSES and
  if RECORD_ADJUSTMENT is true keep track of adjustment counts.
  Return 0 if nothing changed, 1 is insertion suceeded and -1 if failed.  */
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 90cd1be764c..9ceecdd479f 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -410,6 +410,8 @@ modref_summary_lto::useful_p (int ecf_flags, bool 
check_flags)
&& (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   if (loads && !loads->every_base)
 return true;
+  else
+kills.release ();
   if (ecf_flags & ECF_PURE)
 return ((!side_effects || !nondeterministic)
&& (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
@@ -634,6 +636,15 @@ modref_summary_lto::dump (FILE *out)
   dump_lto_records (loads, out);
   fprintf (out, "  stores:\n");
   dump_lto_records (stores, out);
+  if (kills.length ())
+{
+  fprintf (out, "  kills:\n");
+  for (auto kill : kills)
+   {
+ fprintf (out, "");
+ kill.dump (out);
+   }
+}
   if (writes_errno)
 fprintf (out, "  Writes errno\n");
   if (side_effects)
@@ -1527,15 +1538,17 @@ analyze_store (gimple *stmt, tree, tree op, void *data)
  

Fix gamess miscompare

2021-11-17 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes bug in streaming in modref access tree that now cause a failure
of gamess benchmark.  The bug is quite old (present in GCC11 release) but it
needs quite interesting series of events to manifest. In particular
 1) At lto time ISRA turns some parameters passed by reference to scalar
 2) At lto time modref computes summaries for old parameters and then updates
them but does so quite stupidly believing that the load from parameters
are now unkonwn loads (rather than optimized out).
This renders summary not very useful since it thinks every memory aliasing
int is now accssed (as opposed as parameter dereference)
 3) At stream in we notice too early that summary is useless, set every_access
flag and drop the list.  However while reading rest of the summary we
overwrite the flag back to 0 which makes us to lose part of summary.
 4) right selection of partitions needs to be done to avoid late modref from
recalculating and thus fixing the summary.

This patch fixes the stream in bug, however we also should fix updating of
summaries.  Martin, would be possible to extend get_original_index by "deref"
parameter that would be set to true when refernce was turned to scalar?

Bootstrapped/regtested x86_64-linux. Comitted.

gcc/ChangeLog:

2021-11-17  Jan Hubicka  

PR ipa/103246
* ipa-modref.c (read_modref_records): Fix streaminig in of every_access
flag.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 9ceecdd479f..c94f0589d44 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -3460,10 +3460,10 @@ read_modref_records (lto_input_block *ib, struct 
data_in *data_in,
  size_t every_access = streamer_read_uhwi (ib);
  size_t naccesses = streamer_read_uhwi (ib);
 
- if (nolto_ref_node)
-   nolto_ref_node->every_access = every_access;
- if (lto_ref_node)
-   lto_ref_node->every_access = every_access;
+ if (nolto_ref_node && every_access)
+   nolto_ref_node->collapse ();
+ if (lto_ref_node && every_access)
+   lto_ref_node->collapse ();
 
  for (size_t k = 0; k < naccesses; k++)
{


Re: [PATCH] options: Make -Ofast switch off -fsemantic-interposition

2021-11-19 Thread Jan Hubicka via Gcc-patches
> > Hi,
> > 
> > On Fri, Nov 12 2021, Martin Jambor wrote:
> > > Hi,
> > >
> > > using -fno-semantic-interposition has been reported by various people
> > > to bring about considerable speed up at the cost of strict compliance
> > > to the ELF symbol interposition rules  See for example
> > > https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> > >
> > > As such I believe it should be implied by our -Ofast optimization
> > > level, not only so that benchmarks that can benefit run faster, but
> > > also so that people looking at -Ofast documentation for options that
> > > could speed their programs find it.
> > >
> > > I have verified that with the following patch IPA-CP sees
> > > flag_semantic_interposition set to zero at Ofast and that info and pdf
> > > manual builds fine with the documentation change.  I am bootstrapping
> > > and testing it now in order to comply with submission criteria but I
> > > don't think an Ofast change gets much tested.
> > >
> > > Assuming it passes, is the patch OK?  (If it is, I will also add a note
> > > about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> > > after I commit the patch.)
> > >
> > 
> > Unfortunately, I was wrong, there are testcases which use the optimize
> > attribute to switch a function to Ofast and those ICE because
> > -fsemantic-interposition is not an optimization flag and only
> > optimization flags can change in an optimize attribute (AFAIK, I only
> > had a quick glance at the results).
> > 
> > I am not sure what is the right way to tackle this, whether to set the
> > flag at Ofast in some nonstandard way or make the flag an optimization
> > flag - probably affecting function definitions, having it affect
> > call-sites seems too fine-grained.  I will try to discuss this on IRC on
> > Monday (and hope such change is still doable early stage3).
> > 
> > Sorry for posting this a bit prematurely,
> 
> Hi,
> 
> This patch turns flag_semantic_interposition to optimization option so
> it can be enabled with per-function granuality.  This is done by adding
> the flag among visibility flags into the symbol table.  This fixes the
> behaviour on the testcase I added to testsuite.
> 
> There are bugs where get_availability misbehaves on partitioned program.
> We can also use the new flag to fix those, but I will do that
> incrementally.
> 
> The -Ofast change should be safe now.

Also forgot to say it explicitly, the patch is OK, so please commit it.

Honza


Remove gimple_static_chain test disabling modref in ref_maybe_used_in_call_p

2021-11-19 Thread Jan Hubicka via Gcc-patches
Hi,
this patch removes test for function not having call chain guarding
modref use in ref_maybe_used_by_call_p_1.  It never made sense since
modref treats call chain accesses explicitly. It was however copied from
earlier check for ECF_CONST (which seems dubious too, but I would like
to discuss it independelty).

This enables us to detect that memory pointed to static chain (or parts of it)
are unused by the function.

lto-bootstrapped-regtested all lanugages on x86_64-linux. OK?

gcc/ChangeLog:

2021-11-19  Jan Hubicka  

* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Do not guard modref
by !gimple_call_chain.

gcc/testsuite/ChangeLog:

2021-11-19  Jan Hubicka  

* gcc.dg/tree-ssa/modref-dse-6.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-6.c 
b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-6.c
new file mode 100644
index 000..d1e45a893ad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-6.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized"  } */
+int
+main()
+{
+  int a,b;
+  __attribute__ ((noinline))
+  void kill_me()
+  {
+a=1234;
+b=2234;
+  }
+  a=0;
+  b=1234;
+  __attribute__ ((noinline))
+  int reta()
+  {
+return a;
+  }
+  return reta();
+}
+/* { dg-final { scan-tree-dump-not "kill_me" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1234" "optimized" } } */
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 02bbc87b597..cd6a0b2f67b 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2755,7 +2755,7 @@ ref_maybe_used_by_call_p_1 (gcall *call, ao_ref *ref, 
bool tbaa_p)
 
   callee = gimple_call_fndecl (call);
 
-  if (!gimple_call_chain (call) && callee != NULL_TREE)
+  if (callee != NULL_TREE)
 {
   struct cgraph_node *node = cgraph_node::get (callee);
   /* We can not safely optimize based on summary of calle if it does


Fix some side cases of side effects analysis

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
I wrote script comparing modref pure/const discovery with ipa-pure-const
and found mistakes on both ends.  I fixed ipa-pure-const in previous two
patches.

This plugs the case where modref was too optimistic in handling looping
pure consts which were previously missed due to early exits on ECF_CONST
| ECF_PURE.  Those early exists are bit anoying and I think as a cleanup
I may just drop some of them as premature optimizations coming from time
modref was very simplistic on what it propagates.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  

* ipa-modref.c (modref_summary::useful_p): Check also for side-effects
with looping const/pure.
(modref_summary_lto::useful_p): Likewise.
(merge_call_side_effects): Merge side effects before early exit
for pure/const.
(process_fnspec): Also handle pure functions.
(analyze_call): Do not early exit on looping pure const.
(propagate_unknown_call): Also handle nontrivial SCC as side-effect.
(modref_propagate_in_scc):

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index f8b7b900527..45b391a565e 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -331,11 +331,11 @@ modref_summary::useful_p (int ecf_flags, bool check_flags)
   && remove_useless_eaf_flags (static_chain_flags, ecf_flags, false))
 return true;
   if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   if (loads && !loads->every_base)
 return true;
   if (ecf_flags & ECF_PURE)
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   return stores && !stores->every_base;
 }
 
@@ -416,11 +416,11 @@ modref_summary_lto::useful_p (int ecf_flags, bool 
check_flags)
   && remove_useless_eaf_flags (static_chain_flags, ecf_flags, false))
 return true;
   if (ecf_flags & (ECF_CONST | ECF_NOVOPS))
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   if (loads && !loads->every_base)
 return true;
   if (ecf_flags & ECF_PURE)
-return false;
+return (!side_effects && (ecf_flags & ECF_LOOPING_CONST_OR_PURE));
   return stores && !stores->every_base;
 }
 
@@ -925,6 +925,18 @@ merge_call_side_effects (modref_summary *cur_summary,
   auto_vec  parm_map;
   modref_parm_map chain_map;
   bool changed = false;
+  int flags = gimple_call_flags (stmt);
+
+  if (!cur_summary->side_effects && callee_summary->side_effects)
+{
+  if (dump_file)
+   fprintf (dump_file, " - merging side effects.\n");
+  cur_summary->side_effects = true;
+  changed = true;
+}
+
+  if (flags & (ECF_CONST | ECF_NOVOPS))
+return changed;
 
   /* We can not safely optimize based on summary of callee if it does
  not always bind to current def: it is possible that memory load
@@ -988,12 +1000,6 @@ merge_call_side_effects (modref_summary *cur_summary,
  changed = true;
}
 }
-  if (!cur_summary->side_effects
-  && callee_summary->side_effects)
-{
-  cur_summary->side_effects = true;
-  changed = true;
-}
   return changed;
 }
 
@@ -1091,7 +1097,7 @@ process_fnspec (modref_summary *cur_summary,
   attr_fnspec fnspec = gimple_call_fnspec (call);
   int flags = gimple_call_flags (call);
 
-  if (!(flags & (ECF_CONST | ECF_NOVOPS))
+  if (!(flags & (ECF_CONST | ECF_NOVOPS | ECF_PURE))
   || (flags & ECF_LOOPING_CONST_OR_PURE)
   || (cfun->can_throw_non_call_exceptions
  && stmt_could_throw_p (cfun, call)))
@@ -1101,6 +1107,8 @@ process_fnspec (modref_summary *cur_summary,
   if (cur_summary_lto)
cur_summary_lto->side_effects = true;
 }
+  if (flags & (ECF_CONST | ECF_NOVOPS))
+return true;
   if (!fnspec.known_p ())
 {
   if (dump_file && gimple_call_builtin_p (call, BUILT_IN_NORMAL))
@@ -1203,7 +1211,8 @@ analyze_call (modref_summary *cur_summary, 
modref_summary_lto *cur_summary_lto,
   /* Check flags on the function call.  In certain cases, analysis can be
  simplified.  */
   int flags = gimple_call_flags (stmt);
-  if (flags & (ECF_CONST | ECF_NOVOPS))
+  if ((flags & (ECF_CONST | ECF_NOVOPS))
+  && !(flags & ECF_LOOPING_CONST_OR_PURE))
 {
   if (dump_file)
fprintf (dump_file,
@@ -3963,7 +3972,8 @@ static bool
 propagate_unknown_call (cgraph_node *node,
cgraph_edge *e, int ecf_flags,
modref_summary *cur_summary,
-   modref_summary_lto *cur_summary_lto)
+   modref_summary_lto *cur_summary_lto,
+   bool nontrivial_scc)
 {
   bool changed = false;
   class fnspec_summary *fnspec_sum = fnspec_summaries->get (e);
@@ -3973,12 +3983,12 @@ propagate_unknown_call (cgraph_node *node,
   if (e->callee
   && builtin_safe_for_const_function_p (, e->callee->decl))
 {
-  if (cur_summary && 

Re: Fix recursion discovery in ipa-pure-const

2021-11-11 Thread Jan Hubicka via Gcc-patches
> On Thu, Nov 11, 2021 at 2:41 PM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > We make self recursive functions as looping of fear of endless recursion.
> > This is done correctly for local pure/const and for non-trivial SCCs in
> > callgraph, but for trivial SCCs we miss the flag.
> >
> > I think it is bad decision since infinite recursion will run out of stack,
> 
> Note it might not always in case we can eliminate the tail-recursion or avoid
> stack use by the recursion by other means.  So I think it is conservatively
> correct.

I don't know.  If function is pure and has infinite recursion in it it
means that it can only run forever without side effects if it gets lucky
and we tail-recurse it.  There are no other means avoid the stack use from
growing.

First i think code relying on tail-recurse optimization to not run out
of stack is not strictly valid in C/C++ other languages we care.
Also in C++ there is the forced progression which makes even the tail
optiimzed code invalid.

I think in high level code such recursive accessors used for no good
reason are not that infrequent.  Also we had this bug in tree probably
forever since LOOPING_PURE_CONST was added and no one complained ;)

Relaxing this rule breaks some testcases, but odd ones - they are
infinitely self-recursive builtin implementations where we then both
prove function as noreturn & later optimize builtin to constant
so the assembly matching does not see expected thing.

Honza


Enable pure/const discovery in modref

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
this patch enables the pure/const discovery in modref, so we newly can handle
some extra cases, for example:

struct a {int a,b,c;};
__attribute__ ((noinline))
int init (struct a *a)
{
  a->a=1;
  a->b=2;
  a->c=3;
}
int const_fn () 
{
  struct a a;
  init ();
  return a.a + a.b + a.c;
}

Here pure/const stops on the fact that const_fn calls non-const init, while
modref knows that the memory it initializes is local to const_fn.

I ended up reordering passes so early modref is done after early pure-const
mostly to avoid need to change testsuite which greps for const functions
being detects in pure-const.  Stil some testuiste compensation is needed.

Boostrapped/regtested x86_64-linux. Will commit it shortly.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  

* ipa-modref.c (analyze_function): Do pure/const discovery, return
true on success.
(pass_modref::execute): If pure/const is discovered fixup cfg.
(ignore_edge): Do not ignore pure/const edges.
(modref_propagate_in_scc): Do pure/const discovery, return true if
cdtor was promoted pure/const.
(pass_ipa_modref::execute): If needed remove unreachable functions.
* ipa-pure-const.c (warn_function_noreturn): Fix whitespace.
(warn_function_cold): Likewise.
(skip_function_for_local_pure_const): Move earlier.
(ipa_make_function_const): Break out from ...
(ipa_make_function_pure): Break out from ...
(propagate_pure_const): ... here.
(pass_local_pure_const::execute): Use it.
* ipa-utils.h (ipa_make_function_const): Declare.
(ipa_make_function_pure): Declare.
* passes.def: Move early modref after pure-const.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  

* c-c++-common/tm/inline-asm.c: Disable pure-const.
* g++.dg/ipa/modref-1.C: Update template.
* gcc.dg/tree-ssa/modref-11.c: Disable pure-const.
* gcc.dg/tree-ssa/modref-14.c: New test.
* gcc.dg/tree-ssa/modref-8.c: Do not optimize sibling calls.
* gfortran.dg/do_subscript_3.f90: Add -O0.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 45b391a565e..72006251f29 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -2603,11 +2603,13 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
 }
 
 /* Analyze function F.  IPA indicates whether we're running in local mode
-   (false) or the IPA mode (true).  */
+   (false) or the IPA mode (true).
+   Return true if fixup cfg is needed after the pass.  */
 
-static void
+static bool
 analyze_function (function *f, bool ipa)
 {
+  bool fixup_cfg = false;
   if (dump_file)
 fprintf (dump_file, "modref analyzing '%s' (ipa=%i)%s%s\n",
 function_name (f), ipa,
@@ -2617,7 +2619,7 @@ analyze_function (function *f, bool ipa)
   /* Don't analyze this function if it's compiled with -fno-strict-aliasing.  
*/
   if (!flag_ipa_modref
   || lookup_attribute ("noipa", DECL_ATTRIBUTES (current_function_decl)))
-return;
+return false;
 
   /* Compute no-LTO summaries when local optimization is going to happen.  */
   bool nolto = (!ipa || ((!flag_lto || flag_fat_lto_objects) && !in_lto_p)
@@ -2774,12 +2776,32 @@ analyze_function (function *f, bool ipa)
  if (!summary->useful_p (ecf_flags, false))
{
  remove_summary (lto, nolto, ipa);
- return;
+ return false;
}
}
  first = false;
}
 }
+  if (summary && !summary->global_memory_written_p () && !summary->side_effects
+  && !finite_function_p ())
+summary->side_effects = true;
+  if (summary_lto && !summary_lto->side_effects && !finite_function_p ())
+summary_lto->side_effects = true;
+
+  if (!ipa && flag_ipa_pure_const)
+{
+  if (!summary->stores->every_base && !summary->stores->bases)
+   {
+ if (!summary->loads->every_base && !summary->loads->bases)
+   fixup_cfg = ipa_make_function_const
+  (cgraph_node::get (current_function_decl),
+   summary->side_effects, true);
+ else
+   fixup_cfg = ipa_make_function_pure
+  (cgraph_node::get (current_function_decl),
+   summary->side_effects, true);
+   }
+}
   if (summary && !summary->useful_p (ecf_flags))
 {
   if (!ipa)
@@ -2793,11 +2815,6 @@ analyze_function (function *f, bool ipa)
   summaries_lto->remove (fnode);
   summary_lto = NULL;
 }
-  if (summary && !summary->global_memory_written_p () && !summary->side_effects
-  && !finite_function_p ())
-summary->side_effects = true;
-  if (summary_lto && !summary_lto->side_effects && !finite_function_p ())
-summary_lto->side_effects = true;
 
   if (ipa && !summary && !summary_lto)
 remove_modref_edge_summaries (fnode);
@@ -2907,6 +2924,7 @@ analyze_function (function *f, bool ipa)
}
}
 }
+  return 

Fix noreturn discovery

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes ipa-pure-const handling of noreturn flags.  It is not
safe to set it for interposable symbols and we should also set it for
aliases (just like we do for other flags).  This patch merely copies other
flag handling and implements it here.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

2021-11-11  Jan Hubicka  

* cgraph.c (set_noreturn_flag_1): New function.
(cgraph_node::set_noreturn_flag): New member function
* cgraph.h (cgraph_node::set_noreturn_flags): Declare.
* ipa-pure-const.c (pass_local_pure_const::execute): Use it.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index c67d300e7a4..466b66d5ba5 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2614,6 +2614,53 @@ cgraph_node::set_malloc_flag (bool malloc_p)
   return changed;
 }
 
+/* Worker to set noreturng flag.  */
+static void
+set_noreturn_flag_1 (cgraph_node *node, bool noreturn_p, bool *changed)
+{
+  if (noreturn_p && !TREE_THIS_VOLATILE (node->decl))
+{
+  TREE_THIS_VOLATILE (node->decl) = true;
+  *changed = true;
+}
+
+  ipa_ref *ref;
+  FOR_EACH_ALIAS (node, ref)
+{
+  cgraph_node *alias = dyn_cast (ref->referring);
+  if (!noreturn_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+   set_noreturn_flag_1 (alias, noreturn_p, changed);
+}
+
+  for (cgraph_edge *e = node->callers; e; e = e->next_caller)
+if (e->caller->thunk
+   && (!noreturn_p || e->caller->get_availability () > AVAIL_INTERPOSABLE))
+  set_noreturn_flag_1 (e->caller, noreturn_p, changed);
+}
+
+/* Set TREE_THIS_VOLATILE on NODE's decl and on NODE's aliases if any.  */
+
+bool
+cgraph_node::set_noreturn_flag (bool noreturn_p)
+{
+  bool changed = false;
+
+  if (!noreturn_p || get_availability () > AVAIL_INTERPOSABLE)
+set_noreturn_flag_1 (this, noreturn_p, );
+  else
+{
+  ipa_ref *ref;
+
+  FOR_EACH_ALIAS (this, ref)
+   {
+ cgraph_node *alias = dyn_cast (ref->referring);
+ if (!noreturn_p || alias->get_availability () > AVAIL_INTERPOSABLE)
+   set_noreturn_flag_1 (alias, noreturn_p, );
+   }
+}
+  return changed;
+}
+
 /* Worker to set_const_flag.  */
 
 static void
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0a1f7c8960e..e42e305cdb6 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1167,6 +1167,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : 
public symtab_node
  if any.  */
   bool set_malloc_flag (bool malloc_p);
 
+  /* SET TREE_THIS_VOLATILE on cgraph_node's decl and on aliases of the node
+ if any.  */
+  bool set_noreturn_flag (bool noreturn_p);
+
   /* If SET_CONST is true, mark function, aliases and thunks to be ECF_CONST.
 If SET_CONST if false, clear the flag.
 
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 505ed4f8a3b..84a028bcf8e 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -2132,11 +2132,10 @@ pass_local_pure_const::execute (function *fun)
 current_function_name ());
 
   /* Update declaration and reduce profile to executed once.  */
-  TREE_THIS_VOLATILE (current_function_decl) = 1;
+  if (cgraph_node::get (current_function_decl)->set_noreturn_flag (true))
+   changed = true;
   if (node->frequency > NODE_FREQUENCY_EXECUTED_ONCE)
node->frequency = NODE_FREQUENCY_EXECUTED_ONCE;
-
-  changed = true;
 }
 
   switch (l->pure_const_state)


Re: [PATCH] vect: Remove vec_outside/inside_cost fields

2021-11-11 Thread Jan Hubicka via Gcc-patches
> > Now afunc writes to __var_5_mma only indirectly so I think it is correct 
> > that
> > we optimize the conditional out.
> >
> > Easy fix would be to add -fno-ipa-modref, but perhaps someone with
> > better understanding of Fortran would help me to improve the testcase so
> > the calls to matmul_r4 remains reachable?
> 
> I think the two matmul_r4 cases were missed optimizations before so just
> changing the expected number of calls to zero is the correct fix here.  Indeed
> we can now statically determine the matrices are not large and so only
> keep the inline copy.

I have updated the matmul as follows.

gcc/testsuite/ChangeLog:

2021-11-11  Jan Hubicka  

* gfortran.dg/inline_matmul_17.f90: Fix template

diff --git a/gcc/testsuite/gfortran.dg/inline_matmul_17.f90 
b/gcc/testsuite/gfortran.dg/inline_matmul_17.f90
index d2ca8e2948a..cff4b6ce5e2 100644
--- a/gcc/testsuite/gfortran.dg/inline_matmul_17.f90
+++ b/gcc/testsuite/gfortran.dg/inline_matmul_17.f90
@@ -45,4 +45,4 @@ program main
   c = matmul(a, bfunc())
   if (any(c-d /= 0)) STOP 6
 end program main
-! { dg-final { scan-tree-dump-times "matmul_r4" 2 "optimized" } }
+! { dg-final { scan-tree-dump-not "matmul_r4" "optimized" } }


Re: Use modref summary to DSE calls to non-pure functions

2021-11-11 Thread Jan Hubicka via Gcc-patches
> > Hmm, I could try to do this, but possibly incrementally?
> 
> You mean handle a  argument specially for unknown param offset?
> Yeah, I guess so.

I think it is also pointer that was allocated and is going to be
freed...
> 
> > Basically I want to have
> >
> > foo ()
> > decl = {}
> >
> > To be matched since even if I do not know the offset I know it is dead
> > after end of lifetime of the decl.  I am not quite sure PTA will give me
> > that?
> 
> for this case PTA should tell you the alias is to 'decl' only but then I'm
> not sure if stmt_kills_ref_p is up to the task to determine that 'decl = {}',
> from a quick look it doesn't.  So indeed the only interesting case will
> be a  based parameter which we can special-case.

Yep, i do not think it understands this.  I will look into it - I guess
it is common enough to care about.

Honza


Fix recursion discovery in ipa-pure-const

2021-11-11 Thread Jan Hubicka via Gcc-patches
Hi,
We make self recursive functions as looping of fear of endless recursion.
This is done correctly for local pure/const and for non-trivial SCCs in
callgraph, but for trivial SCCs we miss the flag.

I think it is bad decision since infinite recursion will run out of stack,
but changing it upsets some testcases and should be done independently.
So this patch is fixing current behaviour to be consistent.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

2021-11-11  Jan Hubicka  

* ipa-pure-const.c (propagate_pure_const): Self recursion is
a side effects.

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 505ed4f8a3b..64777cd2d91 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1513,6 +1611,9 @@ propagate_pure_const (void)
  enum pure_const_state_e edge_state = IPA_CONST;
  bool edge_looping = false;
 
+ if (e->recursive_p ())
+   looping = true;
+
  if (dump_file && (dump_flags & TDF_DETAILS))
{
  fprintf (dump_file, "Call to %s",


Re: [Bug tree-optimization/103175] [12 Regression] internal compiler error: in handle_call_arg, at tree-ssa-structalias.c:4139

2021-11-11 Thread Jan Hubicka via Gcc-bugs
The sanity check verifies that functions acessing parameter indirectly
also reads the parameter (otherwise the indirect reference can not
happen).  This patch moves the check earlier and removes some overactive
flag cleaning on function call boundary which introduces the non-sential
situation.  I got bit paranoid here on how return value relates to
escaping solution.

as discussed on ML, matmul failure is simply the fact that the testcase
verified missed optimization is still misssed.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 72006251f29..a97021c6c60 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1681,6 +1681,13 @@ modref_lattice::merge (int f)
 {
   if (f & EAF_UNUSED)
 return false;
+  /* Check that flags seems sane: if function does not read the parameter
+ it can not access it indirectly.  */
+  gcc_checking_assert (!(f & EAF_NO_DIRECT_READ)
+  || ((f & EAF_NO_INDIRECT_READ)
+  && (f & EAF_NO_INDIRECT_CLOBBER)
+  && (f & EAF_NO_INDIRECT_ESCAPE)
+  && (f & EAF_NOT_RETURNED_INDIRECTLY)));
   if ((flags & f) != flags)
 {
   flags &= f;
@@ -1874,27 +1881,13 @@ modref_eaf_analysis::merge_call_lhs_flags (gcall *call, 
int arg,
argument if needed.  */
 
 static int
-callee_to_caller_flags (int call_flags, bool ignore_stores,
-   modref_lattice )
+callee_to_caller_flags (int call_flags, bool ignore_stores)
 {
   /* call_flags is about callee returning a value
  that is not the same as caller returning it.  */
   call_flags |= EAF_NOT_RETURNED_DIRECTLY
| EAF_NOT_RETURNED_INDIRECTLY;
-  /* TODO: We miss return value propagation.
- Be conservative and if value escapes to memory
- also mark it as escaping.  */
-  if (!ignore_stores && !(call_flags & EAF_UNUSED))
-{
-  if (!(call_flags & EAF_NO_DIRECT_ESCAPE))
-   lattice.merge (~(EAF_NOT_RETURNED_DIRECTLY
-| EAF_NOT_RETURNED_INDIRECTLY
-| EAF_UNUSED));
-  if (!(call_flags & EAF_NO_INDIRECT_ESCAPE))
-   lattice.merge (~(EAF_NOT_RETURNED_INDIRECTLY
-| EAF_UNUSED));
-}
-  else
+  if (ignore_stores)
 call_flags |= ignore_stores_eaf_flags;
   return call_flags;
 }
@@ -2033,15 +2026,9 @@ modref_eaf_analysis::analyze_ssa_name (tree name)
  if (!(call_flags & (EAF_NOT_RETURNED_DIRECTLY
  | EAF_UNUSED)))
m_lattice[index].merge (~(EAF_NO_DIRECT_ESCAPE
- | EAF_NO_INDIRECT_ESCAPE
- | EAF_UNUSED));
- if (!(call_flags & (EAF_NOT_RETURNED_INDIRECTLY
- | EAF_UNUSED)))
-   m_lattice[index].merge (~(EAF_NO_INDIRECT_ESCAPE
  | EAF_UNUSED));
  call_flags = callee_to_caller_flags
-  (call_flags, false,
-   m_lattice[index]);
+  (call_flags, false);
}
  m_lattice[index].merge (call_flags);
}
@@ -2057,8 +2044,7 @@ modref_eaf_analysis::analyze_ssa_name (tree name)
  !(call_flags & EAF_NOT_RETURNED_DIRECTLY),
  !(call_flags & EAF_NOT_RETURNED_INDIRECTLY));
  call_flags = callee_to_caller_flags
-  (call_flags, ignore_stores,
-   m_lattice[index]);
+  (call_flags, ignore_stores);
  if (!(ecf_flags & (ECF_CONST | ECF_NOVOPS)))
m_lattice[index].merge (call_flags);
}
@@ -2082,8 +2068,7 @@ modref_eaf_analysis::analyze_ssa_name (tree name)
if (!(ecf_flags & (ECF_CONST | ECF_NOVOPS)))
  {
call_flags = callee_to_caller_flags
-(call_flags, ignore_stores,
- m_lattice[index]);
+(call_flags, ignore_stores);
if (!record_ipa)
  m_lattice[index].merge (call_flags);
else
@@ -2105,8 +2090,7 @@ modref_eaf_analysis::analyze_ssa_name (tree name)
else
  {
call_flags = callee_to_caller_flags
-(call_flags, ignore_stores,
- m_lattice[index]);
+(call_flags, ignore_stores);
if (!record_ipa)
  

Silence additional warning in gfortran.dg/do_subscript_3.f90

2021-11-10 Thread Jan Hubicka via Gcc-patches
Hi,
the testcase tests for out of bound accesses warnings and with ipa-modref 
improvements
it now triggers a new warning:

/aux/hubicka/trunk-git/gcc/testsuite/gfortran.dg/do_subscript_3.f90:11:9: 
Warning: (1)
/aux/hubicka/trunk-git/gcc/testsuite/gfortran.dg/do_subscript_3.f90:10:47: 
Warning: Array reference at (1) out of bounds (0 < 1) in loop beginning at (2)
/aux/hubicka/trunk-git/gcc/testsuite/gfortran.dg/do_subscript_3.f90:19:9: 
Warning: (1)
/aux/hubicka/trunk-git/gcc/testsuite/gfortran.dg/do_subscript_3.f90:18:45: 
Warning: Array reference at (1) out of bounds (6 > 5) in loop beginning at (2)
/aux/hubicka/trunk-git/gcc/testsuite/gfortran.dg/do_subscript_3.f90:19:50: 
Warning: iteration 5 invokes undefined behavior 
[-Waggressive-loop-optimizations]
/aux/hubicka/trunk-git/gcc/testsuite/gfortran.dg/do_subscript_3.f90:18:9: note: 
within this loop

I suppose we now are able to propagate array bounds better into the
nested function.

The last warning is new and correct even though little bit redundant.  I think
we may just silence it?  I wonder why we do not get same fact on the first loop
(which hits out of bound access already at iteration 0).

Looks OK?
Honza

gcc/testsuite/ChangeLog:

2021-11-10  Jan Hubicka  

* gfortran.dg/do_subscript_3.f90: Add 
-Wno-aggressive-loop-optimizations.

diff --git a/gcc/testsuite/gfortran.dg/do_subscript_3.f90 
b/gcc/testsuite/gfortran.dg/do_subscript_3.f90
index 2f62f58142b..18ed9a2f0c9 100644
--- a/gcc/testsuite/gfortran.dg/do_subscript_3.f90
+++ b/gcc/testsuite/gfortran.dg/do_subscript_3.f90
@@ -1,4 +1,5 @@
 ! { dg-do compile }
+! { dg-additional-options "-Wno-aggressive-loop-optimizations" }
 ! PR fortran/91424
 ! Check that only one warning is issued inside blocks, and that
 ! warnings are also issued for contained subroutines.


Fix modref_tree::remap_params

2021-11-10 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes wrong compare in remap_params which triggers a wrong
code with my followup patch.  This needs backporting to gcc11 as well
which I plan to do tomorrow.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

* ipa-modref-tree.h (modref_tree::remap_params): Fix off-by-one error.

diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index be5efcbb68f..3e213b23d79 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -1139,7 +1139,7 @@ struct GTY((user)) modref_tree
size_t k;
modref_access_node *access_node;
FOR_EACH_VEC_SAFE_ELT (ref_node->accesses, k, access_node)
- if (access_node->parm_index > 0)
+ if (access_node->parm_index >= 0)
{
  if (access_node->parm_index < (int)map->length ())
access_node->parm_index = (*map)[access_node->parm_index];


Extend modref by side-effect analysis

2021-11-10 Thread Jan Hubicka via Gcc-patches
Hi,
this patch makes modref to also collect info whether function has side
effects.  This allows pure/const function detection and also handling
functions which do store some memory in similar way as we handle
pure/consts now.

The code is symmetric to what ipa-pure-const does.  Modref is actually more
capable on proving that a given function is pure/const (since it understands
that non-pure function can be called when it only modifies data on stack)
so we could retire ipa-pure-const's pure-const discovery at some point.

However this patch only does the anlaysis - the consumers of this flag
will come next.

Bootstrapped/regtested x86_64-linux. I plan to commit it later today
if there are no complains.

gcc/ChangeLog:

* ipa-modref.c: Include tree-eh.h
(modref_summary::modref_summary): Initialize side_effects.
(struct modref_summary_lto): New bool field side_effects.
(modref_summary_lto::modref_summary_lto): Initialize side_effects.
(modref_summary::dump): Dump side_effects.
(modref_summary_lto::dump): Dump side_effects.
(merge_call_side_effects): Merge side effects.
(process_fnspec): Calls to non-const/pure or looping
function is a side effect.
(analyze_call): Self-recursion is a side-effect; handle
special builtins.
(analyze_load): Watch for volatile and throwing memory.
(analyze_store): Likewise.
(analyze_stmt): Watch for volatitle asm.
(analyze_function): Handle side_effects.
(modref_summaries::duplicate): Duplicate side_effects.
(modref_summaries_lto::duplicate): Likewise.
(modref_write): Stream side_effects.
(read_section): Likewise.
(update_signature): Update.
(propagate_unknown_call): Handle side_effects.
(modref_propagate_in_scc): Likewise.
* ipa-modref.h (struct modref_summary): Add side_effects.
* ipa-pure-const.c (special_builtin_state): Rename to ...
(builtin_safe_for_const_function_p): ... this one.
(check_call): Update.
(finite_function_p): Break out from ...
(propagate_pure_const): ... here
* ipa-utils.h (finite_function): Declare.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 22efc06c583..d14f9e52f62 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssanames.h"
 #include "attribs.h"
 #include "tree-cfg.h"
+#include "tree-eh.h"
 
 
 namespace {
@@ -273,7 +274,7 @@ static GTY(()) fast_function_summary 
 
 modref_summary::modref_summary ()
   : loads (NULL), stores (NULL), retslot_flags (0), static_chain_flags (0),
-writes_errno (false)
+writes_errno (false), side_effects (false)
 {
 }
 
@@ -371,6 +372,7 @@ struct GTY(()) modref_summary_lto
   eaf_flags_t retslot_flags;
   eaf_flags_t static_chain_flags;
   bool writes_errno;
+  bool side_effects;
 
   modref_summary_lto ();
   ~modref_summary_lto ();
@@ -382,7 +384,7 @@ struct GTY(()) modref_summary_lto
 
 modref_summary_lto::modref_summary_lto ()
   : loads (NULL), stores (NULL), retslot_flags (0), static_chain_flags (0),
-writes_errno (false)
+writes_errno (false), side_effects (false)
 {
 }
 
@@ -615,6 +617,8 @@ modref_summary::dump (FILE *out)
 }
   if (writes_errno)
 fprintf (out, "  Writes errno\n");
+  if (side_effects)
+fprintf (out, "  Side effects\n");
   if (arg_flags.length ())
 {
   for (unsigned int i = 0; i < arg_flags.length (); i++)
@@ -647,6 +651,8 @@ modref_summary_lto::dump (FILE *out)
   dump_lto_records (stores, out);
   if (writes_errno)
 fprintf (out, "  Writes errno\n");
+  if (side_effects)
+fprintf (out, "  Side effects\n");
   if (arg_flags.length ())
 {
   for (unsigned int i = 0; i < arg_flags.length (); i++)
@@ -980,6 +986,12 @@ merge_call_side_effects (modref_summary *cur_summary,
  changed = true;
}
 }
+  if (!cur_summary->side_effects
+  && callee_summary->side_effects)
+{
+  cur_summary->side_effects = true;
+  changed = true;
+}
   return changed;
 }
 
@@ -1075,6 +1087,18 @@ process_fnspec (modref_summary *cur_summary,
gcall *call, bool ignore_stores)
 {
   attr_fnspec fnspec = gimple_call_fnspec (call);
+  int flags = gimple_call_flags (call);
+
+  if (!(flags & (ECF_CONST | ECF_NOVOPS))
+  || (flags & ECF_LOOPING_CONST_OR_PURE)
+  || (cfun->can_throw_non_call_exceptions
+ && stmt_could_throw_p (cfun, call)))
+{
+  if (cur_summary)
+   cur_summary->side_effects = true;
+  if (cur_summary_lto)
+   cur_summary_lto->side_effects = true;
+}
   if (!fnspec.known_p ())
 {
   if (dump_file && gimple_call_builtin_p (call, BUILT_IN_NORMAL))
@@ -1212,6 +1236,10 @@ analyze_call (modref_summary *cur_summary, 
modref_summary_lto *cur_summary_lto,
   if (recursive_call_p (current_function_decl, callee))
 {
   

Use modref summary to DSE calls to non-pure functions

2021-11-10 Thread Jan Hubicka via Gcc-patches
Hi,
this patch implements DSE using modref summaries: if function has no side 
effects
besides storing to memory pointed to by its argument and if we can prove those 
stores
to be dead, we can optimize out. So we handle for example:

volatile int *ptr;
struct a {
int a,b,c;
} a;
__attribute__((noinline))
static int init (struct a*a)
{
a->a=0;
a->b=1;
}
__attribute__((noinline))
static int use (struct a*a)
{
if (a->c != 3)
*ptr=5;
}

void
main(void)
{
struct a a;
init ();
a.c=3;
use ();
}

And optimize out call to init ().

We work quite hard to inline such constructors and this patch is only
effective if inlining did not happen (for whatever reason).  Still, we
optimize about 26 calls building tramp3d and about 70 calls during
bootstrap (mostly ctors of poly_int). During bootstrap most removal
happens early and we would inline the ctors unless we decide to optimize
for size. 1 call per cc1* binary is removed late during LTO build.

This is more frequent in codebases with higher abstraction penalty, with
-Os or with profile feedback in sections optimized for size. I also hope
we will be able to CSE such calls and that would make DSE more
important.

Bootstrapped/regtested x86_64-linux, OK?

gcc/ChangeLog:

* tree-ssa-alias.c (ao_ref_alias_ptr_type): Export.
* tree-ssa-alias.h (ao_ref_init_from_ptr_and_range): Declare.
* tree-ssa-dse.c (dse_optimize_stmt): Rename to ...
(dse_optimize_store): ... this;
(dse_optimize_call): New function.
(pass_dse::execute): Use dse_optimize_call and update
call to dse_optimize_store.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/modref-dse-1.c: New test.
* gcc.dg/tree-ssa/modref-dse-2.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
new file mode 100644
index 000..e78693b349a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1"  } */
+volatile int *ptr;
+struct a {
+   int a,b,c;
+} a;
+__attribute__((noinline))
+static int init (struct a*a)
+{
+   a->a=0;
+   a->b=1;
+}
+__attribute__((noinline))
+static int use (struct a*a)
+{
+   if (a->c != 3)
+   *ptr=5;
+}
+
+void
+main(void)
+{
+   struct a a;
+   init ();
+   a.c=3;
+   use ();
+}
+/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
new file mode 100644
index 000..99c8ceb8127
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse2 -fno-ipa-sra -fno-ipa-cp"  } */
+volatile int *ptr;
+struct a {
+   int a,b,c;
+} a;
+__attribute__((noinline))
+static int init (struct a*a)
+{
+   a->a=0;
+   a->b=1;
+   a->c=1;
+}
+__attribute__((noinline))
+static int use (struct a*a)
+{
+   if (a->c != 3)
+   *ptr=5;
+}
+
+void
+main(void)
+{
+   struct a a;
+   init ();
+   a.c=3;
+   use ();
+}
+/* Only DSE2 is tracking live bytes needed to figure out that store to c is
+   also dead above.  */
+/* { dg-final { scan-tree-dump "Deleted dead store: init" "dse2" } } */
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index eabf6805f2b..affb5d40d4b 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -782,7 +782,7 @@ ao_ref_alias_ptr_type (ao_ref *ref)
The access is assumed to be only to or after of the pointer target adjusted
by the offset, not before it (even in the case RANGE_KNOWN is false).  */
 
-static void
+void
 ao_ref_init_from_ptr_and_range (ao_ref *ref, tree ptr,
bool range_known,
poly_int64 offset,
diff --git a/gcc/tree-ssa-alias.h b/gcc/tree-ssa-alias.h
index 275dea10397..c2e28a74999 100644
--- a/gcc/tree-ssa-alias.h
+++ b/gcc/tree-ssa-alias.h
@@ -111,6 +111,8 @@ ao_ref::max_size_known_p () const
 /* In tree-ssa-alias.c  */
 extern void ao_ref_init (ao_ref *, tree);
 extern void ao_ref_init_from_ptr_and_size (ao_ref *, tree, tree);
+void ao_ref_init_from_ptr_and_range (ao_ref *, tree, bool,
+poly_int64, poly_int64, poly_int64);
 extern tree ao_ref_base (ao_ref *);
 extern alias_set_type ao_ref_alias_set (ao_ref *);
 extern alias_set_type ao_ref_base_alias_set (ao_ref *);
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 27287fe88ee..1fec9100011 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -40,6 +40,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "tree-eh.h"
 #include "cfganal.h"
+#include "cgraph.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 /* This file implements dead store elimination.
 

Introduce finalize method to modref_summary

2021-11-12 Thread Jan Hubicka via Gcc-patches
Hi,
this patch adds finalize method that is called once summary is computed
before it is used by optimizers.  It adds convenient place to compute
various flags as one for DSE Richard asked for.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

* ipa-modref.c (modref_summary::global_memory_read_p): Remove.
(modref_summary::global_memory_written_p): Remove.
(modref_summary::dump): Dump new flags.
(modref_summary::finalize): New member function.
(analyze_function): Call it.
(read_section): Call it.
(update_signature): Call it.
(pass_ipa_modref::execute): Call it.
* ipa-modref.h (struct modref_summary): Remove
global_memory_read_p and global_memory_written_p.
Add global_memory_read, global_memory_written.
* tree-ssa-structalias.c (determine_global_memory_access):
Update.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 72006251f29..6cc282292df 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -339,26 +339,6 @@ modref_summary::useful_p (int ecf_flags, bool check_flags)
   return stores && !stores->every_base;
 }
 
-/* Return true if global memory is read
-   (that is loads summary contains global memory access).  */
-bool
-modref_summary::global_memory_read_p ()
-{
-  if (!loads)
-return true;
-  return loads->global_access_p ();
-}
-
-/* Return true if global memory is written.  */
-bool
-modref_summary::global_memory_written_p ()
-{
-  if (!stores)
-return true;
-  return stores->global_access_p ();
-}
-
-
 /* Single function summary used for LTO.  */
 
 typedef modref_tree  modref_records_lto;
@@ -621,6 +601,10 @@ modref_summary::dump (FILE *out)
 fprintf (out, "  Writes errno\n");
   if (side_effects)
 fprintf (out, "  Side effects\n");
+  if (global_memory_read)
+fprintf (out, "  Global memory read\n");
+  if (global_memory_written)
+fprintf (out, "  Global memory written\n");
   if (arg_flags.length ())
 {
   for (unsigned int i = 0; i < arg_flags.length (); i++)
@@ -676,6 +660,15 @@ modref_summary_lto::dump (FILE *out)
 }
 }
 
+/* Called after summary is produced and before it is used by local analysis.
+   Can be called multiple times in case summary needs to update signature.  */
+void
+modref_summary::finalize ()
+{
+  global_memory_read = !loads || loads->global_access_p ();
+  global_memory_written = !stores || stores->global_access_p ();
+}
+
 /* Get function summary for FUNC if it exists, return NULL otherwise.  */
 
 modref_summary *
@@ -2782,8 +2775,7 @@ analyze_function (function *f, bool ipa)
  first = false;
}
 }
-  if (summary && !summary->global_memory_written_p () && !summary->side_effects
-  && !finite_function_p ())
+  if (summary && !summary->side_effects && !finite_function_p ())
 summary->side_effects = true;
   if (summary_lto && !summary_lto->side_effects && !finite_function_p ())
 summary_lto->side_effects = true;
@@ -2810,6 +2802,8 @@ analyze_function (function *f, bool ipa)
summaries->remove (fnode);
   summary = NULL;
 }
+  else
+summary->finalize ();
   if (summary_lto && !summary_lto->useful_p (ecf_flags))
 {
   summaries_lto->remove (fnode);
@@ -3529,6 +3523,8 @@ read_section (struct lto_file_decl_data *file_data, const 
char *data,
  modref_read_escape_summary (, e);
}
}
+  if (flag_ltrans)
+   modref_sum->finalize ();
   if (dump_file)
{
  fprintf (dump_file, "Read modref for %s\n",
@@ -3685,6 +3681,8 @@ update_signature (struct cgraph_node *node)
   if (r_lto)
r_lto->dump (dump_file);
 }
+  if (r)
+r->finalize ();
   return;
 }
 
@@ -4905,6 +4903,11 @@ pass_ipa_modref::execute (function *)
 
   pureconst |= modref_propagate_in_scc (component_node);
   modref_propagate_flags_in_scc (component_node);
+  if (optimization_summaries)
+   for (struct cgraph_node *cur = component_node; cur;
+cur = ((struct ipa_dfs_info *) cur->aux)->next_cycle)
+ if (modref_summary *sum = optimization_summaries->get (cur))
+   sum->finalize ();
   if (dump_file)
modref_propagate_dump_scc (component_node);
 }
diff --git a/gcc/ipa-modref.h b/gcc/ipa-modref.h
index 49c99f263a7..9a8d565d770 100644
--- a/gcc/ipa-modref.h
+++ b/gcc/ipa-modref.h
@@ -33,15 +33,18 @@ struct GTY(()) modref_summary
   auto_vec GTY((skip)) arg_flags;
   eaf_flags_t retslot_flags;
   eaf_flags_t static_chain_flags;
-  bool writes_errno;
-  bool side_effects;
+  unsigned writes_errno : 1;
+  unsigned side_effects : 1;
+  /* Flags coputed by finalize method.  */
+  unsigned global_memory_read : 1;
+  unsigned global_memory_written : 1;
+
 
   modref_summary ();
   ~modref_summary ();
   void dump (FILE *);
   bool useful_p (int ecf_flags, bool check_flags = true);
-  bool global_memory_read_p ();
-  bool global_memory_written_p ();
+  void finalize ();

Fix ICE in tree-ssa-structalias

2021-11-12 Thread Jan Hubicka via Gcc-patches
Hi,
this patch fixes ICE in sanity check of EAF flags determined: we can not
escape/clobber/return param indirectly w/o reading it.
I moved check earlier and fixed the wrong updates.

Boottrapped/regtested x86_64-linux, comitted.

Honza

PR tree-optimization/103175
* ipa-modref.c (modref_lattice::merge): Add sanity check.
(callee_to_caller_flags): Make flags adjustment sane.
(modref_eaf_analysis::analyze_ssa_name): Likewise.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 44b3427a202..e999c2c5d1e 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -1681,6 +1681,13 @@ modref_lattice::merge (int f)
 {
   if (f & EAF_UNUSED)
 return false;
+  /* Check that flags seems sane: if function does not read the parameter
+ it can not access it indirectly.  */
+  gcc_checking_assert (!(f & EAF_NO_DIRECT_READ)
+  || ((f & EAF_NO_INDIRECT_READ)
+  && (f & EAF_NO_INDIRECT_CLOBBER)
+  && (f & EAF_NO_INDIRECT_ESCAPE)
+  && (f & EAF_NOT_RETURNED_INDIRECTLY)));
   if ((flags & f) != flags)
 {
   flags &= f;
@@ -1889,9 +1896,11 @@ callee_to_caller_flags (int call_flags, bool 
ignore_stores,
   if (!(call_flags & EAF_NO_DIRECT_ESCAPE))
lattice.merge (~(EAF_NOT_RETURNED_DIRECTLY
 | EAF_NOT_RETURNED_INDIRECTLY
+| EAF_NO_DIRECT_READ
 | EAF_UNUSED));
   if (!(call_flags & EAF_NO_INDIRECT_ESCAPE))
lattice.merge (~(EAF_NOT_RETURNED_INDIRECTLY
+| EAF_NO_DIRECT_READ
 | EAF_UNUSED));
 }
   else
@@ -2033,11 +2042,11 @@ modref_eaf_analysis::analyze_ssa_name (tree name)
  if (!(call_flags & (EAF_NOT_RETURNED_DIRECTLY
  | EAF_UNUSED)))
m_lattice[index].merge (~(EAF_NO_DIRECT_ESCAPE
- | EAF_NO_INDIRECT_ESCAPE
  | EAF_UNUSED));
  if (!(call_flags & (EAF_NOT_RETURNED_INDIRECTLY
  | EAF_UNUSED)))
m_lattice[index].merge (~(EAF_NO_INDIRECT_ESCAPE
+ | EAF_NO_DIRECT_READ
  | EAF_UNUSED));
  call_flags = callee_to_caller_flags
   (call_flags, false,


Re: [PATCH] options: Make -Ofast switch off -fsemantic-interposition

2021-11-12 Thread Jan Hubicka via Gcc-patches
> Hi,
> 
> using -fno-semantic-interposition has been reported by various people
> to bring about considerable speed up at the cost of strict compliance
> to the ELF symbol interposition rules  See for example
> https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
> 
> As such I believe it should be implied by our -Ofast optimization
> level, not only so that benchmarks that can benefit run faster, but
> also so that people looking at -Ofast documentation for options that
> could speed their programs find it.
> 
> I have verified that with the following patch IPA-CP sees
> flag_semantic_interposition set to zero at Ofast and that info and pdf
> manual builds fine with the documentation change.  I am bootstrapping
> and testing it now in order to comply with submission criteria but I
> don't think an Ofast change gets much tested.
> 
> Assuming it passes, is the patch OK?  (If it is, I will also add a note
> about it in the "Caveats" section in gcc-12/changes.html of wwwdocs
> after I commit the patch.)
> 
> Thanks,
> 
> Martin
> 
> 
> gcc/ChangeLog:
> 
> 2021-11-12  Martin Jambor  
> 
>   * opts.c (default_options_table): Switch off
>   flag_semantic_interposition at Ofast.
>   * doc/invoke.texi (Optimize Options): Document that Ofast switches off
>   -fsemantic-interposition.
OK,
thanks!
Honza


Re: [Bug ipa/103211] [12 Regression] 416.gamess crashes after r12-5177-g494bdadf28d0fb35

2021-11-12 Thread Jan Hubicka via Gcc-bugs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103211
> 
> --- Comment #2 from Martin Liška  ---
> Optimized dump differs for couple of functions in the same way:
> 
> diff -u good bad
> --- good2021-11-12 17:42:36.995947103 +0100
> +++ bad 2021-11-12 17:41:56.728194961 +0100
> @@ -38,7 +38,6 @@
> 
>  ;; Function abrt (abrt_, funcdef_no=10, decl_uid=4338, cgraph_uid=11,
> symbol_order=10) (executed once)
> 
> -Removing basic block 5
>  __attribute__((fn spec (". ")))
>  void abrt ()
>  {
> @@ -350,7 +349,6 @@
>  void setfm (integer(kind=4) * ipar)
>  {
> [local count: 1073741824]:
> -  master.0.setfm (0, ipar_2(D)); [tail call]
>return;
> 
>  }
> 
> maybe the fnspec for master.0.setfm is bad?
> 
> __attribute__((fn spec (". R w ")))
> void master.0.setfm (integer(kind=8) __entry, integer(kind=4) * ipar)
> {
It looks more like pure/const discovery. You should be able to use
-fdump-ipa-all -fdump-tree-all and grep "function found to be" 
either pure or const.

What is body of master.0.setfm. Does it look like it does nothing?

"R" in fnspec means that arg 0 is only read directly and not derefernced.
"w" means that it arg 1 is not escaping.

Honza


Enable ipa-sra for functions with fnspec attribute

2021-11-13 Thread Jan Hubicka via Gcc-patches
Hi,
this patch enables some ipa-sra on fortran by allowing signature changes on 
functions
with "fn spec" attribute when ipa-modref is enabled.  This is possible since 
ipa-modref
knows how to preserve things we trace in fnspec and fnspec generated by fortran 
forntend
are quite simple and can be analysed automatically now.  To be sure I will also 
add
code that merge fnspec to parameters.

This unfortunately hits bug in ipa-param-manipulation when we remove parameter
that specifies size of variable length parameter. For this reason I added a hack
that prevent signature changes on such functions and will handle it 
incrementally.

I tried creating C testcase but it is blocked by another problem that we punt 
ipa-sra
on access attribute.  This is optimization regression we ought to fix so I 
filled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223.

As a followup I will add code classifying the type attributes (we have just 
few) and 
get stats on access attribute.

Martin, can you please check that the code detecting signature changes is 
correct
and can't be done more easily?

Bootstrapped/regtested x86_64-linux, comitted.
Honza

gcc/ChangeLog:

* ipa-fnsummary.c (compute_fn_summary): Do not give up on signature
changes on "fn spec" attribute; give up on varadic types.
* ipa-param-manipulation.c: Include attribs.h.
(build_adjusted_function_type): New parameter ARG_MODIFIED; if it is
true remove "fn spec" attribute.
(ipa_param_adjustments::build_new_function_type): Update.
(ipa_param_body_adjustments::modify_formal_parameters): update.
* ipa-sra.c: Include attribs.h.
(ipa_sra_preliminary_function_checks): Do not check for TYPE_ATTRIBUTES.

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 2cfa9a6d0e9..94a80d3ec90 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -3135,10 +3135,38 @@ compute_fn_summary (struct cgraph_node *node, bool 
early)
else
 info->inlinable = tree_inlinable_function_p (node->decl);
 
-   /* Type attributes can use parameter indices to describe them.  */
-   if (TYPE_ATTRIBUTES (TREE_TYPE (node->decl))
-  /* Likewise for #pragma omp declare simd functions or functions
- with simd attribute.  */
+   bool no_signature = false;
+   /* Type attributes can use parameter indices to describe them.
+ Special case fn spec since we can safely preserve them in
+ modref summaries.  */
+   for (tree list = TYPE_ATTRIBUTES (TREE_TYPE (node->decl));
+   list && !no_signature; list = TREE_CHAIN (list))
+if (!flag_ipa_modref
+|| !is_attribute_p ("fn spec", get_attribute_name (list)))
+  {
+if (dump_file)
+   {
+ fprintf (dump_file, "No signature change:"
+  " function type has unhandled attribute %s.\n",
+  IDENTIFIER_POINTER (get_attribute_name (list)));
+   }
+no_signature = true;
+  }
+   for (tree parm = DECL_ARGUMENTS (node->decl);
+   parm && !no_signature; parm = DECL_CHAIN (parm))
+if (variably_modified_type_p (TREE_TYPE (parm), node->decl))
+  {
+if (dump_file)
+   {
+ fprintf (dump_file, "No signature change:"
+  " has parameter with variably modified type.\n");
+   }
+no_signature = true;
+  }
+
+   /* Likewise for #pragma omp declare simd functions or functions
+ with simd attribute.  */
+   if (no_signature
   || lookup_attribute ("omp declare simd",
DECL_ATTRIBUTES (node->decl)))
 node->can_change_signature = false;
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index ae3149718ca..20f41dd5363 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "symtab-clones.h"
 #include "tree-phinodes.h"
 #include "cfgexpand.h"
+#include "attribs.h"
 
 
 /* Actual prefixes of different newly synthetized parameters.  Keep in sync
@@ -281,11 +282,13 @@ fill_vector_of_new_param_types (vec *new_types, 
vec *otypes,
 /* Build and return a function type just like ORIG_TYPE but with parameter
types given in NEW_PARAM_TYPES - which can be NULL if, but only if,
ORIG_TYPE itself has NULL TREE_ARG_TYPEs.  If METHOD2FUNC is true, also make
-   it a FUNCTION_TYPE instead of FUNCTION_TYPE.  */
+   it a FUNCTION_TYPE instead of FUNCTION_TYPE.
+   If ARG_MODIFIED is true drop attributes that are no longer up to date.  */
 
 static tree
 build_adjusted_function_type (tree orig_type, vec *new_param_types,
- bool method2func, bool skip_return)
+ bool method2func, bool skip_return,
+ bool 

Remember fnspec EAF flags in modref summary

2021-11-13 Thread Jan Hubicka via Gcc-patches
Hi,
this patch stores eaf flags from fnspec to modref summaries.  THis makes
them survive signature changes and also improves IPA propagation in case
modref is not able to autodetect given flag.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

gcc/ChangeLog:

* attr-fnspec.h (attr_fnspec::arg_eaf_flags): Break out from ...
* gimple.c (gimple_call_arg_flags): ... here.
* ipa-modref.c (analyze_parms): Record flags known from fnspec.
(modref_merge_call_site_flags): Use arg_eaf_flags.

diff --git a/gcc/attr-fnspec.h b/gcc/attr-fnspec.h
index 1154c30e7b0..cd618cb342b 100644
--- a/gcc/attr-fnspec.h
+++ b/gcc/attr-fnspec.h
@@ -264,6 +264,29 @@ public:
 return str[1] == 'C' || str[1] == 'P';
   }
 
+  /* Return EAF flags for arg I.  */
+  int
+  arg_eaf_flags (unsigned int i)
+  {
+int flags = 0;
+
+if (!arg_specified_p (i))
+  ;
+else if (!arg_used_p (i))
+  flags = EAF_UNUSED;
+else
+  {
+   if (arg_direct_p (i))
+ flags |= EAF_NO_INDIRECT_READ | EAF_NO_INDIRECT_ESCAPE
+  | EAF_NOT_RETURNED_INDIRECTLY | EAF_NO_INDIRECT_CLOBBER;
+   if (arg_noescape_p (i))
+ flags |= EAF_NO_DIRECT_ESCAPE | EAF_NO_INDIRECT_ESCAPE;
+   if (arg_readonly_p (i))
+ flags |= EAF_NO_DIRECT_CLOBBER | EAF_NO_INDIRECT_CLOBBER;
+  }
+return flags;
+  }
+
   /* Check validity of the string.  */
   void verify ();
 
diff --git a/gcc/gimple.c b/gcc/gimple.c
index 1e0fad92e15..037c6e4c827 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1567,22 +1567,7 @@ gimple_call_arg_flags (const gcall *stmt, unsigned arg)
   int flags = 0;
 
   if (fnspec.known_p ())
-{
-  if (!fnspec.arg_specified_p (arg))
-   ;
-  else if (!fnspec.arg_used_p (arg))
-   flags = EAF_UNUSED;
-  else
-   {
- if (fnspec.arg_direct_p (arg))
-   flags |= EAF_NO_INDIRECT_READ | EAF_NO_INDIRECT_ESCAPE
-| EAF_NOT_RETURNED_INDIRECTLY | EAF_NO_INDIRECT_CLOBBER;
- if (fnspec.arg_noescape_p (arg))
-   flags |= EAF_NO_DIRECT_ESCAPE | EAF_NO_INDIRECT_ESCAPE;
- if (fnspec.arg_readonly_p (arg))
-   flags |= EAF_NO_DIRECT_CLOBBER | EAF_NO_INDIRECT_CLOBBER;
-   }
-}
+flags = fnspec.arg_eaf_flags (arg);
   tree callee = gimple_call_fndecl (stmt);
   if (callee)
 {
diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 90985cc1326..669dbe45a3d 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -2476,6 +2476,14 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
   /* Do the dataflow.  */
   eaf_analysis.propagate ();
 
+  tree attr = lookup_attribute ("fn spec",
+   TYPE_ATTRIBUTES
+ (TREE_TYPE (current_function_decl)));
+  attr_fnspec fnspec (attr
+ ? TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (attr)))
+ : "");
+
+
   /* Store results to summaries.  */
   for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; parm_index++,
parm = TREE_CHAIN (parm))
@@ -2502,6 +2510,18 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
  continue;
}
   int flags = eaf_analysis.get_ssa_name_flags (name);
+  int attr_flags = fnspec.arg_eaf_flags (parm_index);
+
+  if (dump_file && (flags | attr_flags) != flags && !(flags & EAF_UNUSED))
+   {
+ fprintf (dump_file,
+  "  Flags for param %i combined with fnspec flags:",
+  (int)parm_index);
+ dump_eaf_flags (dump_file, attr_flags, false);
+ fprintf (dump_file, " determined: ");
+ dump_eaf_flags (dump_file, flags, true);
+   }
+  flags |= attr_flags;
 
   /* Eliminate useless flags so we do not end up storing unnecessary
 summaries.  */
@@ -2522,8 +2542,8 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
   "  Flags for param %i combined with IPA pass:",
   (int)parm_index);
  dump_eaf_flags (dump_file, past, false);
- fprintf (dump_file, " local ");
- dump_eaf_flags (dump_file, flags | past, true);
+ fprintf (dump_file, " determined: ");
+ dump_eaf_flags (dump_file, flags, true);
}
  if (!(flags & EAF_UNUSED))
flags |= past;
@@ -2561,7 +2581,7 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
  fprintf (dump_file,
   "  Retslot flags combined with IPA pass:");
  dump_eaf_flags (dump_file, past, false);
- fprintf (dump_file, " local ");
+ fprintf (dump_file, " determined: ");
  dump_eaf_flags (dump_file, flags, true);
}
   if (!(flags & EAF_UNUSED))
@@ -2591,7 +2611,7 @@ analyze_parms (modref_summary *summary, 
modref_summary_lto *summary_lto,
  fprintf (dump_file,
   "  

Cleanup modref_access_node

2021-11-13 Thread Jan Hubicka via Gcc-patches
Hi,
this patch moves member functions of modref_access_node from ipa-modref-tree.h
to ipa-modref-tree.c since they become long and not fitting for inlines anyway.
I also cleaned up the interface by making static insert method (which handles
inserting accesses into a vector and optimizing them) which makes it 
possible to hide most of the interface handling interval merging private.

Honza

gcc/ChangeLog:

* ipa-modref-tree.h 
(struct modref_access_node): Move longer member functions to 
ipa-modref-tree.c
(modref_ref_node::try_merge_with): Turn into modreef_acces_node member
function.
* ipa-modref-tree.c (modref_access_node::contains): Move here
from ipa-modref-tree.h.
(modref_access_node::update): Likewise.
(modref_access_node::merge): Likewise.
(modref_access_node::closer_pair_p): Likewise.
(modref_access_node::forced_merge): Likewise.
(modref_access_node::update2): Likewise.
(modref_access_node::combined_offsets): Likewise.
(modref_access_node::try_merge_with): Likewise.
(modref_access_node::insert): Likewise.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index d0ee487f9fa..e363c506a09 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -28,6 +28,541 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
+/* Return true if both accesses are the same.  */
+bool
+modref_access_node::operator == (modref_access_node ) const
+{
+  if (parm_index != a.parm_index)
+return false;
+  if (parm_index != MODREF_UNKNOWN_PARM)
+{
+  if (parm_offset_known != a.parm_offset_known)
+   return false;
+  if (parm_offset_known
+ && !known_eq (parm_offset, a.parm_offset))
+   return false;
+}
+  if (range_info_useful_p () != a.range_info_useful_p ())
+return false;
+  if (range_info_useful_p ()
+  && (!known_eq (a.offset, offset)
+ || !known_eq (a.size, size)
+ || !known_eq (a.max_size, max_size)))
+return false;
+  return true;
+}
+
+/* Return true A is a subaccess.  */
+bool
+modref_access_node::contains (const modref_access_node ) const
+{
+  poly_int64 aoffset_adj = 0;
+  if (parm_index != MODREF_UNKNOWN_PARM)
+{
+  if (parm_index != a.parm_index)
+   return false;
+  if (parm_offset_known)
+   {
+  if (!a.parm_offset_known)
+return false;
+  /* Accesses are never below parm_offset, so look
+ for smaller offset.
+ If access ranges are known still allow merging
+ when bit offsets comparsion passes.  */
+  if (!known_le (parm_offset, a.parm_offset)
+  && !range_info_useful_p ())
+return false;
+  /* We allow negative aoffset_adj here in case
+ there is an useful range.  This is because adding
+ a.offset may result in non-ngative offset again.
+ Ubsan fails on val << LOG_BITS_PER_UNIT where val
+ is negative.  */
+  aoffset_adj = (a.parm_offset - parm_offset)
+* BITS_PER_UNIT;
+   }
+}
+  if (range_info_useful_p ())
+{
+  if (!a.range_info_useful_p ())
+   return false;
+  /* Sizes of stores are used to check that object is big enough
+to fit the store, so smaller or unknown sotre is more general
+than large store.  */
+  if (known_size_p (size)
+ && (!known_size_p (a.size)
+ || !known_le (size, a.size)))
+   return false;
+  if (known_size_p (max_size))
+   return known_subrange_p (a.offset + aoffset_adj,
+a.max_size, offset, max_size);
+  else
+   return known_le (offset, a.offset + aoffset_adj);
+}
+  return true;
+}
+
+/* Update access range to new parameters.
+   If RECORD_ADJUSTMENTS is true, record number of changes in the access
+   and if threshold is exceeded start dropping precision
+   so only constantly many updates are possible.  This makes dataflow
+   to converge.  */
+void
+modref_access_node::update (poly_int64 parm_offset1,
+   poly_int64 offset1, poly_int64 size1,
+   poly_int64 max_size1, bool record_adjustments)
+{
+  if (known_eq (parm_offset, parm_offset1)
+  && known_eq (offset, offset1)
+  && known_eq (size, size1)
+  && known_eq (max_size, max_size1))
+return;
+  if (!record_adjustments
+  || (++adjustments) < param_modref_max_adjustments)
+{
+  parm_offset = parm_offset1;
+  offset = offset1;
+  size = size1;
+  max_size = max_size1;
+}
+  else
+{
+  if (dump_file)
+   fprintf (dump_file,
+"--param param=modref-max-adjustments limit reached:");
+  if (!known_eq (parm_offset, parm_offset1))
+   {
+ if (dump_file)
+   fprintf (dump_file, " parm_offset cleared");
+ parm_offset_known = false;
+   }
+  if 

Enable more type attributes for signature changes

2021-11-13 Thread Jan Hubicka via Gcc-patches
Hi,
this patch whitelists attributes that are safe for attribute changes and
also makes access attribute dropped if function sigunature is changed.
We could do better by updating the attribute, but doing so seems to be
bit snowballing since with LTO the warnings produced seems bit confused.
We would also like to output original name of function
instead of mangledname.constprop or so.  I looked into what attributes
are dorpped in bootstrap and it does not look too bad.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

gcc/ChangeLog:

* ipa-fnsummary.c (compute_fn_summary): Use type_attribut_allowed_p
* ipa-param-manipulation.c 
(ipa_param_adjustments::type_attribute_allowed_p):
New member function.
(drop_type_attribute_if_params_changed_p): New function.
(build_adjusted_function_type): Use it.
* ipa-param-manipulation.h: Add type_attribute_allowed_p.

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 94a80d3ec90..7e9201a554a 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -3141,8 +3141,8 @@ compute_fn_summary (struct cgraph_node *node, bool early)
  modref summaries.  */
for (tree list = TYPE_ATTRIBUTES (TREE_TYPE (node->decl));
list && !no_signature; list = TREE_CHAIN (list))
-if (!flag_ipa_modref
-|| !is_attribute_p ("fn spec", get_attribute_name (list)))
+   if (!ipa_param_adjustments::type_attribute_allowed_p
+   (get_attribute_name (list)))
   {
 if (dump_file)
{
diff --git a/gcc/ipa-param-manipulation.c b/gcc/ipa-param-manipulation.c
index 991db0d9b1b..29268fa5a58 100644
--- a/gcc/ipa-param-manipulation.c
+++ b/gcc/ipa-param-manipulation.c
@@ -279,6 +279,32 @@ fill_vector_of_new_param_types (vec *new_types, 
vec *otypes,
 }
 }
 
+/* Return false if given attribute should prevent type adjustments.  */
+
+bool
+ipa_param_adjustments::type_attribute_allowed_p (tree name)
+{
+  if ((is_attribute_p ("fn spec", name) && flag_ipa_modref)
+  || is_attribute_p ("access", name)
+  || is_attribute_p ("returns_nonnull", name)
+  || is_attribute_p ("assume_aligned", name)
+  || is_attribute_p ("nocf_check", name)
+  || is_attribute_p ("warn_unused_result", name))
+return true;
+  return false;
+}
+
+/* Return true if attribute should be dropped if parameter changed.  */
+
+static bool
+drop_type_attribute_if_params_changed_p (tree name)
+{
+  if (is_attribute_p ("fn spec", name)
+  || is_attribute_p ("access", name))
+return true;
+  return false;
+}
+
 /* Build and return a function type just like ORIG_TYPE but with parameter
types given in NEW_PARAM_TYPES - which can be NULL if, but only if,
ORIG_TYPE itself has NULL TREE_ARG_TYPEs.  If METHOD2FUNC is true, also make
@@ -337,16 +363,19 @@ build_adjusted_function_type (tree orig_type, vec 
*new_param_types,
   if (skip_return)
TREE_TYPE (new_type) = void_type_node;
 }
-  /* We only support one fn spec attribute on type.  Be sure to remove it.
- Once we support multiple attributes we will need to be able to unshare
- the list.  */
   if (args_modified && TYPE_ATTRIBUTES (new_type))
 {
-  gcc_checking_assert
- (!TREE_CHAIN (TYPE_ATTRIBUTES (new_type))
-  && (is_attribute_p ("fn spec",
- get_attribute_name (TYPE_ATTRIBUTES (new_type);
+  tree t = TYPE_ATTRIBUTES (new_type);
+  tree *last = _ATTRIBUTES (new_type);
   TYPE_ATTRIBUTES (new_type) = NULL;
+  for (;t; t = TREE_CHAIN (t))
+   if (!drop_type_attribute_if_params_changed_p
+   (get_attribute_name (t)))
+ {
+   *last = copy_node (t);
+   TREE_CHAIN (*last) = NULL;
+   last = _CHAIN (*last);
+ }
 }
 
   return new_type;
diff --git a/gcc/ipa-param-manipulation.h b/gcc/ipa-param-manipulation.h
index 9440cbfc56c..5adf8a22356 100644
--- a/gcc/ipa-param-manipulation.h
+++ b/gcc/ipa-param-manipulation.h
@@ -254,6 +254,7 @@ public:
   /* If true, make the function not return any value.  */
   bool m_skip_return;
 
+  static bool type_attribute_allowed_p (tree);
 private:
   ipa_param_adjustments () {}
 


Re: [PATCH v2] IPA: Provide a mechanism to register static DTORs via cxa_atexit.

2021-11-13 Thread Jan Hubicka via Gcc-patches
> sheesh … EWRONGREVISEDPATCH
> 
> > On 5 Nov 2021, at 13:08, Iain Sandoe  wrote:
> > 
> > I tried enabling this on x86-64-linux (just for interest) and it seems to 
> > work
> > OK there too - but that testing revealed a thinko that didn’t show with a
> > a normal regstrap.
> 
> … now with the correct patch.
> 
> [PATCH v2] IPA: Provide a mechanism to register static DTORs via
>  cxa_atexit.
> 
> For at least one target (Darwin) the platform convention is to
> register static destructors (i.e. __attribute__((destructor)))
> with __cxa_atexit rather than placing them into a list that is
> run by some other mechanism.
> 
> This patch provides a target hook that allows a target to opt
> into this and handling for the process in ipa_cdtor_merge ().
> 
> When the mode is enabled (dtors_from_cxa_atexit is set) we:
> 
>  * Generate new CTORs to register static destructors with
>__cxa_atexit and add them to the existing list of CTORs;
>we then process the revised CTORs list.
> 
>  * We sort the DTORs into priority and then TU order, this
>means that they are registered in that order with
>__cxa_atexit () and therefore will be run in the reverse
>order.
> 
>  * Likewise, CTORs are sorted into priority and then TU order,
>which means that they will run in that order.
> 
> This matches the behavior of using init/fini (or
> mod_init_func/mod_term_func) sections.
> 
> Signed-off-by: Iain Sandoe 
> 
> gcc/ChangeLog:
> 
>   * config/darwin.h (TARGET_DTORS_FROM_CXA_ATEXIT): New.
>   * doc/tm.texi: Regenerated.
>   * doc/tm.texi.in: Add TARGET_DTORS_FROM_CXA_ATEXIT hook.
>   * ipa.c (ipa_discover_variable_flags):
>   (cgraph_build_static_cdtor_1): Return the built function
>   decl.
>   (build_cxa_atexit_decl): New.
>   (build_dso_handle_decl): New.
>   (build_cxa_dtor_registrations): New.
>   (compare_cdtor_tu_order): New.
>   (build_cxa_atexit_fns): New.
>   (ipa_cdtor_merge): If dtors_from_cxa_atexit is set,
>   process the DTORs/CTORs accordingly.
>   (pass_ipa_cdtor_merge::gate): Also run if
>   dtors_from_cxa_atexit is set.
>   * target.def (dtors_from_cxa_atexit): New hook.

OK, thanks!
Honza


Re: Basic kill analysis for modref

2021-11-15 Thread Jan Hubicka via Gcc-patches
> > > +  if (always_executed
> > > +  && callee_summary->kills.length ()
> > > +  && (!cfun->can_throw_non_call_exceptions
> > > + || !stmt_could_throw_p (cfun, stmt)))
> > > +{
> > > +  /* Watch for self recursive updates.  */
> > > +  auto_vec saved_kills;
> > > +
> > > +  saved_kills.reserve_exact (callee_summary->kills.length ());
> > > +  saved_kills.splice (callee_summary->kills);
> > > +  for (auto kill : saved_kills)
> > > +   {
> > > + if (kill.parm_index >= (int)parm_map.length ())
> > > +   continue;
> > > + modref_parm_map 
> > > + = kill.parm_index == MODREF_STATIC_CHAIN_PARM
> > > +   ? chain_map
> >     chain_map  isn't initialized.
> > 
> > This caused:
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103262
> Yup.  It's causing heartburn in various ways in the tester.  I was just
> tracking it down with valgrind...
> jeff

Oops, either me or patch much have mislocated the change within the
function when updating to new tree.  I am testing the following fix and
will cook up a testcase verifying that merging of kills works as
expected.

Thanks!
Honza

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index df4612bbff9..4784f68f585 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -964,38 +980,6 @@ merge_call_side_effects (modref_summary *cur_summary,
   if (flags & (ECF_CONST | ECF_NOVOPS))
 return changed;
 
-  if (always_executed
-  && callee_summary->kills.length ()
-  && (!cfun->can_throw_non_call_exceptions
- || !stmt_could_throw_p (cfun, stmt)))
-{
-  /* Watch for self recursive updates.  */
-  auto_vec saved_kills;
-
-  saved_kills.reserve_exact (callee_summary->kills.length ());
-  saved_kills.splice (callee_summary->kills);
-  for (auto kill : saved_kills)
-   {
- if (kill.parm_index >= (int)parm_map.length ())
-   continue;
- modref_parm_map 
- = kill.parm_index == MODREF_STATIC_CHAIN_PARM
-   ? chain_map
-   : parm_map[kill.parm_index];
- if (m.parm_index == MODREF_LOCAL_MEMORY_PARM
- || m.parm_index == MODREF_UNKNOWN_PARM
- || m.parm_index == MODREF_RETSLOT_PARM
- || !m.parm_offset_known)
-   continue;
- modref_access_node n = kill;
- n.parm_index = m.parm_index;
- n.parm_offset += m.parm_offset;
- if (modref_access_node::insert_kill (cur_summary->kills, n,
-  record_adjustments))
-   changed = true;
-   }
-}
-
   /* We can not safely optimize based on summary of callee if it does
  not always bind to current def: it is possible that memory load
  was optimized out earlier which may not happen in the interposed
@@ -1043,6 +1027,38 @@ merge_call_side_effects (modref_summary *cur_summary,
   if (dump_file)
 fprintf (dump_file, "\n");
 
+  if (always_executed
+  && callee_summary->kills.length ()
+  && (!cfun->can_throw_non_call_exceptions
+ || !stmt_could_throw_p (cfun, stmt)))
+{
+  /* Watch for self recursive updates.  */
+  auto_vec saved_kills;
+
+  saved_kills.reserve_exact (callee_summary->kills.length ());
+  saved_kills.splice (callee_summary->kills);
+  for (auto kill : saved_kills)
+   {
+ if (kill.parm_index >= (int)parm_map.length ())
+   continue;
+ modref_parm_map 
+ = kill.parm_index == MODREF_STATIC_CHAIN_PARM
+   ? chain_map
+   : parm_map[kill.parm_index];
+ if (m.parm_index == MODREF_LOCAL_MEMORY_PARM
+ || m.parm_index == MODREF_UNKNOWN_PARM
+ || m.parm_index == MODREF_RETSLOT_PARM
+ || !m.parm_offset_known)
+   continue;
+ modref_access_node n = kill;
+ n.parm_index = m.parm_index;
+ n.parm_offset += m.parm_offset;
+ if (modref_access_node::insert_kill (cur_summary->kills, n,
+  record_adjustments))
+   changed = true;
+   }
+}
+
   /* Merge with callee's summary.  */
   changed |= cur_summary->loads->merge (callee_summary->loads, _map,
_map, record_adjustments);


Re: Basic kill analysis for modref

2021-11-16 Thread Jan Hubicka via Gcc-patches
>    chain_map  isn't initialized.
> 
> This caused:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103262
> 

Hi,
this is patch I comitted that moves the misplaced hunk.

gcc/ChangeLog:

PR ipa/103262
* ipa-modref.c (merge_call_side_effects): Fix uninitialized
access.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/modref-dse-5.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index df4612bbff9..4784f68f585 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -964,38 +980,6 @@ merge_call_side_effects (modref_summary *cur_summary,
   if (flags & (ECF_CONST | ECF_NOVOPS))
 return changed;
 
-  if (always_executed
-  && callee_summary->kills.length ()
-  && (!cfun->can_throw_non_call_exceptions
- || !stmt_could_throw_p (cfun, stmt)))
-{
-  /* Watch for self recursive updates.  */
-  auto_vec saved_kills;
-
-  saved_kills.reserve_exact (callee_summary->kills.length ());
-  saved_kills.splice (callee_summary->kills);
-  for (auto kill : saved_kills)
-   {
- if (kill.parm_index >= (int)parm_map.length ())
-   continue;
- modref_parm_map 
- = kill.parm_index == MODREF_STATIC_CHAIN_PARM
-   ? chain_map
-   : parm_map[kill.parm_index];
- if (m.parm_index == MODREF_LOCAL_MEMORY_PARM
- || m.parm_index == MODREF_UNKNOWN_PARM
- || m.parm_index == MODREF_RETSLOT_PARM
- || !m.parm_offset_known)
-   continue;
- modref_access_node n = kill;
- n.parm_index = m.parm_index;
- n.parm_offset += m.parm_offset;
- if (modref_access_node::insert_kill (cur_summary->kills, n,
-  record_adjustments))
-   changed = true;
-   }
-}
-
   /* We can not safely optimize based on summary of callee if it does
  not always bind to current def: it is possible that memory load
  was optimized out earlier which may not happen in the interposed
@@ -1043,6 +1027,38 @@ merge_call_side_effects (modref_summary *cur_summary,
   if (dump_file)
 fprintf (dump_file, "\n");
 
+  if (always_executed
+  && callee_summary->kills.length ()
+  && (!cfun->can_throw_non_call_exceptions
+ || !stmt_could_throw_p (cfun, stmt)))
+{
+  /* Watch for self recursive updates.  */
+  auto_vec saved_kills;
+
+  saved_kills.reserve_exact (callee_summary->kills.length ());
+  saved_kills.splice (callee_summary->kills);
+  for (auto kill : saved_kills)
+   {
+ if (kill.parm_index >= (int)parm_map.length ())
+   continue;
+ modref_parm_map 
+ = kill.parm_index == MODREF_STATIC_CHAIN_PARM
+   ? chain_map
+   : parm_map[kill.parm_index];
+ if (m.parm_index == MODREF_LOCAL_MEMORY_PARM
+ || m.parm_index == MODREF_UNKNOWN_PARM
+ || m.parm_index == MODREF_RETSLOT_PARM
+ || !m.parm_offset_known)
+   continue;
+ modref_access_node n = kill;
+ n.parm_index = m.parm_index;
+ n.parm_offset += m.parm_offset;
+ if (modref_access_node::insert_kill (cur_summary->kills, n,
+  record_adjustments))
+   changed = true;
+   }
+}
+
   /* Merge with callee's summary.  */
   changed |= cur_summary->loads->merge (callee_summary->loads, _map,
_map, record_adjustments);
+/* { dg-final { scan-tree-dump "Deleted dead store: kill_me" "dse2" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c 
b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
new file mode 100644
index 000..ad35b70136f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/modref-dse-5.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse2-details"  } */
+struct a {int a,b,c;};
+__attribute__ ((noinline))
+void
+kill_me (struct a *a)
+{
+  a->a=0;
+  a->b=0;
+  a->c=0;
+}
+__attribute__ ((noinline))
+int
+wrap(int b, struct a *a)
+{
+   kill_me (a);
+   return b;
+}
+__attribute__ ((noinline))
+void
+my_pleasure (struct a *a)
+{
+  a->a=1;
+  a->c=2;
+}
+__attribute__ ((noinline))
+int
+wrap2(int b, struct a *a)
+{
+   my_pleasure (a);
+   return b;
+}
+
+int
+set (struct a *a)
+{
+  wrap (0, a);
+  int ret = wrap2 (0, a);
+  //int ret = my_pleasure (a);
+  a->b=1;
+  return ret;
+}
+/* { dg-final { scan-tree-dump "Deleted dead store: wrap" "dse2" } } */


Re: [Bug ipa/103267] Wrong code with ipa-sra

2021-11-16 Thread Jan Hubicka via Gcc-bugs
Works for me even with the 3 warnings.

hubicka@lomikamen:/aux/hubicka/trunk/build-lto2/gcc$ cat >tt.c
__attribute__ ((noinline,const))
infinite (int p)
{
  if (p)
while (1);
  return p;
}
__attribute__ ((noinline))
static void
test(int p, int *a)
{
  int v = infinite (p);
  if (*a && v)
__builtin_abort ();
}
test2(int *a)
{
  test(0,a);
}
main()
{
  test (1,0);
}
hubicka@lomikamen:/aux/hubicka/trunk/build-lto2/gcc$ ./xgcc --version
xgcc (GCC) 12.0.0 2024 (experimental)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

hubicka@lomikamen:/aux/hubicka/trunk/build-lto2/gcc$ ./xgcc -B ./ -O2 tt.c
tt.c:2:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
2 | infinite (int p)
  | ^~~~
tt.c:16:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
   16 | test2(int *a)
  | ^
tt.c:20:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
   20 | main()
  | ^~~~
hubicka@lomikamen:/aux/hubicka/trunk/build-lto2/gcc$ ./a.out
Segmentation fault



Re: [Bug ipa/103267] Wrong code with ipa-sra

2021-11-16 Thread Jan Hubicka via Gcc-bugs
Aha, but here is better example (reproduces same way).
In the former one I forgot const attribute which makes it invalid.
The testcase tests that ipa-sra is missing ECF_LOOPING_CONST_OR_PURE
check

static int
__attribute__ ((noinline))
infinite (int p)
{
  if (p)
while (1);
  return p;
}
__attribute__ ((noinline))
static void
test(int p, int *a)
{
  int v = infinite (p);
  if (*a && v)
__builtin_abort ();
}
test2(int *a)
{
  test(0,a);
}
main()
{
  test (1,0);
}


Re: [Bug ipa/103267] Wrong code with ipa-sra

2021-11-16 Thread Jan Hubicka via Gcc-bugs
> @@ -1,4 +1,3 @@
> -static int
>  __attribute__ ((noinline,const))
>  infinite (int p)
>  {
Just for a record, it crahes with or without static int here for me :)

I run across it because the code tracking must access in ipa-sra is IMO
conceptually wrong.  I noticed that because ipa-modref solves similar
problem for kills (both need to verify that given access will always
happen).  The post-dominance check is not enough to verify that because
earlier function calls can do things like EH.  I failed to construct an
actual testcase because on interesting stuff like EH we punt for other
reasons (missed fnspec annotations on EH builtins).  I will play with it
more today.


Re: Cleanup hadnling of modref access_nodes in tree-ssa-alias and tree-ssa-dse

2021-11-14 Thread Jan Hubicka via Gcc-patches
Hi,
this is variant I comitted.  Commonizing the code exposed that I can
drop memory walking when parameter passed is NULL (under assumption of
flag_delete_null_pointer_checks) since it can not point to useful
memory.  This was already done in tree-ssa-alias, but not in
tree-ssa-dse.  This needed bit of testsuite compensation for cases where
we optimize out invalid memory accesses.

Bootstrapped/regtested x86_64-linux, comitted.

gcc/ChangeLog:

2021-11-14  Jan Hubicka  

* ipa-modref-tree.c (modref_access_node::get_call_arg): New member
function.
(modref_access_node::get_ao_ref): Likewise.
* ipa-modref-tree.h (modref_access_node::get_call_arg): Declare.
(modref_access_node::get_ao_ref): Declare.
* tree-ssa-alias.c (modref_may_conflict): Use new accessors.
* tree-ssa-dse.c (dse_optimize_call): Use new accessors.

gcc/testsuite/ChangeLog:

2021-11-14  Jan Hubicka  

* c-c++-common/asan/null-deref-1.c: Update template.
* c-c++-common/tsan/free_race.c: Update template.
* c-c++-common/tsan/free_race2.c: Update template.
* gcc.dg/ipa/ipa-sra-4.c: Update template.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index 70ec71c3808..6fc2b7298f4 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -25,6 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "ipa-modref-tree.h"
 #include "selftest.h"
+#include "tree-ssa-alias.h"
+#include "gimple.h"
 
 /* Return true if both accesses are the same.  */
 bool
@@ -603,6 +605,39 @@ modref_access_node::dump (FILE *out)
   fprintf (out, "\n");
 }
 
+/* Return tree corresponding to parameter of the range in STMT.  */
+tree
+modref_access_node::get_call_arg (const gcall *stmt) const
+{
+  if (parm_index == MODREF_UNKNOWN_PARM)
+return NULL;
+  if (parm_index == MODREF_STATIC_CHAIN_PARM)
+return gimple_call_chain (stmt);
+  /* MODREF_RETSLOT_PARM should not happen in access trees since the store
+ is seen explicitly in the caller.  */
+  gcc_checking_assert (parm_index >= 0);
+  if (parm_index >= (int)gimple_call_num_args (stmt))
+return NULL;
+  return gimple_call_arg (stmt, parm_index);
+}
+
+/* Return tree corresponding to parameter of the range in STMT.  */
+bool
+modref_access_node::get_ao_ref (const gcall *stmt, ao_ref *ref) const
+{
+  tree arg;
+
+  if (!parm_offset_known || !(arg = get_call_arg (stmt)))
+return false;
+  poly_offset_int off = (poly_offset_int)offset
+   + ((poly_offset_int)parm_offset << LOG2_BITS_PER_UNIT);
+  poly_int64 off2;
+  if (!off.to_shwi ())
+return false;
+  ao_ref_init_from_ptr_and_range (ref, arg, true, off2, size, max_size);
+  return true;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 1fafd59debe..2fcabe480bd 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -77,7 +77,7 @@ struct GTY(()) modref_access_node
  This has to be limited in order to keep dataflow finite.  */
   unsigned char adjustments;
 
-  /* Return true if access node holds no useful info.  */
+  /* Return true if access node holds some useful info.  */
   bool useful_p () const
 {
   return parm_index != MODREF_UNKNOWN_PARM;
@@ -88,10 +88,13 @@ struct GTY(()) modref_access_node
   bool operator == (modref_access_node ) const;
   /* Return true if range info is useful.  */
   bool range_info_useful_p () const;
+  /* Return tree corresponding to parameter of the range in STMT.  */
+  tree get_call_arg (const gcall *stmt) const;
+  /* Build ao_ref corresponding to the access and return true if succesful.  */
+  bool get_ao_ref (const gcall *stmt, class ao_ref *ref) const;
   /* Insert A into vector ACCESSES.  Limit size of vector to MAX_ACCESSES and
  if RECORD_ADJUSTMENT is true keep track of adjustment counts.
- Return 0 if nothing changed, 1 is insertion suceeded and -1 if
- failed.  */
+ Return 0 if nothing changed, 1 is insertion suceeded and -1 if failed.  */
   static int insert (vec  *,
 modref_access_node a, size_t max_accesses,
 bool record_adjustments);
diff --git a/gcc/testsuite/c-c++-common/asan/null-deref-1.c 
b/gcc/testsuite/c-c++-common/asan/null-deref-1.c
index bae016d6419..c967b29b9e2 100644
--- a/gcc/testsuite/c-c++-common/asan/null-deref-1.c
+++ b/gcc/testsuite/c-c++-common/asan/null-deref-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-fno-omit-frame-pointer -fno-shrink-wrap" } */
+/* { dg-options "-fno-omit-frame-pointer -fno-shrink-wrap -fno-ipa-modref" } */
 /* { dg-additional-options "-mno-omit-leaf-frame-pointer" { target { i?86-*-* 
x86_64-*-* } } } */
 /* { dg-shouldfail "asan" } */
 
diff --git a/gcc/testsuite/c-c++-common/tsan/free_race.c 
b/gcc/testsuite/c-c++-common/tsan/free_race.c
index 258f7b7420d..831c23e8859 100644
--- a/gcc/testsuite/c-c++-common/tsan/free_race.c
+++ 

Re: [Bug ipa/103230] New: ipa-modref-tree.h:550:33: runtime error: load of value 255, which is not a valid value for type 'bool'

2021-11-14 Thread Jan Hubicka via Gcc-bugs
> Happens with UBSAN compiler for:
> 
> $ gcc gcc/testsuite/gcc.c-torture/execute/pr71494.c -O1  -flto
> ...
> /home/marxin/Programming/gcc/gcc/ipa-modref-tree.h:550:33: runtime error: load
> of value 255, which is not a valid value for type 'bool'
> #0 0x18acc38 in modref_tree::merge(modref_tree*,
> vec*, modref_parm_map*, bool)
> /home/marxin/Programming/gcc/gcc/ipa-modref-tree.h:550
> #1 0x188452c in modref_propagate_in_scc

At 4385 I have:
   changed |= cur_summary_lto->stores->merge
(callee_summary_lto->stores, _map, _map, !first);
 

parm-map is the vector, however there is no read of it.
There is bool which is relevant only when parm_index is not unknown, so
I suspect it may a full copy with uninitialized bool which would be
harmless. We had similar issues with asan before.

How do you build ubsan compiler?
Honza


Re: [Bug ipa/103230] ipa-modref-tree.h:550:33: runtime error: load of value 255, which is not a valid value for type 'bool'

2021-11-14 Thread Jan Hubicka via Gcc-bugs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103230
> 
> --- Comment #2 from Martin Liška  ---
> > How do you build ubsan compiler?
> 
> F="-O0 -g -fsanitize=undefined" ; make -j16 all-host -k CFLAGS="$F"
> CXXFLAGS="$F"  LDFLAGS="$F"
> 
> is the fastest approach.
Thanks, it is similar to what I tried.  I guess there should be no ";"
but yet it leds to misconfigured libiberty for me on kunlun.  I will
look into that.


Re: [r12-5236 Regression] FAIL: gcc.dg/tree-prof/merge_block.c scan-tree-dump-not optimized "Invalid sum" on Linux/x86_64

2021-11-14 Thread Jan Hubicka via Gcc-patches
> On Linux/x86_64,
> 
> 5aa91072e24c1e16a5ec641b48b64c9c9f199f13 is the first bad commit
> commit 5aa91072e24c1e16a5ec641b48b64c9c9f199f13
> Author: Jan Hubicka 
> Date:   Sat Nov 13 22:25:23 2021 +0100
> 
> Implement DSE of dead functions calls storing memory.
> 
> caused
> 
> FAIL: c-c++-common/tsan/free_race2.c   -O2  execution test
> FAIL: c-c++-common/tsan/free_race.c   -O2  execution test
> FAIL: gcc.dg/ipa/ipa-sra-4.c scan-ipa-dump-times sra "Will split parameter" 2
> FAIL: gcc.dg/tree-prof/merge_block.c scan-tree-dump-not optimized "Invalid 
> sum"
> 
> with GCC configured with
> 
> ../../gcc/configure 
> --prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-5236/usr
>  --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tsan.exp=c-c++-common/tsan/free_race2.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tsan.exp=c-c++-common/tsan/free_race2.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tsan.exp=c-c++-common/tsan/free_race.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tsan.exp=c-c++-common/tsan/free_race.c 
> --target_board='unix{-m64\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-4.c --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-4.c --target_board='unix{-m32\ 
> -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-4.c --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="ipa.exp=gcc.dg/ipa/ipa-sra-4.c --target_board='unix{-m64\ 
> -march=cascadelake}'"

In these two cases we do DSE and the testcase is no longer getting what
it wants (invalid store or SRA transform).  I had patch adding
-fno-ipa-modref and will commit it.

> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/merge_block.c 
> --target_board='unix{-m32}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/merge_block.c 
> --target_board='unix{-m32\ -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/merge_block.c 
> --target_board='unix{-m64}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="tree-prof.exp=gcc.dg/tree-prof/merge_block.c 
> --target_board='unix{-m64\ -march=cascadelake}'"

This seems real bug in complete loop unrolling.  Not sure what broke
here, but cunrolli now does not get frequencies right.

Honza


Re: [PATCH] tsan: remove not needed -ldl in options

2021-11-14 Thread Jan Hubicka via Gcc-patches
> Tested and pushed to master as obvious.
> 
> Martin
> 
> gcc/testsuite/ChangeLog:
> 
>   * c-c++-common/tsan/free_race.c: Remove unnecessary -ldl.
>   * c-c++-common/tsan/free_race2.c: Likewise.

Thank you, I cut it from the other testcase and forgot to remove
it.  Patch is OK.

Honza


Cleanup hadnling of modref access_nodes in tree-ssa-alias and tree-ssa-dse

2021-11-14 Thread Jan Hubicka via Gcc-patches
Hi,
this patch implements the cleanup suggested by Richard to move code
getting tree op from access_node and stmt to a common place.  I also commonized
logic to build ao_ref. While I was on it I also replaced FOR_EACH_* by range
for since they reads better.

Bootstrapped/regtesed x86_64-linux, will commit it shortly.
Honza

gcc/ChangeLog:

2021-11-14  Jan Hubicka  

* ipa-modref-tree.c (modref_access_node::get_call_arg): New member
function.
(modref_access_node::get_ao_ref): Likewise.
* ipa-modref-tree.h (modref_access_node::get_call_arg): Declare.
(modref_access_node::get_ao_ref): Declare.
* tree-ssa-alias.c (modref_may_conflict): Use new accessors.
* tree-ssa-dse.c (dse_optimize_call): Use new accessors.

diff --git a/gcc/ipa-modref-tree.c b/gcc/ipa-modref-tree.c
index 70ec71c3808..6fc2b7298f4 100644
--- a/gcc/ipa-modref-tree.c
+++ b/gcc/ipa-modref-tree.c
@@ -25,6 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "ipa-modref-tree.h"
 #include "selftest.h"
+#include "tree-ssa-alias.h"
+#include "gimple.h"
 
 /* Return true if both accesses are the same.  */
 bool
@@ -603,6 +605,39 @@ modref_access_node::dump (FILE *out)
   fprintf (out, "\n");
 }
 
+/* Return tree corresponding to parameter of the range in STMT.  */
+tree
+modref_access_node::get_call_arg (const gcall *stmt) const
+{
+  if (parm_index == MODREF_UNKNOWN_PARM)
+return NULL;
+  if (parm_index == MODREF_STATIC_CHAIN_PARM)
+return gimple_call_chain (stmt);
+  /* MODREF_RETSLOT_PARM should not happen in access trees since the store
+ is seen explicitly in the caller.  */
+  gcc_checking_assert (parm_index >= 0);
+  if (parm_index >= (int)gimple_call_num_args (stmt))
+return NULL;
+  return gimple_call_arg (stmt, parm_index);
+}
+
+/* Return tree corresponding to parameter of the range in STMT.  */
+bool
+modref_access_node::get_ao_ref (const gcall *stmt, ao_ref *ref) const
+{
+  tree arg;
+
+  if (!parm_offset_known || !(arg = get_call_arg (stmt)))
+return false;
+  poly_offset_int off = (poly_offset_int)offset
+   + ((poly_offset_int)parm_offset << LOG2_BITS_PER_UNIT);
+  poly_int64 off2;
+  if (!off.to_shwi ())
+return false;
+  ao_ref_init_from_ptr_and_range (ref, arg, true, off2, size, max_size);
+  return true;
+}
+
 #if CHECKING_P
 
 namespace selftest {
diff --git a/gcc/ipa-modref-tree.h b/gcc/ipa-modref-tree.h
index 1fafd59debe..2fcabe480bd 100644
--- a/gcc/ipa-modref-tree.h
+++ b/gcc/ipa-modref-tree.h
@@ -77,7 +77,7 @@ struct GTY(()) modref_access_node
  This has to be limited in order to keep dataflow finite.  */
   unsigned char adjustments;
 
-  /* Return true if access node holds no useful info.  */
+  /* Return true if access node holds some useful info.  */
   bool useful_p () const
 {
   return parm_index != MODREF_UNKNOWN_PARM;
@@ -88,10 +88,13 @@ struct GTY(()) modref_access_node
   bool operator == (modref_access_node ) const;
   /* Return true if range info is useful.  */
   bool range_info_useful_p () const;
+  /* Return tree corresponding to parameter of the range in STMT.  */
+  tree get_call_arg (const gcall *stmt) const;
+  /* Build ao_ref corresponding to the access and return true if succesful.  */
+  bool get_ao_ref (const gcall *stmt, class ao_ref *ref) const;
   /* Insert A into vector ACCESSES.  Limit size of vector to MAX_ACCESSES and
  if RECORD_ADJUSTMENT is true keep track of adjustment counts.
- Return 0 if nothing changed, 1 is insertion suceeded and -1 if
- failed.  */
+ Return 0 if nothing changed, 1 is insertion suceeded and -1 if failed.  */
   static int insert (vec  *,
 modref_access_node a, size_t max_accesses,
 bool record_adjustments);
diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index 2965902912f..ba055730558 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2535,13 +2535,10 @@ refs_output_dependent_p (tree store1, tree store2)
IF TBAA_P is true, use TBAA oracle.  */
 
 static bool
-modref_may_conflict (const gimple *stmt,
+modref_may_conflict (const gcall *stmt,
 modref_tree  *tt, ao_ref *ref, bool tbaa_p)
 {
   alias_set_type base_set, ref_set;
-  modref_base_node  *base_node;
-  modref_ref_node  *ref_node;
-  size_t i, j, k;
 
   if (tt->every_base)
 return true;
@@ -2554,7 +2551,7 @@ modref_may_conflict (const gimple *stmt,
   ref_set = ao_ref_alias_set (ref);
 
   int num_tests = 0, max_tests = param_modref_max_tests;
-  FOR_EACH_VEC_SAFE_ELT (tt->bases, i, base_node)
+  for (auto base_node : tt->bases)
 {
   if (tbaa_p && flag_strict_aliasing)
{
@@ -2569,7 +2566,7 @@ modref_may_conflict (const gimple *stmt,
   if (base_node->every_ref)
return true;
 
-  FOR_EACH_VEC_SAFE_ELT (base_node->refs, j, ref_node)
+  for (auto ref_node : base_node->refs)
{
  /* Do not repeat same test as before.  */
 

Re: [Bug tree-optimization/103231] New: ICE (nondeterministic) on valid code at -O1 on x86_64-linux-gnu: Segmentation fault

2021-11-14 Thread Jan Hubicka via Gcc-bugs
> [659] % 
> [659] % gcctk -O0 -w small.c
> [660] % 
> [660] % gcctk -O1 -w small.c
> [661] % gcctk -O1 -w small.c
> [662] % gcctk -O1 -w small.c
> gcctk: internal compiler error: Segmentation fault signal terminated program
> cc1
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.
Backtrace here would be useful.  It is bit strange that you did not get
it from error message.  One can use -S -wrapper gdb,--args to make the
cc1 executed within gdb.
> [663] %


Re: Use modref kills in tree-ssa-dse

2021-11-16 Thread Jan Hubicka via Gcc-patches
> 
> Not sure, tree-ssa-dse.c doesn't seem to handle MEM_REF with offset?
> 
> VN has adjust_offsets_for_equal_base_address for this purpose.  I
> agree that some common functionality like
> 
> bool
> get_relative_extent_of (const ao_ref *base, const ao_ref *ref,
> poly_int64 *offset);
> 
> that computes [offset, offset + ref->[max_]size] of REF adjusted as to
> make ao_ref_base have the same address (or return false if not
> possible).  Then [ base->offset, base->offset + base->max_size ]
> can be compared against that.

OK, I will look into that.
> > +  if (valid_ao_ref_for_dse (write)
> > +  && operand_equal_p (write->base, ref->base, OEP_ADDRESS_OF)
> > +  && known_eq (write->size, write->max_size)
> > +  && normalize_ref (write, ref)
> 
> normalize_ref alters 'write', I think we should work on a local
> copy here.  See live_bytes_read which takes a copy of 'use_ref'.

We never proces same write twice (get_ao_ref is always constructing
fresh copy), so this should be safe.  Or shall I turn the write
parameter to "ao_ref write" instead of "ao_ref *write" just to be sure
we do not break infuture?

Thank you,
Honza


  1   2   3   4   5   >