Re: GSOC 2018 - Textual LTO dump tool project

2018-03-16 Thread Hrishikesh Kulkarni
Hi,

Thanks, Martin and Richard. I have uploaded the final proposal to the GSoC
website. In the meantime, I will study Makefiles and GNU Make in greater detail.

Regards,
Hrishikesh


On Thu, Mar 15, 2018 at 4:09 PM, Martin Liška  wrote:

> Hi.
>
> Sorry for the delay and thanks to Richi who's been co-mentoring the
> project.
> I've just read the submission draft and it looks nice! Please submit it.
>
> Martin
>


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-15 Thread Martin Liška
On 03/15/2018 09:45 AM, Richard Biener wrote:
> Yes, I think it's fine to submit it, but let's ask the mentor for the
> project, Martin, for his opinion first.
> 
> Martin?
> 
> Thanks,
> Richard.

Hi.

Sorry for the delay and thanks to Richi who's been co-mentoring the project.
I've just read the submission draft and it looks nice! Please submit it.

Martin


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-15 Thread Richard Biener
On Wed, Mar 14, 2018 at 8:13 PM, Hrishikesh Kulkarni
 wrote:
> Hi,
>
> Thanks a lot for the inputs and guidance. I have incorporated all the changes
> into the document. Shall I go ahead and formally submit the proposal on the
> GSoC website?

Yes, I think it's fine to submit it, but let's ask the mentor for the
project, Martin, for his opinion first.

Martin?

Thanks,
Richard.

> Drive link to Proposal:
> https://docs.google.com/document/d/1-jYwwDWsHQwMaxVsHFBrJ9EiCAev7ljTDLS1xMwvK5w/edit
>
> Regards,
>
> Hrishikesh

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-14 Thread Hrishikesh Kulkarni
Hi,

Thanks a lot for the inputs and guidance. I have incorporated all the changes
into the document. Shall I go ahead and formally submit the proposal on the
GSoC website?

Drive link to Proposal:
https://docs.google.com/document/d/1-jYwwDWsHQwMaxVsHFBrJ9EiCAev7ljTDLS1xMwvK5w/edit

Regards,

Hrishikesh

On Wed, Mar 14, 2018 at 8:28 PM, Richard Biener 
wrote:

> Thanks, it looks very good now.  You have essentially duplicated items
> in 1. and 2., namely --summary= and Dumping of IPA summaries.
> I would move some of the 1. items over to 2.: apart from --summary, I'd also
> move --cgraph-dot.
>
> Richard.

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-14 Thread Richard Biener
On Tue, Mar 13, 2018 at 5:30 AM, Hrishikesh Kulkarni
 wrote:
> Hi,
>
> Thanks. I have tried to incorporate your suggestions and have prepared a
> revised draft of the GSoC proposal, attached herewith. Your suggestions
> regarding this draft would definitely help me to improve it further for
> submission.

Thanks, it looks very good now.  You have essentially duplicated items
in 1. and 2., namely --summary= and Dumping of IPA summaries.
I would move some of the 1. items over to 2.: apart from --summary, I'd also
move --cgraph-dot.

Richard.

>
> Drive link to Draft Proposal:
>
> https://docs.google.com/document/d/1-jYwwDWsHQwMaxVsHFBrJ9EiCAev7ljTDLS1xMwvK5w/edit
>
>
> Regards,
>
> Hrishikesh

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-12 Thread Hrishikesh Kulkarni
Hi,

Thanks. I have tried to incorporate your suggestions and have prepared a
revised draft of the GSoC proposal, attached herewith. Your suggestions
regarding this draft would definitely help me to improve it further for
submission.


Drive link to Draft Proposal:

https://docs.google.com/document/d/1-jYwwDWsHQwMaxVsHFBrJ9EiCAev7ljTDLS1xMwvK5w/edit

Regards,

Hrishikesh



On Mon, Mar 12, 2018 at 4:45 PM, Richard Biener 
wrote:

> The proposal looks a bit sparse when it comes to details about the actual
> project. I'd suggest clarifying at least the deliverables. For 1., I suggest
> adding a 1 c) that specifies what should be working, for example
>
>  lto-dump -l
>
> should dump a list of variables and functions contained in the IL
>
>  lto-dump -s 
>
> should dump detailed info about the symbol  (using the symtab dump
> infrastructure)
>
>  lto-dump -f 
>
> should dump the function body of the function with  (using the gimple
> dump infrastructure)
>
> The lto-dump tool should be verified to work on both WPA-time and
> LTRANS-time objects.
>
> Thus your 2) a) should already be possible with 1). 2) would then contain
> dumping of IPA summaries as the major part, apart from visualizing the
> callgraph. For visualizing the (full) callgraph you need to handle multiple
> LTO IL input files. Those two pieces should be enough for 2) unless
> usability issues spill over from 1).
>
> In the introduction I'm missing some general words about the LTO IL, for
> example that it is non-self-describing bytecode, documented only by the
> code reading/writing it and thus hard to debug. It also fails to lay out
> the overall structure of an LTO IL file (you already go into detail about
> IPA passes, so this omission stands out).
>
> Richard.

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-12 Thread Richard Biener
On Sun, Mar 11, 2018 at 8:23 PM, Hrishikesh Kulkarni
 wrote:
> Hi,
>
> Greetings! Please find my draft proposal for GSoC attached herewith. I am
> very grateful to all of you for your inputs, suggestions and directions. I
> have tried to assimilate the inputs I received from you into a proposal.
> Your suggestions regarding this draft would definitely help me turn it
> into the final proposal for submission. In addition, could you please
> suggest possible extensions that could be added to the dump tool?
>
>
> Drive link to Draft Proposal:
>
> https://docs.google.com/document/d/1-jYwwDWsHQwMaxVsHFBrJ9EiCAev7ljTDLS1xMwvK5w/edit

The proposal looks a bit sparse when it comes to details about the actual project.
I'd suggest clarifying at least the deliverables. For 1., I suggest adding a 1 c)
that specifies what should be working, for example

 lto-dump -l

should dump a list of variables and functions contained in the IL

 lto-dump -s 

should dump detailed info about the symbol  (using the symtab dump
infrastructure)

 lto-dump -f 

should dump the function body of the function with  (using the gimple
dump infrastructure)

The lto-dump tool should be verified to work on both WPA-time and LTRANS-time
objects.
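
To make the proposed 1 c) deliverable concrete, here is a minimal Python sketch of how the -l, -s and -f options suggested above could dispatch to dump routines. It is not GCC code: the SYMBOLS table is a hypothetical stand-in for what a real reader would extract from the LTO IL (via the symtab and gimple dump infrastructure), and its field names and output format are made up for illustration.

```python
import argparse

# Hypothetical stand-in for data a real reader would extract from the LTO IL.
# Field names ("kind", "body") and the body text are illustrative only.
SYMBOLS = {
    "main": {"kind": "function", "body": "main () { foo (); return 0; }"},
    "foo": {"kind": "function", "body": "foo () { counter = counter + 1; }"},
    "counter": {"kind": "variable", "body": None},
}


def build_parser():
    p = argparse.ArgumentParser(prog="lto-dump")
    p.add_argument("-l", action="store_true",
                   help="list variables and functions contained in the IL")
    p.add_argument("-s", metavar="SYMBOL",
                   help="dump detailed info about SYMBOL")
    p.add_argument("-f", metavar="FUNCTION",
                   help="dump the body of FUNCTION")
    return p


def run(argv, symbols=SYMBOLS):
    """Parse argv and return the dump output as a list of lines."""
    args = build_parser().parse_args(argv)
    out = []
    if args.l:
        # One line per symbol, sorted by name for stable output.
        for name in sorted(symbols):
            out.append(f"{symbols[name]['kind']} {name}")
    if args.s:
        info = symbols.get(args.s)
        out.append(f"{args.s}: {info['kind']}" if info else f"{args.s}: not found")
    if args.f:
        info = symbols.get(args.f)
        if info and info["kind"] == "function":
            out.append(info["body"])
        else:
            out.append(f"{args.f}: no function body")
    return out
```

For example, run(["-l"]) lists each symbol with its kind, and run(["-f", "counter"]) reports that a variable has no body. A real tool would presumably also take the LTO object files as positional arguments.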

Thus your 2) a) should already be possible with 1). 2) would then contain
dumping of IPA summaries as the major part, apart from visualizing the callgraph.
For visualizing the (full) callgraph you need to handle multiple LTO
IL input files.
Those two pieces should be enough for 2) unless usability issues spill over
from 1).
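
The point about multiple input files can be illustrated with a small sketch: each LTO IL file only knows the call edges of the functions it contains, so an edge from main in one file to foo in another only resolves once all inputs are merged. The per-file dictionaries below (with hypothetical "defines" and "edges" keys) stand in for whatever the reader would extract from each file's callgraph section; they are not a GCC data structure.

```python
def merge_callgraphs(per_file):
    """Merge call edges read from several LTO IL files into one graph.

    per_file maps a file name to {"defines": [...], "edges": [(caller, callee)]}
    (a hypothetical layout for this sketch). Returns (merged, external):
    merged maps each caller to its sorted, deduplicated callees, and
    external lists callees that no input file defines.
    """
    graph = {}
    defined = set()
    for info in per_file.values():
        defined.update(info["defines"])
        for caller, callee in info["edges"]:
            graph.setdefault(caller, set()).add(callee)
    merged = {caller: sorted(callees) for caller, callees in sorted(graph.items())}
    called = {callee for callees in graph.values() for callee in callees}
    external = sorted(called - defined)
    return merged, external


# Illustrative inputs: main (in a.o) calls foo, which lives in b.o.
files = {
    "a.o": {"defines": ["main"], "edges": [("main", "foo")]},
    "b.o": {"defines": ["foo", "bar"],
            "edges": [("foo", "bar"), ("foo", "printf")]},
}
```

With only a.o, foo would show up as external; merging b.o resolves it, leaving only printf, which comes from outside the LTO IL.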

In the introduction I'm missing some general words about the LTO IL, for example
that it is non-self-describing bytecode, documented only by the code
reading/writing it and thus hard to debug.  It also fails to lay out the
overall structure of an LTO IL file (you already go into detail about
IPA passes, so this omission stands out).

Richard.

>
> Regards,
>
> Hrishikesh


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-11 Thread Hrishikesh Kulkarni
Hi,

Greetings! Please find my draft proposal for GSoC attached herewith. I am
very grateful to all of you for your inputs, suggestions and directions. I
have tried to assimilate the inputs I received from you into a proposal.
Your suggestions regarding this draft would definitely help me turn it into
the final proposal for submission. In addition, could you please suggest
possible extensions that could be added to the dump tool?


Drive link to Draft Proposal:

https://docs.google.com/document/d/1-jYwwDWsHQwMaxVsHFBrJ9EiCAev7ljTDLS1xMwvK5w/edit


Regards,

Hrishikesh


On Tue, Mar 6, 2018 at 8:59 PM, Jan Hubicka  wrote:

> OK :)
> I did not mean to document every bit either, at least not for the fancy
> parts. It would be nice to have cleaned up, e.g., the section
> headers/footers so they do not depend on endianness, and to slowly clean up
> similar nonsense at higher levels. So it may make sense to progress from
> both directions, lower-hanging fruit first.
>
> Honza
>


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Jan Hubicka
> On Tue, Mar 6, 2018 at 4:02 PM, Jan Hubicka  wrote:
> 
> Ok.  It's probably close enough to what I recommended doing with respect
> to making the LTO bytecode "self-descriptive" -- thus start with making the
> structure documented and parseable without assigning semantics to
> every bit ;)  I think that can be achieved top-down in a very incremental
> way if you get the bottom implemented first (the data-streamer part).

OK :)
I did not mean to document every bit either, at least not for the fancy parts.
It would be nice to have cleaned up, e.g., the section headers/footers so they
do not depend on endianness, and to slowly clean up similar nonsense at higher
levels.  So it may make sense to progress from both directions, lower-hanging
fruit first.
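
As an illustration of what endianness-independent headers mean in practice, here is a small Python sketch that always serializes a header in little-endian byte order, so a file written on a big-endian host reads back identically on a little-endian one. The layout (a 4-byte magic "LTOH", 16-bit major and minor versions, a 32-bit payload size) is hypothetical and is not GCC's actual section header.

```python
import struct

# Hypothetical header layout, for illustration only (not GCC's lto_header):
#   4-byte magic, u16 major version, u16 minor version, u32 payload size.
# The '<' prefix forces little-endian order with standard sizes and no
# padding, so the on-disk bytes are the same whatever the host endianness.
HEADER = struct.Struct("<4sHHI")
MAGIC = b"LTOH"  # made-up magic value


def write_header(major, minor, size):
    return HEADER.pack(MAGIC, major, minor, size)


def read_header(data):
    magic, major, minor, size = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    return major, minor, size
```

Using the native byte order instead ('=' in the format string) would make the on-disk bytes depend on the host that wrote them, which is the kind of dependency the cleanup above would remove.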

Honza


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 4:02 PM, Jan Hubicka  wrote:
>> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
>>  wrote:
>> > Hi,
>> >
>> > Thank you Richard and Honza for the suggestions. If I understand correctly,
>> > the issue is that the LTO file format keeps changing between compiler versions, so
>> > we need a more “stable” representation and the first step for that would be
>> > to “stabilize” representations for lto-cgraph and symbol table ?
>>
>> Yes.  Note the issue is that the current format is a 1:1 representation of
>> the internal representation -- which means it is the internal representation
>> that changes frequently across releases.  I'm not sure how Honza wants
>> to deal with those changes in the context of a "stable" IL format.  Given
>> we haven't been able to provide a stable API to plugins, I think it's much
>> harder to provide a stable streaming format for all the IL details.
>>
>> > Could you
>> > please elaborate on what initial steps need to be taken in this regard, and
>> > if it’s feasible within GSoC timeframe ?
>>
>> I don't think it is feasible in the GSoC timeframe (nor do I think it's feasible
>> at all ...)
>
> I skipped this, with GSoC timeframe I fully agree.  With feasibility at all not so
> much - LLVM documents its bitcode to a reasonable extent
> https://llvm.org/docs/BitCodeFormat.html
>
> Reason why I mentioned it is that I would like to use this as an excuse to get
> things incrementally cleaned up and it would be nice to keep it in mind while
> working on this.

Ok.  It's probably close enough to what I recommended doing with respect
to make the LTO bytecode "self-descriptive" -- thus start with making the
structure documented and parseable without assigning semantics to
every bit ;)  I think that can be achieved top-down in a very incremental
way if you get the bottom implemented first (the data-streamer part).

Richard.

> Honza
>>
>> > Thanks!
>> >
>> >
>> > I am trying to break down the project into milestones for the proposal. So
>> > far, I have identified the following objectives:
>> >
>> > 1] Creating a separate driver, that can read LTO object files. Following
>> > Richard’s estimate, I’d leave around first half of the period for this task.
>> >
>> > Would that be OK ?
>>
>> Yes.
>>
>> > Coming to 2nd half:
>> >
>> > 2] Dumping pass summaries.
>> >
>> > 3] Stabilizing lto-cgraph and symbol table.
>>
>> So I'd instead do
>>
>>  3] Enhance the user-interface of the driver
>>
>> like providing a way to list all function bodies, a way to dump
>> the IL of a single function body, a way to create a dot graph file
>> for the cgraph in the file, etc.
>>
>> Basically while there's a lot of dumping infrastructure in GCC
>> it may not always fit the needs of a LTO IL dumping tool 1:1
>> and may need refactoring enhancement.
>>
>> Richard.
>>
>> >
>> > Thanks,
>> >
>> > Hrishikesh
>> >
>> >
>> >
>> > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
>> >>
>> >> Hello,
>> >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
>> >> >  wrote:
>> >> > > Hello everyone,
>> >> > >
>> >> > >
>> >> > > Thanks for your suggestions and engaging response.
>> >> > >
>> >> > > Based on the feedback I think that the scope of this project comprises of
>> >> > > following three indicative actions:
>> >> > >
>> >> > >
>> >> > > 1. Creating separate driver i.e. separate dump tool that uses lto object API
>> >> > > for reading the lto file.
>> >> >
>> >> > Yes.  I expect this will take the whole first half of the project, after this you
>> >> > should be somewhat familiar with the infrastructure as well.  With the
>> >> > existing dumping infrastructure it should be possible to dump the
>> >> > callgraph and individual function bodies.
>> >> >
>> >> > >
>> >> > > 2. Extending LTO dump infrastructure:
>> >> > >
>> >> > > GCC already seems to have dump infrastructure for pretty-printing tree
>> >> > > nodes, gimple statements etc. However I suppose we’d need to extend that for
>> >> > > dumping pass summaries ? For instance, should we add a new hook say “dump”
>> >> > > to ipa_opt_pass_d that’d dump the pass summary ?
>> >> >
>> >> > That sounds like a good idea indeed.  I'm not sure if this is the most interesting
>> >> > missing part - I guess we'll find out once a dump tool is available.
>> >>
>> >> Concerning the LTO file format my longer term aim is to make the symbol
>> >> table sections (symtab used by lto-plugin as well as the callgraph section
>> >> and hopefully also the Gimple streams) documented and well behaving
>> >> without changing the format in every revision.
>> >>
>> >> On the other hand the summaries used by individual passes are intended to be
>> >> pass specific and evolving as individual passes become stronger/new passes
>> >> are added.
>> >>
>> >> It is quite a lot of work to 

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Jan Hubicka
> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
>  wrote:
> > Hi,
> >
> > Thank you Richard and Honza for the suggestions. If I understand correctly,
> > the issue is that LTO file format keeps changing per compiler versions, so
> > we need a more “stable” representation and the first step for that would be
> > to “stabilize” representations for lto-cgraph and symbol table ?
> 
> Yes.  Note the issue is that the current format is a 1:1 representation of
> the internal representation -- which means it is the internal representation
> that changes frequently across releases.  I'm not sure how Honza wants
> to deal with those changes in the context of a "stable" IL format.  Given
> we haven't been able to provide a stable API to plugins I think it's much
> harder to provide a stable streaming format for all the IL details
> 
> > Could you
> > please elaborate on what initial steps need to be taken in this regard, and
> > if it’s feasible within GSoC timeframe ?
> 
> I don't think it is feasible in the GSoC timeframe (nor do I think it's feasible
> at all ...)

I skipped this; with the GSoC timeframe I fully agree.  With feasibility at all,
not so much - LLVM documents its bitcode to a reasonable extent
https://llvm.org/docs/BitCodeFormat.html

The reason I mentioned it is that I would like to use this as an excuse to get
things incrementally cleaned up, and it would be nice to keep it in mind while
working on this.

Honza
> 
> > Thanks!
> >
> >
> > I am trying to break down the project into milestones for the proposal. So
> > far, I have identified the following objectives:
> >
> > 1] Creating a separate driver, that can read LTO object files. Following
> > Richard’s estimate, I’d leave around first half of the period for this task.
> >
> > Would that be OK ?
> 
> Yes.
> 
> > Coming to 2nd half:
> >
> > 2] Dumping pass summaries.
> >
> > 3] Stabilizing lto-cgraph and symbol table.
> 
> So I'd instead do
> 
>  3] Enhance the user-interface of the driver
> 
> like providing a way to list all function bodies, a way to dump
> the IL of a single function body, a way to create a dot graph file
> for the cgraph in the file, etc.
> 
> Basically while there's a lot of dumping infrastructure in GCC
> it may not always fit the needs of a LTO IL dumping tool 1:1
> and may need refactoring enhancement.
> 
> Richard.
> 
> >
> > Thanks,
> >
> > Hrishikesh
> >
> >
> >
> > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
> >>
> >> Hello,
> >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
> >> >  wrote:
> >> > > Hello everyone,
> >> > >
> >> > >
> >> > > Thanks for your suggestions and engaging response.
> >> > >
> >> > > Based on the feedback I think that the scope of this project comprises of
> >> > > following three indicative actions:
> >> > >
> >> > >
> >> > > 1. Creating separate driver i.e. separate dump tool that uses lto object API
> >> > > for reading the lto file.
> >> >
> >> > Yes.  I expect this will take the whole first half of the project, after this you
> >> > should be somewhat familiar with the infrastructure as well.  With the
> >> > existing dumping infrastructure it should be possible to dump the
> >> > callgraph and individual function bodies.
> >> >
> >> > >
> >> > > 2. Extending LTO dump infrastructure:
> >> > >
> >> > > GCC already seems to have dump infrastructure for pretty-printing tree
> >> > > nodes, gimple statements etc. However I suppose we’d need to extend that for
> >> > > dumping pass summaries ? For instance, should we add a new hook say “dump”
> >> > > to ipa_opt_pass_d that’d dump the pass summary ?
> >> >
> >> > That sounds like a good idea indeed.  I'm not sure if this is the most interesting
> >> > missing part - I guess we'll find out once a dump tool is available.
> >>
> >> Concerning the LTO file format my longer term aim is to make the symbol
> >> table sections (symtab used by lto-plugin as well as the callgraph section
> >> and hopefully also the Gimple streams) documented and well behaving
> >> without changing the format in every revision.
> >>
> >> On the other hand the summaries used by individual passes are intended to be
> >> pass specific and evolving as individual passes become stronger/new passes
> >> are added.
> >>
> >> It is quite a lot of work to stabilize gimple representation to this extent.
> >> For callgraph table this is however more realistic. That would mean to
> >> move some of existing random stuff streamed there into summaries and additionally
> >> cleaning up/rewriting lto-cgraph so the on disk format actually makes sense.
> >>
> >> I will be happy to help with any steps in this direction as well.
> >>
> >> Honza
> >
> >


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Jan Hubicka
> On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
>  wrote:
> > Hi,
> >
> > Thank you Richard and Honza for the suggestions. If I understand correctly,
> > the issue is that LTO file format keeps changing per compiler versions, so
> > we need a more “stable” representation and the first step for that would be
> > to “stabilize” representations for lto-cgraph and symbol table ?
> 
> Yes.  Note the issue is that the current format is a 1:1 representation of
> the internal representation -- which means it is the internal representation
> that changes frequently across releases.  I'm not sure how Honza wants
> to deal with those changes in the context of a "stable" IL format.  Given
> we haven't been able to provide a stable API to plugins I think it's much
> harder to provide a stable streaming format for all the IL details

Well, because I think it would be good for us to formalize our IL more -
document it properly, remove stuff which is not necessary, and get an API + file
representation. Those things are connected to each other and will need work.

If you look at how much things change, we do not change very frequently what
is in our CFG (I changed the profile this release), how gimple tuples are
represented, or what gimple instructions we have.  I think those parts are
reasonably well defined. Even though I changed the profile this release, it is
a relatively localized type of change.  I am still more commonly changing the
symbol table, as it needs to adapt to all the LTO details, but I hope to be
basically done.

What is more in flux are trees, which we will hopefully deal with by defining
gimple types, now that early debug is done.

What we can realistically do now is to first aim to stream the better-defined
parts in externally parseable sections which do have documentation.  So far
the only externally parseable section is the plugin symbol table, but we should
be able to do the same with reasonable effort for symbol tables, CFGs and gimple
instruction streams.

In parallel we can incrementally deal with trees, hopefully mostly by getting rid
of them (moving symbol names etc. to the symbol table so it can live without
declarations, having gimple types, etc.).

> 
> > Could you
> > please elaborate on what initial steps need to be taken in this regard, and
> > if it’s feasible within GSoC timeframe ?
> 
> I don't think it is feasible in the GSoC timeframe (nor do I think it's feasible
> at all ...)
> 
> > Thanks!
> >
> >
> > I am trying to break down the project into milestones for the proposal. So
> > far, I have identified the following objectives:
> >
> > 1] Creating a separate driver, that can read LTO object files. Following
> > Richard’s estimate, I’d leave around first half of the period for this task.
> >
> > Would that be OK ?
> 
> Yes.
Yes, it looks good to me too.
> 
> > Coming to 2nd half:
> >
> > 2] Dumping pass summaries.
> >
> > 3] Stabilizing lto-cgraph and symbol table.
> 
> So I'd instead do
> 
>  3] Enhance the user-interface of the driver
> 
> like providing a way to list all function bodies, a way to dump
> the IL of a single function body, a way to create a dot graph file
> for the cgraph in the file, etc.
> 
> Basically while there's a lot of dumping infrastructure in GCC
> it may not always fit the needs of a LTO IL dumping tool 1:1
> and may need refactoring enhancement.

I would agree here - dumping pass summaries would be nice, but we already have
that more or less.  All IPA passes dump their summary at the beginning of their
dump file, and I find that relatively sufficient, mostly because summaries are
quite simple.  It is much harder to deal with the global stream of trees and
the function bodies themselves.

Honza
> 
> Richard.
> 
> >
> > Thanks,
> >
> > Hrishikesh
> >
> >
> >
> > On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
> >>
> >> Hello,
> >> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
> >> >  wrote:
> >> > > Hello everyone,
> >> > >
> >> > >
> >> > > Thanks for your suggestions and engaging response.
> >> > >
> >> > > Based on the feedback I think that the scope of this project comprises of
> >> > > following three indicative actions:
> >> > >
> >> > >
> >> > > 1. Creating separate driver i.e. separate dump tool that uses lto object API
> >> > > for reading the lto file.
> >> >
> >> > Yes.  I expect this will take the whole first half of the project, after this you
> >> > should be somewhat familiar with the infrastructure as well.  With the
> >> > existing dumping infrastructure it should be possible to dump the
> >> > callgraph and individual function bodies.
> >> >
> >> > >
> >> > > 2. Extending LTO dump infrastructure:
> >> > >
> >> > > GCC already seems to have dump infrastructure for pretty-printing tree
> >> > > nodes, gimple statements etc. However I suppose we’d need to extend that for
> >> > > dumping pass summaries ? For instance, should we add a new hook say
> >> > 

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Richard Biener
On Tue, Mar 6, 2018 at 2:30 PM, Hrishikesh Kulkarni
 wrote:
> Hi,
>
> Thank you Richard and Honza for the suggestions. If I understand correctly,
> the issue is that LTO file format keeps changing per compiler versions, so
> we need a more “stable” representation and the first step for that would be
> to “stabilize” representations for lto-cgraph and symbol table ?

Yes.  Note the issue is that the current format is a 1:1 representation of
the internal representation -- which means it is the internal representation
that changes frequently across releases.  I'm not sure how Honza wants
to deal with those changes in the context of a "stable" IL format.  Given
we haven't been able to provide a stable API to plugins I think it's much
harder to provide a stable streaming format for all the IL details.

> Could you
> please elaborate on what initial steps need to be taken in this regard, and
> if it’s feasible within GSoC timeframe ?

I don't think it is feasible in the GSoC timeframe (nor do I think it's feasible
at all ...)

> Thanks!
>
>
> I am trying to break down the project into milestones for the proposal. So
> far, I have identified the following objectives:
>
> 1] Creating a separate driver, that can read LTO object files. Following
> Richard’s estimate, I’d leave around first half of the period for this task.
>
> Would that be OK ?

Yes.

> Coming to 2nd half:
>
> 2] Dumping pass summaries.
>
> 3] Stabilizing lto-cgraph and symbol table.

So I'd instead do

 3] Enhance the user-interface of the driver

like providing a way to list all function bodies, a way to dump
the IL of a single function body, a way to create a dot graph file
for the cgraph in the file, etc.
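As an illustration of the dot-graph idea, here is a minimal sketch that prints a call graph in Graphviz DOT syntax. It assumes a hypothetical, already-decoded edge list (`struct cg_edge`); a real tool would instead walk GCC's symbol table after reading the callgraph section, and that API is not modelled here.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical caller -> callee edge, as a dump tool might collect it
   after reading the callgraph section of an LTO object file.  */
struct cg_edge
{
  const char *caller;
  const char *callee;
};

/* Write the edges as a Graphviz digraph and return the number of edges
   written.  Assumes names contain no double quotes (no escaping done).  */
static size_t
dump_callgraph_dot (FILE *out, const struct cg_edge *edges, size_t n)
{
  fprintf (out, "digraph callgraph {\n");
  for (size_t i = 0; i < n; i++)
    fprintf (out, "  \"%s\" -> \"%s\";\n", edges[i].caller, edges[i].callee);
  fprintf (out, "}\n");
  return n;
}
```

The resulting file can be rendered with Graphviz, for example `dot -Tsvg callgraph.dot -o callgraph.svg`.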

Basically while there's a lot of dumping infrastructure in GCC
it may not always fit the needs of a LTO IL dumping tool 1:1
and may need refactoring or enhancement.

Richard.

>
> Thanks,
>
> Hrishikesh
>
>
>
> On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:
>>
>> Hello,
>> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
>> >  wrote:
>> > > Hello everyone,
>> > >
>> > >
>> > > Thanks for your suggestions and engaging response.
>> > >
>> > > Based on the feedback I think that the scope of this project comprises of
>> > > following three indicative actions:
>> > >
>> > >
>> > > 1. Creating separate driver i.e. separate dump tool that uses lto object API
>> > > for reading the lto file.
>> >
>> > Yes.  I expect this will take the whole first half of the project, after this you
>> > should be somewhat familiar with the infrastructure as well.  With the
>> > existing dumping infrastructure it should be possible to dump the
>> > callgraph and individual function bodies.
>> >
>> > >
>> > > 2. Extending LTO dump infrastructure:
>> > >
>> > > GCC already seems to have dump infrastructure for pretty-printing tree
>> > > nodes, gimple statements etc. However I suppose we’d need to extend that for
>> > > dumping pass summaries ? For instance, should we add a new hook say “dump”
>> > > to ipa_opt_pass_d that’d dump the pass summary ?
>> >
>> > That sounds like a good idea indeed.  I'm not sure if this is the most interesting
>> > missing part - I guess we'll find out once a dump tool is available.
>>
>> Concerning the LTO file format my longer term aim is to make the symbol
>> table sections (symtab used by lto-plugin as well as the callgraph section
>> and hopefully also the Gimple streams) documented and well behaving
>> without changing the format in every revision.
>>
>> On the other hand the summaries used by individual passes are intended to be
>> pass specific and evolving as individual passes become stronger/new passes
>> are added.
>>
>> It is quite a lot of work to stabilize gimple representation to this extent.
>> For callgraph table this is however more realistic. That would mean to
>> move some of existing random stuff streamed there into summaries and additionally
>> cleaning up/rewriting lto-cgraph so the on disk format actually makes sense.
>>
>> I will be happy to help with any steps in this direction as well.
>>
>> Honza
>
>


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-06 Thread Hrishikesh Kulkarni
Hi,

Thank you Richard and Honza for the suggestions. If I understand correctly,
the issue is that the LTO file format keeps changing between compiler versions,
so we need a more “stable” representation, and the first step for that would be
to “stabilize” the representations for lto-cgraph and the symbol table? Could you
please elaborate on what initial steps need to be taken in this regard, and
whether it’s feasible within the GSoC timeframe?

Thanks!

I am trying to break down the project into milestones for the proposal. So
far, I have identified the following objectives:

1] Creating a separate driver that can read LTO object files. Following
Richard’s estimate, I’d leave around the first half of the period for this task.

Would that be OK ?

Coming to 2nd half:

2] Dumping pass summaries.

3] Stabilizing lto-cgraph and symbol table.

Thanks,

Hrishikesh


On Fri, Mar 2, 2018 at 6:31 PM, Jan Hubicka  wrote:

> Hello,
> > On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
> >  wrote:
> > > Hello everyone,
> > >
> > >
> > > Thanks for your suggestions and engaging response.
> > >
> > > Based on the feedback I think that the scope of this project comprises of
> > > following three indicative actions:
> > >
> > >
> > > 1. Creating separate driver i.e. separate dump tool that uses lto object API
> > > for reading the lto file.
> >
> > Yes.  I expect this will take the whole first half of the project, after this you
> > should be somewhat familiar with the infrastructure as well.  With the
> > existing dumping infrastructure it should be possible to dump the
> > callgraph and individual function bodies.
> >
> > >
> > > 2. Extending LTO dump infrastructure:
> > >
> > > GCC already seems to have dump infrastructure for pretty-printing tree
> > > nodes, gimple statements etc. However I suppose we’d need to extend that for
> > > dumping pass summaries ? For instance, should we add a new hook say “dump”
> > > to ipa_opt_pass_d that’d dump the pass summary ?
> >
> > That sounds like a good idea indeed.  I'm not sure if this is the most interesting
> > missing part - I guess we'll find out once a dump tool is available.
>
> Concerning the LTO file format my longer term aim is to make the symbol
> table sections (symtab used by lto-plugin as well as the callgraph section
> and hopefully also the Gimple streams) documented and well behaving
> without changing the format in every revision.
>
> On the other hand the summaries used by individual passes are intended to be
> pass specific and evolving as individual passes become stronger/new passes
> are added.
>
> It is quite a lot of work to stabilize gimple representation to this extent.
> For callgraph table this is however more realistic. That would mean to
> move some of existing random stuff streamed there into summaries and additionally
> cleaning up/rewriting lto-cgraph so the on disk format actually makes sense.
>
> I will be happy to help with any steps in this direction as well.
>
> Honza
>


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-02 Thread Jan Hubicka
Hello,
> On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
>  wrote:
> > Hello everyone,
> >
> >
> > Thanks for your suggestions and engaging response.
> >
> > Based on the feedback I think that the scope of this project comprises of
> > following three indicative actions:
> >
> >
> > 1. Creating separate driver i.e. separate dump tool that uses lto object API
> > for reading the lto file.
> 
> Yes.  I expect this will take the whole first half of the project, after this you
> should be somewhat familiar with the infrastructure as well.  With the
> existing dumping infrastructure it should be possible to dump the
> callgraph and individual function bodies.
> 
> >
> > 2. Extending LTO dump infrastructure:
> >
> > GCC already seems to have dump infrastructure for pretty-printing tree
> > nodes, gimple statements etc. However I suppose we’d need to extend that for
> > dumping pass summaries ? For instance, should we add a new hook say “dump”
> > to ipa_opt_pass_d that’d dump the pass
> > summary ?
> 
> That sounds like a good idea indeed.  I'm not sure if this is the most interesting
> missing part - I guess we'll find out once a dump tool is available.

Concerning the LTO file format, my longer-term aim is to make the symbol
table sections (the symtab used by lto-plugin as well as the callgraph section,
and hopefully also the Gimple streams) documented and well behaved, without
changing the format in every revision.

On the other hand, the summaries used by individual passes are intended to be
pass-specific and to evolve as individual passes become stronger or new passes
are added.

It is quite a lot of work to stabilize the gimple representation to this extent.
For the callgraph table this is, however, more realistic. That would mean moving
some of the existing random stuff streamed there into summaries and additionally
cleaning up/rewriting lto-cgraph so the on-disk format actually makes sense.

I will be happy to help with any steps in this direction as well.

Honza


Re: GSOC 2018 - Textual LTO dump tool project

2018-03-02 Thread Richard Biener
On Fri, Mar 2, 2018 at 10:24 AM, Hrishikesh Kulkarni
 wrote:
> Hello everyone,
>
>
> Thanks for your suggestions and engaging response.
>
> Based on the feedback I think that the scope of this project comprises of
> following three indicative actions:
>
>
> 1. Creating separate driver i.e. separate dump tool that uses lto object API
> for reading the lto file.

Yes.  I expect this will take the whole first half of the project; after
this you should be somewhat familiar with the infrastructure as well.  With
the existing dumping infrastructure it should be possible to dump the
callgraph and individual function bodies.

>
> 2. Extending LTO dump infrastructure:
>
> GCC already seems to have dump infrastructure for pretty-printing tree
> nodes, gimple statements etc. However I suppose we’d need to extend that for
> dumping pass summaries ? For instance, should we add a new hook say “dump”
> to ipa_opt_pass_d that’d dump the pass
> summary ?

That sounds like a good idea indeed.  I'm not sure if this is the most interesting
missing part - I guess we'll find out once a dump tool is available.
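A minimal sketch of the proposed per-pass dump hook, using a hypothetical and heavily simplified pass descriptor in C. GCC's real ipa_opt_pass_d carries many more callbacks (write_summary, read_summary and so on); the `dump_summary` member, `dump_cp_summary` and `dump_all_summaries` are assumptions for illustration, not an existing GCC interface.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an IPA pass descriptor; only
   the proposed dump hook is modelled.  */
struct ipa_pass_sketch
{
  const char *name;
  /* Proposed hook: print this pass's summary in human-readable form.
     NULL for passes that do not implement it.  */
  void (*dump_summary) (FILE *out);
};

/* Example hook for one hypothetical pass.  */
static void
dump_cp_summary (FILE *out)
{
  fprintf (out, "  (ipa-cp summary would be printed here)\n");
}

/* What the dump tool could do: walk the registered passes and call the
   hook where one exists.  Returns how many summaries were dumped.  */
static int
dump_all_summaries (FILE *out, const struct ipa_pass_sketch *passes, size_t n)
{
  int dumped = 0;
  for (size_t i = 0; i < n; i++)
    {
      fprintf (out, "%s:\n", passes[i].name);
      if (passes[i].dump_summary)
        {
          passes[i].dump_summary (out);
          dumped++;
        }
      else
        fprintf (out, "  (no dump hook)\n");
    }
  return dumped;
}
```

Making the hook optional lets passes opt in one at a time, which matches the incremental approach discussed in this thread.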

> 3. Refactoring streaming API - Could you please elaborate more on what
> improvements could be made to the streaming API ? Would it be a good idea to
> make it more “C++ style” similar to iostream interface ? Also while going
> thru ipa-cp/ipa-prop I noticed the following in ipa_prop_read_functions(),
> which looks like some kind of “preamble” for setting up header to read the
> summary. Perhaps this could be abstracted into streaming API too ?
>
> const struct lto_function_header *header =
>   (const struct lto_function_header *) data;
> const int cfg_offset = sizeof (struct lto_function_header);
> const int main_offset = cfg_offset + header->cfg_size;
> const int string_offset = main_offset + header->main_size;

This is a very hard task so I suggest to not venture into the area of
refactoring the API for this project.

What I thought would be nice for debugging streamer issues is to
(optionally!) make the LTO bytecode (if you can call it that...) (more)
self-descriptive.  Currently the bytecode is simply a series of bytes, and if
the reading part doesn't match the writing part 1:1 you get garbage.  So a
first baby step would be to emit markers, like

 '0x00' raw byte follows
 '0x01' uhwi follows
 '0x02' bitpack follows, 'n' with N bits
 ...

Basically, look at data-streamer.[ch] as the lowest level of the stream encoding
and make it self-descriptive and thus "dumpable" independent of the LTO reader.

Then go to tree-streamer-*.c and do the same for the various tree parts,
add tree streamer specific 'markers' (again optionally, just for debugging).

Then go to lto-streamer-*.c and repeat.
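The lowest (data-streamer) level of the marker idea could be sketched as follows. Only the three marker values come from the list above; the buffer type, the function names and the ULEB128 value encoding are assumptions made for this sketch, not GCC's data-streamer API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Marker values from the proposal; all other names are hypothetical.  */
enum stream_marker
{
  MARK_RAW_BYTE = 0x00, /* one raw byte follows */
  MARK_UHWI = 0x01,     /* one ULEB128-encoded unsigned value follows */
  MARK_BITPACK = 0x02   /* bit count N, then the packed bits as ULEB128 */
};

struct out_stream { unsigned char buf[256]; size_t len; };

static void
put_byte (struct out_stream *s, unsigned char c)
{
  assert (s->len < sizeof s->buf);
  s->buf[s->len++] = c;
}

/* ULEB128: 7 value bits per byte, high bit set on all but the last.  */
static void
put_uleb128 (struct out_stream *s, uint64_t v)
{
  do
    {
      unsigned char c = v & 0x7f;
      v >>= 7;
      put_byte (s, v ? (unsigned char) (c | 0x80) : c);
    }
  while (v);
}

static void
write_raw_byte (struct out_stream *s, unsigned char c)
{
  put_byte (s, MARK_RAW_BYTE);
  put_byte (s, c);
}

static void
write_uhwi (struct out_stream *s, uint64_t v)
{
  put_byte (s, MARK_UHWI);
  put_uleb128 (s, v);
}

static void
write_bitpack (struct out_stream *s, uint64_t bits, unsigned n)
{
  assert (n >= 1 && n <= 64);
  put_byte (s, MARK_BITPACK);
  put_byte (s, (unsigned char) n);
  put_uleb128 (s, bits);
}

static uint64_t
get_uleb128 (const unsigned char *p, size_t *pos)
{
  uint64_t v = 0;
  for (unsigned shift = 0;; shift += 7)
    {
      unsigned char c = p[(*pos)++];
      v |= (uint64_t) (c & 0x7f) << shift;
      if (!(c & 0x80))
        return v;
    }
}

/* Print every item using only the markers; assumes the stream is not
   truncated.  Returns the item count, or -1 on an unknown marker.  */
static int
dump_stream (FILE *out, const unsigned char *p, size_t len)
{
  size_t pos = 0;
  int items = 0;
  while (pos < len)
    {
      unsigned char mark = p[pos++];
      if (mark == MARK_RAW_BYTE)
        fprintf (out, "byte 0x%02x\n", p[pos++]);
      else if (mark == MARK_UHWI)
        fprintf (out, "uhwi %llu\n", (unsigned long long) get_uleb128 (p, &pos));
      else if (mark == MARK_BITPACK)
        {
          unsigned n = p[pos++];
          fprintf (out, "bitpack %u bits 0x%llx\n", n,
                   (unsigned long long) get_uleb128 (p, &pos));
        }
      else
        {
          fprintf (out, "unknown marker 0x%02x at offset %zu\n", mark, pos - 1);
          return -1;
        }
      items++;
    }
  return items;
}
```

Because every item carries its own tag, `dump_stream` can walk the bytes without knowing what the LTO reader expects. A reader/writer mismatch is then much more likely to surface as an unknown marker instead of silently producing garbage.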

I'm not sure if we should go all the way and do something like DWARF with its
abbrevs to optimize the encoding, given it's just for debugging, but it would
maybe be interesting to get an approximate idea of the overhead of streaming
trees with full abbrevs.  I suppose it wouldn't be too bad.

As said the refactoring shouldn't be part of the project - 1. and 2. are
large enough already.

Richard.

>
> I would be grateful for suggestions, on how to proceed further, especially
> with modifying makefiles for creating the new driver. Unfortunately I have
> some school exams next week and won’t be able to work much on GCC during the
> period.
>
>
> Best Regards,
>
> Hrishikesh
>
>
>
> On Wed, Feb 28, 2018 at 4:05 PM, Martin Liška  wrote:
>>
>> On 02/25/2018 10:46 AM, Martin Jambor wrote:
>> > Hello Hrishikesh,
>> >
>> > I apologize for replying to you this late, this has been a busy week
>> > and now I am traveling.
>> >
>> > On Mon, Feb 19 2018, Hrishikesh Kulkarni wrote:
>> >> Hi,
>> >>
>> >> I am Hrishikesh Kulkarni currently studying as an undergrad student in
>> >> Computer Engineering at Pune University, India. I find compilers quite
>> >> interesting as a subject,  and would like to apply to GSoC to gain some
>> >> understanding of how real-world compilers work. So far, I have managed to
>> >> build gcc and perform some simple tweaks to the codebase. In particular, I
>> >> would like to apply to the Textual LTO dump tool project.
>> >>
>> >
>> > I must say I am impressed by the research you have already done.
>> > Nevertheless, please note that Ray Kim has also expressed interest in
>> > the project.  Martin Liska will be the mentor, so I will let him drive
>> > the selection process.  On the other hand, Ray also liked another
>> > project, so maybe he will pick that and everyone will be happy.
>>
>> Hello.
>>
>> I'm really happy that there are multiple volunteers that want to work on
>> LTO dump tool project. According to what I've looked at, I would like to have
>> Hrishikesh working on the project. He's got experience with C, C++ and also
>> with Python language that can be well used for prototyping. Apart from that
>> he's spent quite some time
>> with 

Re: GSOC 2018 - Textual LTO dump tool project

2018-03-02 Thread Hrishikesh Kulkarni
Hello everyone,


Thanks for your suggestions and engaging response.

Based on the feedback, I think the scope of this project comprises the
following three indicative actions:

1. Creating a separate driver, i.e. a separate dump tool that uses the LTO
object API for reading the LTO file.

2. Extending LTO dump infrastructure:

GCC already seems to have dump infrastructure for pretty-printing tree
nodes, gimple statements, etc. However, I suppose we’d need to extend that
for dumping pass summaries? For instance, should we add a new hook, say
“dump”, to ipa_opt_pass_d that would dump the pass summary?

3. Refactoring the streaming API - Could you please elaborate more on what
improvements could be made to the streaming API? Would it be a good idea
to make it more “C++ style”, similar to the iostream interface? Also, while
going through ipa-cp/ipa-prop I noticed the following in
ipa_prop_read_functions(), which looks like some kind of “preamble” for
setting up the header to read the summary. Perhaps this could be abstracted
into the streaming API too?

const struct lto_function_header *header =
  (const struct lto_function_header *) data;
const int cfg_offset = sizeof (struct lto_function_header);
const int main_offset = cfg_offset + header->cfg_size;
const int string_offset = main_offset + header->main_size;
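The offset arithmetic in ipa_prop_read_functions() implies a section laid out as [header][CFG stream][main stream][string table]. Here is a minimal sketch of abstracting that preamble into one helper. It assumes a simplified stand-in for lto_function_header with three int32_t size fields; the real definition in GCC's lto-streamer.h may be organized differently. The helper name and the bounds checks are hypothetical additions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for lto_function_header (assumed layout).  */
struct fn_header_sketch
{
  int32_t cfg_size;
  int32_t main_size;
  int32_t string_size;
};

/* Views into one function section laid out as
   [header][cfg stream][main stream][string table].  */
struct fn_section_views
{
  struct fn_header_sketch header;
  const char *cfg;
  const char *main;
  const char *strings;
};

/* Hypothetical helper wrapping the offset "preamble".  Returns 0 on
   success, -1 if the header's sizes do not fit in LEN bytes.  */
static int
open_fn_section (const char *data, size_t len, struct fn_section_views *v)
{
  if (len < sizeof (struct fn_header_sketch))
    return -1;
  /* memcpy avoids relying on DATA being suitably aligned.  */
  memcpy (&v->header, data, sizeof v->header);
  const struct fn_header_sketch *h = &v->header;
  if (h->cfg_size < 0 || h->main_size < 0 || h->string_size < 0)
    return -1;

  size_t cfg_offset = sizeof (struct fn_header_sketch);
  size_t main_offset = cfg_offset + (size_t) h->cfg_size;
  size_t string_offset = main_offset + (size_t) h->main_size;
  if (string_offset + (size_t) h->string_size > len)
    return -1;

  v->cfg = data + cfg_offset;
  v->main = data + main_offset;
  v->strings = data + string_offset;
  return 0;
}
```

A dump tool built on such a helper could report a malformed section instead of reading past its end, which plain offset arithmetic does not check.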

I would be grateful for suggestions on how to proceed further, especially
with modifying makefiles for creating the new driver. Unfortunately, I have
some school exams next week and won’t be able to work much on GCC during
that period.

Best Regards,

Hrishikesh


On Wed, Feb 28, 2018 at 4:05 PM, Martin Liška  wrote:

> On 02/25/2018 10:46 AM, Martin Jambor wrote:
> > Hello Hrishikesh,
> >
> > I apologize for replying to you this late, this has been a busy week
> > and now I am traveling.
> >
> > On Mon, Feb 19 2018, Hrishikesh Kulkarni wrote:
> >> Hi,
> >>
> >> I am Hrishikesh Kulkarni currently studying as an undergrad student in
> >> Computer Engineering at Pune University, India. I find compilers quite
> >> interesting as a subject,  and would like to apply to GSoC to gain some
> >> understanding of how real-world compilers work. So far, I have managed to
> >> build gcc and perform some simple tweaks to the codebase. In particular, I
> >> would like to apply to the Textual LTO dump tool project.
> >>
> >
> > I must say I am impressed by the research you have already done.
> > Nevertheless, please note that Ray Kim has also expressed interest in
> > the project.  Martin Liska will be the mentor, so I will let him drive
> > the selection process.  On the other hand, Ray also liked another
> > project, so maybe he will pick that and everyone will be happy.
>
> Hello.
>
> I'm really happy that there are multiple volunteers that want to work on
> LTO dump tool project. According to what I've looked at, I would like to have
> Hrishikesh working on the project. He's got experience with C, C++ and also
> with Python language that can be well used for prototyping. Apart from that
> he's spent quite some time with investigation of LTO internals in GCC.
>
> That said, may I please ask other candidates to seek for a different GSoC
> project we offered? I believe the other topics are also interesting and
> important for the project.
>
> >
> >> As far as I understand, the motivation for LTO framework was to enable
> >> cross file interprocedural optimizations, and for this purpose an ipa pass
> >> is divided into following three stages:
> >>
> >> 1. LGEN: The pass does a local analysis of the function and generates a
> >>    “summary”, ie, the information relevant to the pass and writes it to LTO
> >>    object file.
> >
> > A pass might do that, but the output of the whole stage is not just the
> > pass summaries, it also writes the function IL (the function gimple
> > statements, above all) to the object file.
> >
> >> 2. WPA: The LTO object files are given as input to the linker, which then
> >>    invokes the lto1 frontend to perform global IPA analysis over the
> >>    call graph and write optimized summaries to LTO object files
> >>    (partitioning). The global IPA analysis is done over the summaries and
> >>    not the actual function bodies.
> >
> > Well... note that partitioning actually means dividing the whole
> > compiled program/library into chunks that are then compiled
> > independently in the LTRANS stage.  But you are basically right that WPA
> > does also do whole-program analysis based on summaries and then writes
> > its decisions to optimization summaries, yes.
> >
> >> 3. LTRANS: The partitions are read back, and the function bodies are
> >>    reconstructed from summary and are then compiled to produce real
> >>    object files.
> >
> > Function bodies and the summaries are distinct things.  The body
> > consists of gimple statements and all the associated stuff (such as
> > types, so a lot of stuff), whereas when we refer 

Re: GSOC 2018 - Textual LTO dump tool project

2018-02-28 Thread Martin Liška
On 02/25/2018 10:46 AM, Martin Jambor wrote:
> Hello Hrishikesh,
> 
> I apologize for replying to you this late, this has been a busy week
> and now I am traveling.
> 
> On Mon, Feb 19 2018, Hrishikesh Kulkarni wrote:
>> Hi,
>>
>> I am Hrishikesh Kulkarni, currently studying as an undergrad student in
>> Computer Engineering at Pune University, India. I find compilers quite
>> interesting as a subject, and would like to apply to GSoC to gain some
>> understanding of how real-world compilers work. So far, I have managed to
>> build gcc and perform some simple tweaks to the codebase. In particular, I
>> would like to apply to the Textual LTO dump tool project.
>>
> 
> I must say I am impressed by the research you have already done.
> Nevertheless, please note that Ray Kim has also expressed interest in
> the project.  Martin Liska will be the mentor, so I will let him drive
> the selection process.  On the other hand, Ray also liked another
> project, so maybe he will pick that and everyone will be happy.

Hello.

I'm really happy that there are multiple volunteers that want to work on the
LTO dump tool project. From what I have looked at, I would like to have
Hrishikesh working on the project. He's got experience with C, C++ and also
with Python, which can be well used for prototyping. Apart from that, he's
spent quite some time investigating LTO internals in GCC.

That said, may I please ask the other candidates to look for a different GSoC
project we offered? I believe the other topics are also interesting and
important for the project.

> 
>> As far as I understand, the motivation for the LTO framework was to enable
>> cross-file interprocedural optimizations, and for this purpose an IPA pass
>> is divided into the following three stages:
>>
>> 1. LGEN: The pass does a local analysis of the function and generates a
>>    “summary”, i.e., the information relevant to the pass, and writes it
>>    to the LTO object file.
> 
> A pass might do that, but the output of the whole stage is not just the
> pass summaries, it also writes the function IL (the function gimple
> statements, above all) to the object file.
> 
>> 2. WPA: The LTO object files are given as input to the linker, which then
>>    invokes the lto1 frontend to perform global IPA analysis over the
>>    call graph and write optimized summaries to LTO object files
>>    (partitioning). The global IPA analysis is done over the summaries and
>>    not the actual function bodies.
> 
> Well... note that partitioning actually means dividing the whole
> compiled program/library into chunks that are then compiled
> independently in the LTRANS stage.  But you are basically right that WPA
> does also do whole-program analysis based on summaries and then writes
> its decisions to optimization summaries, yes.
> 
>> 3. LTRANS: The partitions are read back, and the function bodies are
>>    reconstructed from summary and are then compiled to produce real
>>    object files.
> 
> Function bodies and the summaries are distinct things.  The body
> consists of gimple statements and all the associated stuff (such as
> types, so a lot of stuff), whereas when we refer to summaries, we mean
> small chunks of data that interprocedural optimizations such as inlining
> or IPA-CP scurry away because they cannot feasibly work on bodies of the
> entire program.
> 
> But apart from this terminology issue, you are basically correct, at the
> LTRANS stage, IPA passes apply transformations to the bodies according
> to the optimization summary generated by the WPA phase.  And then, all
> normal, intra-procedural passes and code generation runs.
> 
>>
>>
>> If I understand correctly, the motivation for the textual LTO dump tool is
>> to easily analyze the contents of an LTO object file, similar to readelf or
>> objdump?

Yes. Richi described in a previous email how that could be done.

> 
> That is how I understand it too, but Martin may have some further uses
> in mind.
> 
>>
>> Assume that the pureconst section of an LTO object file contains 0b0110
>> (0b being the binary prefix), corresponding to the values of
>> fs->pure_const_state and fs->state_previously_known.
>>
>> If I understand correctly, the output of dump tool should then be:
>>
>> pure_const pass:
>>
>> pure_const_state = IPA_PURE (enum value of pure_const_state_e corresponding
>> to 0b01)
>>
>> state_previously_known = IPA_NEITHER (enum value of pure_const_state_e
>> corresponding to 0b10)
>>
>> Is this the expected output of the dump tool?
> 
> I think the tool would have to do a bit more than just dumping summaries of
> IPA passes.  I tend to think that the task should also include dumping
> gimple bodies (but we already do that in GCC and so it should be mostly
> easy) and also types (that are merged as one of the first steps of
> WPA, and interesting things happen when merging does something
> "interesting").  And perhaps quite a bit more.  Martin?

Yes, as we transitioned to early-debug info in LTO mode, 

Re: GSOC 2018 - Textual LTO dump tool project

2018-02-27 Thread Richard Biener
On Sun, Feb 25, 2018 at 10:46 AM, Martin Jambor  wrote:
> Hello Hrishikesh,
>
> I apologize for replying to you this late, this has been a busy week
> and now I am traveling.
>
> On Mon, Feb 19 2018, Hrishikesh Kulkarni wrote:
>> Hi,
>>
>> I am Hrishikesh Kulkarni, currently studying as an undergrad student in
>> Computer Engineering at Pune University, India. I find compilers quite
>> interesting as a subject, and would like to apply to GSoC to gain some
>> understanding of how real-world compilers work. So far, I have managed to
>> build gcc and perform some simple tweaks to the codebase. In particular, I
>> would like to apply to the Textual LTO dump tool project.
>>
>
> I must say I am impressed by the research you have already done.
> Nevertheless, please note that Ray Kim has also expressed interest in
> the project.  Martin Liska will be the mentor, so I will let him drive
> the selection process.  On the other hand, Ray also liked another
> project, so maybe he will pick that and everyone will be happy.
>
>> As far as I understand, the motivation for the LTO framework was to enable
>> cross-file interprocedural optimizations, and for this purpose an IPA pass
>> is divided into the following three stages:
>>
>> 1. LGEN: The pass does a local analysis of the function and generates a
>>    “summary”, i.e., the information relevant to the pass, and writes it
>>    to the LTO object file.
>
> A pass might do that, but the output of the whole stage is not just the
> pass summaries, it also writes the function IL (the function gimple
> statements, above all) to the object file.
>
>> 2. WPA: The LTO object files are given as input to the linker, which then
>>    invokes the lto1 frontend to perform global IPA analysis over the
>>    call graph and write optimized summaries to LTO object files
>>    (partitioning). The global IPA analysis is done over the summaries and
>>    not the actual function bodies.
>
> Well... note that partitioning actually means dividing the whole
> compiled program/library into chunks that are then compiled
> independently in the LTRANS stage.  But you are basically right that WPA
> does also do whole-program analysis based on summaries and then writes
> its decisions to optimization summaries, yes.
>
>> 3. LTRANS: The partitions are read back, and the function bodies are
>>    reconstructed from summary and are then compiled to produce real
>>    object files.
>
> Function bodies and the summaries are distinct things.  The body
> consists of gimple statements and all the associated stuff (such as
> types, so a lot of stuff), whereas when we refer to summaries, we mean
> small chunks of data that interprocedural optimizations such as inlining
> or IPA-CP scurry away because they cannot feasibly work on bodies of the
> entire program.
>
> But apart from this terminology issue, you are basically correct, at the
> LTRANS stage, IPA passes apply transformations to the bodies according
> to the optimization summary generated by the WPA phase.  And then, all
> normal, intra-procedural passes and code generation runs.
>
>>
>>
>> If I understand correctly, the motivation for the textual LTO dump tool is
>> to easily analyze the contents of an LTO object file, similar to readelf or
>> objdump?
>
> That is how I understand it too, but Martin may have some further uses
> in mind.
>
>>
>> Assume that the pureconst section of an LTO object file contains 0b0110
>> (0b being the binary prefix), corresponding to the values of
>> fs->pure_const_state and fs->state_previously_known.
>>
>> If I understand correctly, the output of dump tool should then be:
>>
>> pure_const pass:
>>
>> pure_const_state = IPA_PURE (enum value of pure_const_state_e corresponding
>> to 0b01)
>>
>> state_previously_known = IPA_NEITHER (enum value of pure_const_state_e
>> corresponding to 0b10)
>>
>> Is this the expected output of the dump tool?
>
> I think the tool would have to do a bit more than just dumping summaries of
> IPA passes.  I tend to think that the task should also include dumping
> gimple bodies (but we already do that in GCC and so it should be mostly
> easy) and also types (that are merged as one of the first steps of
> WPA, and interesting things happen when merging does something
> "interesting").  And perhaps quite a bit more.  Martin?
>
>>
>> I am reasonably familiar with C, C++ and Python. My prior experience
>> includes work in the area of NLP. Some of my accomplishments there include
>> presenting the project VicharDhara - A thought Mapper, which was selected
>> among the top five ideas in the Accenture Innovation Challenge out of 7000
>> nationwide entries. My paper on this topic won the best paper award at the
>> IEEE conference ICCUBEA-2017. My previous work was focused on simple
>> parsers, student psychology, and thought process detection for team
>> selection.
>
> Interesting, congratulations.
>
>>
>> In the interim, I have been through a few docs on GCC and LTO [1][2][3] and
>> 

Re: GSOC 2018 - Textual LTO dump tool project

2018-02-25 Thread Martin Jambor
Hello Hrishikesh,

I apologize for replying to you this late, this has been a busy week
and now I am traveling.

On Mon, Feb 19 2018, Hrishikesh Kulkarni wrote:
> Hi,
>
> I am Hrishikesh Kulkarni, currently studying as an undergrad student in
> Computer Engineering at Pune University, India. I find compilers quite
> interesting as a subject, and would like to apply to GSoC to gain some
> understanding of how real-world compilers work. So far, I have managed to
> build gcc and perform some simple tweaks to the codebase. In particular, I
> would like to apply to the Textual LTO dump tool project.
>

I must say I am impressed by the research you have already done.
Nevertheless, please note that Ray Kim has also expressed interest in
the project.  Martin Liska will be the mentor, so I will let him drive
the selection process.  On the other hand, Ray also liked another
project, so maybe he will pick that and everyone will be happy.

> As far as I understand, the motivation for the LTO framework was to enable
> cross-file interprocedural optimizations, and for this purpose an IPA pass
> is divided into the following three stages:
>
> 1. LGEN: The pass does a local analysis of the function and generates a
>    “summary”, i.e., the information relevant to the pass, and writes it
>    to the LTO object file.

A pass might do that, but the output of the whole stage is not just the
pass summaries, it also writes the function IL (the function gimple
statements, above all) to the object file.

> 2. WPA: The LTO object files are given as input to the linker, which then
>    invokes the lto1 frontend to perform global IPA analysis over the
>    call graph and write optimized summaries to LTO object files
>    (partitioning). The global IPA analysis is done over the summaries and
>    not the actual function bodies.

Well... note that partitioning actually means dividing the whole
compiled program/library into chunks that are then compiled
independently in the LTRANS stage.  But you are basically right that WPA
does also do whole-program analysis based on summaries and then writes
its decisions to optimization summaries, yes.
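A minimal sketch of the partitioning idea, assuming nothing about GCC's real partitioner: WPA knows only a summary-level size for each function and spreads the functions over a fixed number of partitions so that each LTRANS job gets a similar amount of work. The function `partition_by_size` and its greedy "least-loaded partition first" rule are hypothetical illustrations; GCC's actual partitioning (selected with `-flto-partition=`) also has to handle symbols referenced across partition boundaries.

```c
#include <stddef.h>

/* Hypothetical greedy partitioner: place each function, in input order,
   into the partition with the smallest total size so far.  Only the
   per-function size (a summary-level fact) is consulted, never a body.
   nparts must be at least 1. */
void partition_by_size(const unsigned *sizes, size_t n, size_t nparts,
                       size_t *part_of, unsigned long *load)
{
  for (size_t p = 0; p < nparts; p++)
    load[p] = 0;

  for (size_t i = 0; i < n; i++) {
    /* Find the least-loaded partition (ties go to the lowest index). */
    size_t best = 0;
    for (size_t p = 1; p < nparts; p++)
      if (load[p] < load[best])
        best = p;
    part_of[i] = best;
    load[best] += sizes[i];
  }
}
```

For sizes {40, 30, 20, 10} and two partitions, this assigns the functions to partitions {0, 1, 1, 0}, giving each partition a total size of 50; each partition could then be compiled independently.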

> 3. LTRANS: The partitions are read back, and the function bodies are
>    reconstructed from summary and are then compiled to produce real
>    object files.

Function bodies and the summaries are distinct things.  The body
consists of gimple statements and all the associated stuff (such as
types, so a lot of stuff), whereas when we refer to summaries, we mean
small chunks of data that interprocedural optimizations such as inlining
or IPA-CP scurry away because they cannot feasibly work on bodies of the
entire program.

But apart from this terminology issue, you are basically correct, at the
LTRANS stage, IPA passes apply transformations to the bodies according
to the optimization summary generated by the WPA phase.  And then, all
normal, intra-procedural passes and code generation runs.
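To make the split between summaries and bodies concrete, here is a toy WPA-style propagation in the spirit of pure/const discovery. Each function's summary records only the local state found at LGEN time and the indices of its callees, and the whole-program step computes a final state from those summaries alone. The names `pc_state`, `fn_summary` and `propagate_pure_const` are hypothetical. The real pass tracks more (looping, exceptions) and handles recursive cycles specially. This sketch only shows the shape of the computation whose result an LTRANS step would then apply to the bodies.

```c
#include <stddef.h>

/* Hypothetical states, ordered from best (most optimizable) to worst. */
enum pc_state { PC_CONST, PC_PURE, PC_NEITHER };

/* Hypothetical per-function summary written at LGEN time. */
struct fn_summary {
  enum pc_state local;    /* state derived from the function's own body */
  size_t ncallees;        /* number of direct callees */
  const size_t *callees;  /* indices of the callees in the summary array */
};

/* WPA-style propagation over summaries only: a function can be no better
   than any function it calls.  States only ever get worse, so the loop
   reaches a fixed point after a bounded number of rounds. */
void propagate_pure_const(const struct fn_summary *fns, size_t n,
                          enum pc_state *final)
{
  for (size_t i = 0; i < n; i++)
    final[i] = fns[i].local;

  int changed = 1;
  while (changed) {
    changed = 0;
    for (size_t i = 0; i < n; i++)
      for (size_t j = 0; j < fns[i].ncallees; j++) {
        enum pc_state callee = final[fns[i].callees[j]];
        if (callee > final[i]) {
          final[i] = callee;
          changed = 1;
        }
      }
  }
}
```

With a chain f0 -> f1 -> f2 where f0 and f1 are locally const and f2 is locally pure, all three end up pure, and no function body is ever consulted.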

>
>
> If I understand correctly, the motivation for the textual LTO dump tool is
> to easily analyze the contents of an LTO object file, similar to readelf or
> objdump?

That is how I understand it too, but Martin may have some further uses
in mind.
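As a starting point for a readelf-like view, here is a minimal sketch that lists the sections of an ELF object whose names start with a given prefix. It assumes a 64-bit ELF file and the `<elf.h>` header shipped with glibc. It also assumes GCC names its LTO sections with the `.gnu.lto_` prefix, which `readelf -S` on an object built with `-flto` should confirm for your GCC version. `count_sections_with_prefix` is a hypothetical helper, not part of GCC. A real dump tool would rather reuse GCC's own LTO reading code to interpret section contents.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole regular file into a malloc'd buffer; NULL on failure. */
static unsigned char *read_file(const char *path, size_t *len)
{
  FILE *f = fopen(path, "rb");
  if (!f)
    return NULL;
  unsigned char *buf = NULL;
  long size = -1;
  if (fseek(f, 0, SEEK_END) == 0)
    size = ftell(f);
  if (size > 0 && fseek(f, 0, SEEK_SET) == 0) {
    buf = malloc((size_t) size);
    if (buf && fread(buf, 1, (size_t) size, f) != (size_t) size) {
      free(buf);
      buf = NULL;
    }
  }
  fclose(f);
  *len = buf ? (size_t) size : 0;
  return buf;
}

/* Walk the section header table of an in-memory ELF64 image.  Returns the
   number of section names starting with PREFIX (printing each to OUT if
   non-NULL), or -1 if the image does not look like a valid ELF64 file. */
static int scan_elf64(const unsigned char *buf, size_t len,
                      const char *prefix, FILE *out)
{
  const Elf64_Ehdr *eh = (const Elf64_Ehdr *) buf;
  if (len < sizeof *eh || memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0
      || eh->e_ident[EI_CLASS] != ELFCLASS64)
    return -1;
  if (eh->e_shoff == 0 || eh->e_shstrndx >= eh->e_shnum || eh->e_shoff > len
      || (len - eh->e_shoff) / sizeof(Elf64_Shdr) < eh->e_shnum)
    return -1;

  const Elf64_Shdr *sh = (const Elf64_Shdr *) (buf + eh->e_shoff);
  const Elf64_Shdr *names_hdr = &sh[eh->e_shstrndx];
  if (names_hdr->sh_offset > len
      || len - names_hdr->sh_offset < names_hdr->sh_size)
    return -1;
  const char *names = (const char *) (buf + names_hdr->sh_offset);

  size_t plen = strlen(prefix);
  int count = 0;
  for (size_t i = 0; i < eh->e_shnum; i++) {
    /* Assumes a NUL-terminated section name table, as in valid files. */
    if (sh[i].sh_name >= names_hdr->sh_size)
      continue;
    const char *name = names + sh[i].sh_name;
    if (strncmp(name, prefix, plen) == 0) {
      count++;
      if (out)
        fprintf(out, "[%2zu] %s\n", i, name);
    }
  }
  return count;
}

/* Hypothetical helper: count (and optionally list) the sections of PATH
   whose names start with PREFIX; -1 if unreadable or not ELF64. */
int count_sections_with_prefix(const char *path, const char *prefix, FILE *out)
{
  size_t len;
  unsigned char *buf = read_file(path, &len);
  if (!buf)
    return -1;
  int count = scan_elf64(buf, len, prefix, out);
  free(buf);
  return count;
}
```

For example, after `gcc -flto -c foo.c`, calling `count_sections_with_prefix("foo.o", ".gnu.lto_", stdout)` should print one line per LTO section in the object.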

>
> Assume that the pureconst section of an LTO object file contains 0b0110
> (0b being the binary prefix), corresponding to the values of
> fs->pure_const_state and fs->state_previously_known.
>
> If I understand correctly, the output of dump tool should then be:
>
> pure_const pass:
>
> pure_const_state = IPA_PURE (enum value of pure_const_state_e corresponding
> to 0b01)
>
> state_previously_known = IPA_NEITHER (enum value of pure_const_state_e
> corresponding to 0b10)
>
> Is this the expected output of the dump tool?

I think the tool would have to do a bit more than just dumping summaries of
IPA passes.  I tend to think that the task should also include dumping
gimple bodies (but we already do that in GCC and so it should be mostly
easy) and also types (that are merged as one of the first steps of
WPA, and interesting things happen when merging does something
"interesting").  And perhaps quite a bit more.  Martin?
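For the pure_const part of the question quoted above, here is a minimal decoding sketch. It assumes the hypothetical 4-bit layout from the example, with pure_const_state in the two high bits and state_previously_known in the two low bits. It also assumes the enum keeps GCC's order IPA_CONST, IPA_PURE, IPA_NEITHER. GCC itself streams these fields with its bitpack routines (bp_pack_value / bp_unpack_value), whose bit order need not match this layout. A real tool should therefore decode them by reusing the pass's own summary-reading code rather than by hand.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed to mirror enum pure_const_state_e in GCC's ipa-pure-const. */
enum pure_const_state_e { IPA_CONST, IPA_PURE, IPA_NEITHER };

struct pure_const_fields {
  enum pure_const_state_e pure_const_state;
  enum pure_const_state_e state_previously_known;
};

/* Map a 2-bit value to its enumerator name; 3 has no enumerator. */
const char *state_name(unsigned v)
{
  static const char *const names[] = { "IPA_CONST", "IPA_PURE", "IPA_NEITHER" };
  return v < 3 ? names[v] : "<invalid>";
}

/* Decode the example's hypothetical layout 0bAABB, where AA is
   pure_const_state and BB is state_previously_known. */
struct pure_const_fields decode_pure_const(unsigned value)
{
  assert(value <= 0xF);
  struct pure_const_fields f;
  f.pure_const_state = (enum pure_const_state_e) ((value >> 2) & 0x3);
  f.state_previously_known = (enum pure_const_state_e) (value & 0x3);
  return f;
}

/* Print the decoded fields in the format proposed in the example. */
void dump_pure_const(unsigned value, FILE *out)
{
  struct pure_const_fields f = decode_pure_const(value);
  fprintf(out, "pure_const pass:\n");
  fprintf(out, "  pure_const_state = %s\n", state_name(f.pure_const_state));
  fprintf(out, "  state_previously_known = %s\n",
          state_name(f.state_previously_known));
}
```

Calling `dump_pure_const(0x6, stdout)` prints pure_const_state = IPA_PURE and state_previously_known = IPA_NEITHER, matching the expected output in the question.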

>
> I am reasonably familiar with C, C++ and Python. My prior experience
> includes work in the area of NLP. Some of my accomplishments there include
> presenting the project VicharDhara - A thought Mapper, which was selected
> among the top five ideas in the Accenture Innovation Challenge out of 7000
> nationwide entries. My paper on this topic won the best paper award at the
> IEEE conference ICCUBEA-2017. My previous work was focused on simple
> parsers, student psychology, and thought process detection for team
> selection.

Interesting, congratulations.

>
> In the interim, I have been through a few docs on GCC and LTO [1][2][3] and
> am trying to write a toy IPA pass to better understand the LTO/IPA
> infrastructure.

Great, I believe that's exactly what my advice would be.

> I would be grateful for feedback on the textual LTO dump
> tool.

I hope that Martin