RE: AutoFDO profile toolchain is open-sourced

2015-05-12 Thread Aditya K
Recently we found an ICE while compiling a program with auto-fdo 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972).
The ICE was caused because SSA is not in a valid state when the early inliner 
is run. The fix was to update_ssa before running the early inliner 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972#c4).
However, it remains to be found out which pass left the SSA in that
state; fixing the problem there may be more appropriate.


-Aditya



> Date: Sat, 9 May 2015 16:33:02 +0200
> From: hubi...@ucw.cz
> To: hiradi...@msn.com
> CC: de...@google.com; i.palac...@samsung.com; davi...@google.com; 
> hubi...@ucw.cz; gcc@gcc.gnu.org; v.bari...@samsung.com; dnovi...@google.com; 
> seb...@gmail.com
> Subject: Re: AutoFDO profile toolchain is open-sourced
>
>>> Yes, it will. But it's not well tuned at all. I will start tuning it
>>> if I have free cycles. It would be great if opensource community can
>>> also contribute to this tuning effort.
>>
>> If you could outline portions of code which needs tuning, rewriting, that 
>> will help get started in this effort.
>
> Optimization passes in GCC are generally designed to work with any kind of
> edge profile they get.
> There are only a few cases where they care about what kind of profile is around.
>
> At the moment we consider two types of profiles - static (guessed) and FDO.
> For the static one we shut down use of profile info for some heuristics - for
> example we do not expect loop trip counts to be reliable in the profiles
> because they are not. You can look for code checking profile_status_for_fn.
>
> Auto-FDO does not have a special value for profile_status_for_fn and it goes
> through the same code paths as FDO. Dehao has some patches for Auto-FDO tuning,
> but my impression is that he mostly got around this by just making the
> optimizer a bit more robust against nonsensical profiles, which is always good,
> since even FDO profiles can be wrong. BTW, Dehao, do you think you can submit
> these changes for this stage1?
>
> I suppose in this case we have yet another kind of profile that is less
> reliable than FDO, and we need to start by simply benchmarking, looking for
> cases where this profile performs worse, and handling them one by one :)
>
> Honza
  

Re: AutoFDO profile toolchain is open-sourced

2015-05-09 Thread Jan Hubicka
> > Yes, it will. But it's not well tuned at all. I will start tuning it
> > if I have free cycles. It would be great if opensource community can
> > also contribute to this tuning effort.
> 
> If you could outline portions of code which needs tuning, rewriting, that 
> will help get started in this effort.

Optimization passes in GCC are generally designed to work with any kind of edge
profile they get.
There are only a few cases where they care about what kind of profile is around.

At the moment we consider two types of profiles - static (guessed) and FDO. For
the static one we shut down use of profile info for some heuristics - for example
we do not expect loop trip counts to be reliable in the profiles because they
are not.  You can look for code checking profile_status_for_fn.
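
As a rough way to find those places, a recursive grep over a GCC checkout
works. This is only a minimal sketch: the function name is made up for
illustration, and the path to the checkout is an assumption about your setup.

```shell
#!/bin/sh
# Hypothetical helper: list the places in a source tree that test the
# profile status. $1 is assumed to be the gcc/ subdirectory of a checkout.
find_profile_status_checks() {
    grep -rn 'profile_status_for_fn' "$1"
}

# Usage (path is an assumption, adjust to your checkout):
#   find_profile_status_checks "$HOME/src/gcc/gcc"
```

Each hit is printed as file:line:text, so the heuristics that special-case
guessed profiles can be reviewed one by one.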

Auto-FDO does not have a special value for profile_status_for_fn and it goes
through the same code paths as FDO.  Dehao has some patches for Auto-FDO tuning,
but my impression is that he mostly got around this by just making the optimizer
a bit more robust against nonsensical profiles, which is always good, since even
FDO profiles can be wrong.  BTW, Dehao, do you think you can submit these
changes for this stage1?

I suppose in this case we have yet another kind of profile that is less
reliable than FDO, and we need to start by simply benchmarking, looking for
cases where this profile performs worse, and handling them one by one :)

Honza


RE: AutoFDO profile toolchain is open-sourced

2015-05-08 Thread Aditya K



> Date: Fri, 8 May 2015 11:19:12 -0700
> Subject: Re: AutoFDO profile toolchain is open-sourced
> From: de...@google.com
> To: i.palac...@samsung.com
> CC: davi...@google.com; hubi...@ucw.cz; gcc@gcc.gnu.org; 
> v.bari...@samsung.com; dnovi...@google.com; seb...@gmail.com
>
> On Fri, May 8, 2015 at 2:00 AM, Ilya Palachev  wrote:
>> On 11.04.2015 01:49, Xinliang David Li wrote:
>>>
>>> On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka  wrote:
>>>>>
>>>>> LBR is used for both cfg edge profiling and indirect call Target value
>>>>> profiling.
>>>>
>>>> I see, that makes sense ;) I guess if we want to support profile
>>>> collection
>>>> on targets w/o this feature we could still use one of the algorithms that
>>>> try to guess edge profile from BB profile.
>>>
>>> Our experience with sampling cycles or retired instructions to guess
>>> BB profile has not been great -- the profile quality is significantly
>>> worse than LBR (which can almost match instrumentation based profile).
>>
>> Suppose that I have no opportunity to collect profile on x86 architecture
>> with LBR support and the only available architecture is arm/aarch64 (since
>> the application code is significantly different when compiled for different
>> architectures because of manual optimizations and different function names
>> and structure).
>
> If it's already manually tuned towards architecture (or even
> hand-written inlined-assembly), then I don't think FDO/AutoFDO can
> help much.
>
>>
>> Honza has mentioned that it's possible to guess edge profile from BB
>> profile. How do you think, can this help in the above described situation?
>> Yes, this will be much worse than LBR, but can it give any performance
>> benefit compared with no edge profile at all?
>
> Yes, it will. But it's not well tuned at all. I will start tuning it
> if I have free cycles. It would be great if opensource community can
> also contribute to this tuning effort.

If you could outline the portions of code that need tuning or rewriting, that
would help us get started on this effort.

Thanks,
-Aditya


>
> Cheers,
> Dehao
>
>>
>> --
>> Ilya
  

Re: AutoFDO profile toolchain is open-sourced

2015-05-08 Thread Dehao Chen
On Fri, May 8, 2015 at 2:00 AM, Ilya Palachev  wrote:
> On 11.04.2015 01:49, Xinliang David Li wrote:
>>
>> On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka  wrote:

>>>> LBR is used for both cfg edge profiling and indirect call Target value
>>>> profiling.
>>>
>>> I see, that makes sense ;)  I guess if we want to support profile
>>> collection
>>> on targets w/o this feature we could still use one of the algorithms that
>>> try to guess edge profile from BB profile.
>>
>> Our experience with sampling cycles or retired instructions to guess
>> BB profile has not been great -- the profile quality is significantly
>> worse than LBR (which can almost match instrumentation based profile).
>
> Suppose that I have no opportunity to collect profile on x86 architecture
> with LBR support and the only available architecture is arm/aarch64 (since
> the application code is significantly different when compiled for different
> architectures because of manual optimizations and different function names
> and structure).

If it's already manually tuned for the architecture (or even uses
hand-written inline assembly), then I don't think FDO/AutoFDO can
help much.

>
> Honza has mentioned that it's possible to guess edge profile from BB
> profile. How do you think, can this help in the above described situation?
> Yes, this will be much worse than LBR, but can it give any performance
> benefit compared with no edge profile at all?

Yes, it will. But it's not well tuned at all. I will start tuning it
if I have free cycles. It would be great if opensource community can
also contribute to this tuning effort.

Cheers,
Dehao

>
> --
> Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-05-08 Thread Ilya Palachev

On 11.04.2015 01:49, Xinliang David Li wrote:
>
> On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka  wrote:
>>>
>>> LBR is used for both cfg edge profiling and indirect call Target value
>>> profiling.
>>
>> I see, that makes sense ;)  I guess if we want to support profile collection
>> on targets w/o this feature we could still use one of the algorithms that
>> try to guess edge profile from BB profile.
>
> Our experience with sampling cycles or retired instructions to guess
> BB profile has not been great -- the profile quality is significantly
> worse than LBR (which can almost match instrumentation based profile).

Suppose that I have no opportunity to collect profile on x86 
architecture with LBR support and the only available architecture is 
arm/aarch64 (since the application code is significantly different when 
compiled for different architectures because of manual optimizations and 
different function names and structure).


Honza has mentioned that it's possible to guess the edge profile from the BB
profile. Do you think this can help in the situation described above?
Yes, this will be much worse than LBR, but can it give any performance
benefit compared with no edge profile at all?


--
Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-04-27 Thread Dehao Chen
On Mon, Apr 27, 2015 at 7:37 AM, Ilya Palachev  wrote:
> Hi,
>
> On 21.04.2015 20:25, Dehao Chen wrote:
>>
>> OTOH, the most important patch (insn-level discriminator support) is
>> not in yet. Cary has just retired. Do you know if anyone would be
>> interested in porting insn-level discriminator support to trunk?
>
>
> Do you mean r210338, r210397, r210523, r214745 ?

Yes

> Can you explain why these patches are important for autofdo?

Instruction level discriminator support is important to autofdo
because basic block level discriminator is not enough when
instructions are moved to other basic blocks by code motion.
Additionally, gcc backend optimization does not maintain BB level
discriminator well. We need to encode discriminator as part of LOC so
that once the discriminator is assigned to an IR, it will go all the
way to the codegen without being modified.

> What work should be done to port them to current 5 branch?

I think we just need to have these patches in. Or even better,
reimplement this the same way as my lexical block patch
(https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=191494)

> Do you expect them to be applied to 6 branch?

This should go into trunk and be there for all later gcc branches.

Dehao

>
> --
> Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-04-27 Thread Dehao Chen
On Thu, Apr 23, 2015 at 10:31 PM, Jan Hubicka  wrote:
>
> > > It converts with the attached patches, but there's still some problem
> > > parsing the data:
> > >
> > > % ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda 
> > > -gcov_version 0x500e
> > > % gcc50 -O2 -fprofile-use loop.c
> > > loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
> > > expected version '500e'
> > > %
> >
> > You need to use -fauto-profile=loop.gcda instead of "-fprofile-use",
> > which is only for instrumentation based FDO.
>
> This is indeed not very intuitive. I wonder why it uses the same suffix 
> suggesting
> that sample based and FDO based files are the same?


An AutoFDO profile does not need to have any specific suffix. I'll update
the toolchain to make the default output profile name "fbdata.afdo"
instead of "fbdata.gcda".

>
> Would it be possible to at least have this well documented in invoke.texi and 
> perhaps
> we can fix the warning above to say something like "loop.gcda is autofdo 
> profile, use
> -fauto-profile instead of -fprofile-use"?


Sounds good to me. I will send a patch to update invoke.texi. Could
you help fix the warning for profile-use?

Thanks,
Dehao

>
>
> Honza
> >
> > Dehao
> >
> > >
> > > -Andi
> > >


Re: AutoFDO profile toolchain is open-sourced

2015-04-27 Thread Ilya Palachev

Hi,

On 21.04.2015 20:25, Dehao Chen wrote:

> OTOH, the most important patch (insn-level discriminator support) is
> not in yet. Cary has just retired. Do you know if anyone would be
> interested in porting insn-level discriminator support to trunk?


Do you mean r210338, r210397, r210523, r214745 ?
Can you explain why these patches are important for autofdo?
What work should be done to port them to current 5 branch?
Do you expect them to be applied to 6 branch?

--
Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-04-23 Thread Jan Hubicka
> > It converts with the attached patches, but there's still some problem
> > parsing the data:
> >
> > % ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 
> > 0x500e
> > % gcc50 -O2 -fprofile-use loop.c
> > loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
> > expected version '500e'
> > %
> 
> You need to use -fauto-profile=loop.gcda instead of "-fprofile-use",
> which is only for instrumentation based FDO.

This is indeed not very intuitive. I wonder why it uses the same suffix,
suggesting that sample based and FDO based files are the same?
Would it be possible to at least have this well documented in invoke.texi, and
perhaps we can fix the warning above to say something like "loop.gcda is autofdo
profile, use -fauto-profile instead of -fprofile-use"?

Honza
> 
> Dehao
> 
> >
> > -Andi
> >


Re: AutoFDO profile toolchain is open-sourced

2015-04-22 Thread Dehao Chen
Thanks, I'll forward the patches to the quipper team.

On Tue, Apr 21, 2015 at 8:47 PM, Andi Kleen  wrote:
> On Wed, Apr 22, 2015 at 05:15:47AM +0200, Andi Kleen wrote:
>> On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote:
>> > Andi,
>> >
>> > Thanks for the patches. Turns out that the first 3 patches are already
>> > in, the correct upstream quipper repository is:
>> >
>> > https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/
>> >
>> > The last 3 patches seem to be local hacks. Do you want any of them in?
>> >
>> > I just did a batch sync with quipper head. Please let me know if this
>> > solves the perf problem.
>>
>> Still outdated:
>>
>> F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= 
>> sizeof(perf_event_attr) (104 vs. 96)
>
> It converts with the attached patches, but there's still some problem
> parsing the data:
>
> % ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 
> 0x500e
> % gcc50 -O2 -fprofile-use loop.c
> loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
> expected version '500e'
> %

You need to use -fauto-profile=loop.gcda instead of "-fprofile-use",
which is only for instrumentation based FDO.
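
To make the distinction concrete, here is a minimal sketch of picking the
compiler flag from the kind of profile at hand. The helper name profile_flag
and its "afdo"/"instr" arguments are made up for illustration; only the two
GCC options themselves come from this thread. The final command is printed
rather than executed, so the sketch runs without gcc installed.

```shell
#!/bin/sh
# Hypothetical helper: print the GCC flag matching the kind of profile.
#   afdo  -> sample-based profile produced by create_gcov
#   instr -> instrumentation-based profile from a -fprofile-generate build
profile_flag() {
    kind=$1
    file=$2
    case "$kind" in
        afdo)  printf '%s\n' "-fauto-profile=$file" ;;
        instr) printf '%s\n' "-fprofile-use=$file" ;;
        *)     echo "unknown profile kind: $kind" >&2; return 1 ;;
    esac
}

# Example: the compile step for loop.c with the AutoFDO profile.
echo gcc -O2 "$(profile_flag afdo loop.gcda)" loop.c
```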

Dehao

>
> -Andi
>


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
On Wed, Apr 22, 2015 at 05:15:47AM +0200, Andi Kleen wrote:
> On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote:
> > Andi,
> > 
> > Thanks for the patches. Turns out that the first 3 patches are already
> > in, the correct upstream quipper repository is:
> > 
> > https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/
> > 
> > The last 3 patches seem to be local hacks. Do you want any of them in?
> > 
> > I just did a batch sync with quipper head. Please let me know if this
> > solves the perf problem.
> 
> Still outdated:
> 
> F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= 
> sizeof(perf_event_attr) (104 vs. 96) 

It converts with the attached patches, but there's still some problem
parsing the data:

% ./create_gcov  -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e
% gcc50 -O2 -fprofile-use loop.c 
loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ',
expected version '500e'
% 

-Andi



autofdo-patches-2.tgz
Description: application/gtar-compressed


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote:
> Andi,
> 
> Thanks for the patches. Turns out that the first 3 patches are already
> in, the correct upstream quipper repository is:
> 
> https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/
> 
> The last 3 patches seem to be local hacks. Do you want any of them in?
> 
> I just did a batch sync with quipper head. Please let me know if this
> solves the perf problem.

Still outdated:

F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= sizeof(perf_event_attr) (104 vs. 96)

-Andi


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Sebastian Pop
OK, thanks for the tip about the flag.

You would also need to pass "-use_lbr=false" to create a gcov file for
a device that does not have LBR support.
We tried this on ARM collected profiles and we got the same speedup as
x86 collected profiles on linpack.
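
A sketch of how the two invocations differ, assuming a small wrapper around
create_gcov. The function name create_gcov_cmd, its argument order and the
file names are illustrative only; the flags are the ones quoted in this
thread. The wrapper prints the command line instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper: print a create_gcov command line.
#   $1 = binary, $2 = perf data file, $3 = output gcov file
#   $4 = "lbr" if the samples were recorded with branch stacks
#        (perf record -b); anything else for plain samples, e.g. from ARM.
create_gcov_cmd() {
    cmd="create_gcov --binary=$1 --profile=$2 --gcov=$3"
    if [ "$4" != "lbr" ]; then
        # Flag mentioned above for devices without LBR support.
        cmd="$cmd -use_lbr=false"
    fi
    printf '%s\n' "$cmd"
}

create_gcov_cmd linpack perf-x86.data linpack-x86.gcov lbr
create_gcov_cmd linpack perf-arm.data linpack-arm.gcov nolbr
```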

Sebastian


On Tue, Apr 21, 2015 at 3:53 PM, Dehao Chen  wrote:
> That's correct. For trunk, gcov_version is 0x1. We defined this as a
> flag so that you can actually change it via --gcov_version=0x1 instead
> of changing the code.
>
> Dehao
>
> On Tue, Apr 21, 2015 at 1:47 PM, Sebastian Pop  wrote:
>> We also needed to adjust the gcov_version in autofdo/gcov.cc to read
>> 0x1 for dev branches of gcc (instead of the current 0x3430372a for
>> some released version of GCC):
>>
>> -DEFINE_uint64(gcov_version, 0x3430372a,
>> +DEFINE_uint64(gcov_version, 0x1,
>>
>> Sebastian
>>
>> On Tue, Apr 21, 2015 at 3:33 PM, Aditya K  wrote:
>>> After patching linux perf. This script collects creates a coverage file 
>>> (e.g., for linpack) which can be used for fdo.
>>>
>>>
>>> gcov=linpack-x86.gcov
>>> MAKE='make'
>>>
>>>
>>> # x86
>>> x86() {
>>> CC=/usr/bin/gcc
>>> CXX=/usr/bin/g++
>>>
>>> export CFLAGS="-Ofast -g3 -static"
>>> export CPPFLAGS=$CFLAGS
>>>
>>> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack clean
>>>
>>> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack -k TEST=simple 
>>> TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC 
>>> TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= 
>>> USE_REFERENCE_OUTPUT=1 CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= 
>>> LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 
>>> DISABLE_JIT=1
>>>
>>> perfdata=autofdo-linpack/perf-x86.data
>>>
>>> perf record -b -e branch-instructions -o $perfdata 
>>> $SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple
>>>
>>> autofdo/usr/bin/create_gcov 
>>> --binary=$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple 
>>> --profile=$perfdata --gcov=$gcov
>>>
>>> }
>>>
>>>
>>> hth,
>>> -Aditya
>>>
>>> 
>>>> From: a...@firstfloor.org
>>>> To: i.palac...@samsung.com
>>>> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; 
>>>> hubi...@ucw.cz; seb...@gmail.com; de...@google.com; v.bari...@samsung.com
>>>> Subject: Re: AutoFDO profile toolchain is open-sourced
>>>> Date: Tue, 21 Apr 2015 07:25:10 -0700
>>>>
>>>> Ilya Palachev  writes:
>>>>>
>>>>> But why create_gcov does not inform about that (no branch events were
>>>>> found)? It creates empty gcov file and says nothing :(
>>>>>
>>>>> Moreover, in the mentioned README it is said that perf should also be
>>>>> executed with option -e BR_INST_RETIRED:TAKEN.
>>>>
>>>> Standard perf doesn't have a full event list
>>>> This assumes a perf patched with the libpfm patch.
>>>>
>>>> Also I suspect it really wants to use PEBS events, so pp should be added.
>>>>
>>>> Alternatively you can use ocperf (from
>>>> http://github.com/andikleen/pmu-tools) which is just a wrapper:
>>>>
>>>> ocperf.py record -e br_inst_retired.near_taken:pp -b ...
>>>>
>>>> or specify the event manually (depending on your CPU, like)
>>>>
>>>> perf record -e
>>>> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
>>>> -b ...
>>>>
>>>> BTW the biggest problem with autofdo currently is that it is
>>>> quite bitrotten and supports only several years old perf.
>>>> So all of this above will only work with old distributions,
>>>> unless you compile an old perf utility first.
>>>>
>>>> -Andi
>>>>
>>>> --
>>>> a...@linux.intel.com -- Speaking for myself only
>>>


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
That's correct. For trunk, gcov_version is 0x1. We defined this as a
flag so that you can actually change it via --gcov_version=0x1 instead
of changing the code.
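
For example, a build script could pick the value per compiler branch. This is
a minimal sketch: the helper name gcov_version_for and the "trunk"/"dev"
labels are assumptions for illustration, while the two version values are the
ones quoted in this thread. The command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: choose the --gcov_version value for create_gcov.
# 0x1 is the value given above for GCC trunk / dev branches;
# 0x3430372a is the default quoted from autofdo/gcov.cc.
gcov_version_for() {
    case "$1" in
        trunk|dev) echo 0x1 ;;
        *)         echo 0x3430372a ;;
    esac
}

echo create_gcov --binary=loop --profile=perf.data --gcov=loop.gcda \
     --gcov_version="$(gcov_version_for trunk)"
```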

Dehao

On Tue, Apr 21, 2015 at 1:47 PM, Sebastian Pop  wrote:
> We also needed to adjust the gcov_version in autofdo/gcov.cc to read
> 0x1 for dev branches of gcc (instead of the current 0x3430372a for
> some released version of GCC):
>
> -DEFINE_uint64(gcov_version, 0x3430372a,
> +DEFINE_uint64(gcov_version, 0x1,
>
> Sebastian
>
> On Tue, Apr 21, 2015 at 3:33 PM, Aditya K  wrote:
>> After patching linux perf. This script collects creates a coverage file 
>> (e.g., for linpack) which can be used for fdo.
>>
>>
>> gcov=linpack-x86.gcov
>> MAKE='make'
>>
>>
>> # x86
>> x86() {
>> CC=/usr/bin/gcc
>> CXX=/usr/bin/g++
>>
>> export CFLAGS="-Ofast -g3 -static"
>> export CPPFLAGS=$CFLAGS
>>
>> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack clean
>>
>> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack -k TEST=simple 
>> TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC 
>> TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= 
>> USE_REFERENCE_OUTPUT=1 CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= 
>> LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 
>> DISABLE_JIT=1
>>
>> perfdata=autofdo-linpack/perf-x86.data
>>
>> perf record -b -e branch-instructions -o $perfdata 
>> $SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple
>>
>> autofdo/usr/bin/create_gcov 
>> --binary=$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple 
>> --profile=$perfdata --gcov=$gcov
>>
>> }
>>
>>
>> hth,
>> -Aditya
>>
>> 
>>> From: a...@firstfloor.org
>>> To: i.palac...@samsung.com
>>> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; 
>>> hubi...@ucw.cz; seb...@gmail.com; de...@google.com; v.bari...@samsung.com
>>> Subject: Re: AutoFDO profile toolchain is open-sourced
>>> Date: Tue, 21 Apr 2015 07:25:10 -0700
>>>
>>> Ilya Palachev  writes:
>>>>
>>>> But why create_gcov does not inform about that (no branch events were
>>>> found)? It creates empty gcov file and says nothing :(
>>>>
>>>> Moreover, in the mentioned README it is said that perf should also be
>>>> executed with option -e BR_INST_RETIRED:TAKEN.
>>>
>>> Standard perf doesn't have a full event list
>>> This assumes a perf patched with the libpfm patch.
>>>
>>> Also I suspect it really wants to use PEBS events, so pp should be added.
>>>
>>> Alternatively you can use ocperf (from
>>> http://github.com/andikleen/pmu-tools) which is just a wrapper:
>>>
>>> ocperf.py record -e br_inst_retired.near_taken:pp -b ...
>>>
>>> or specify the event manually (depending on your CPU, like)
>>>
>>> perf record -e
>>> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
>>> -b ...
>>>
>>> BTW the biggest problem with autofdo currently is that it is
>>> quite bitrotten and supports only several years old perf.
>>> So all of this above will only work with old distributions,
>>> unless you compile an old perf utility first.
>>>
>>> -Andi
>>>
>>> --
>>> a...@linux.intel.com -- Speaking for myself only
>>


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
Andi,

Thanks for the patches. It turns out that the first 3 patches are already
in; the correct upstream quipper repository is:

https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/

The last 3 patches seem to be local hacks. Do you want any of them in?

I just did a batch sync with quipper head. Please let me know if this
solves the perf problem.

Thanks,
Dehao

On Tue, Apr 21, 2015 at 10:36 AM, Andi Kleen  wrote:
> On Tue, Apr 21, 2015 at 10:27:49AM -0700, Dehao Chen wrote:
>> In that case, we should get quipper fixed upstream to accommodate new
>> format. (Maybe they already fixed it, I will do a batch sync to make
>> quipper up-to-date).
>
> From a quick look at
>
> http://git.chromium.org/gitweb/?p=chromiumos/platform/chromiumos-wide-profiling.git;a=summary
>
> (I assume that is what you mean with upstream)
>
> it hasn't been updated. Is still stuck in 2013.
>
> I'm attaching what patches I have so far.
>
> -Andi


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Sebastian Pop
We also needed to adjust the gcov_version in autofdo/gcov.cc to read
0x1 for dev branches of gcc (instead of the current 0x3430372a for
some released version of GCC):

-DEFINE_uint64(gcov_version, 0x3430372a,
+DEFINE_uint64(gcov_version, 0x1,

Sebastian

On Tue, Apr 21, 2015 at 3:33 PM, Aditya K  wrote:
> After patching linux perf. This script collects creates a coverage file 
> (e.g., for linpack) which can be used for fdo.
>
>
> gcov=linpack-x86.gcov
> MAKE='make'
>
>
> # x86
> x86() {
> CC=/usr/bin/gcc
> CXX=/usr/bin/g++
>
> export CFLAGS="-Ofast -g3 -static"
> export CPPFLAGS=$CFLAGS
>
> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack clean
>
> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack -k TEST=simple 
> TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC 
> TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= 
> USE_REFERENCE_OUTPUT=1 CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= 
> LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 
> DISABLE_JIT=1
>
> perfdata=autofdo-linpack/perf-x86.data
>
> perf record -b -e branch-instructions -o $perfdata 
> $SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple
>
> autofdo/usr/bin/create_gcov 
> --binary=$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple 
> --profile=$perfdata --gcov=$gcov
>
> }
>
>
> hth,
> -Aditya
>
> 
>> From: a...@firstfloor.org
>> To: i.palac...@samsung.com
>> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; 
>> hubi...@ucw.cz; seb...@gmail.com; de...@google.com; v.bari...@samsung.com
>> Subject: Re: AutoFDO profile toolchain is open-sourced
>> Date: Tue, 21 Apr 2015 07:25:10 -0700
>>
>> Ilya Palachev  writes:
>>>
>>> But why create_gcov does not inform about that (no branch events were
>>> found)? It creates empty gcov file and says nothing :(
>>>
>>> Moreover, in the mentioned README it is said that perf should also be
>>> executed with option -e BR_INST_RETIRED:TAKEN.
>>
>> Standard perf doesn't have a full event list
>> This assumes a perf patched with the libpfm patch.
>>
>> Also I suspect it really wants to use PEBS events, so pp should be added.
>>
>> Alternatively you can use ocperf (from
>> http://github.com/andikleen/pmu-tools) which is just a wrapper:
>>
>> ocperf.py record -e br_inst_retired.near_taken:pp -b ...
>>
>> or specify the event manually (depending on your CPU, like)
>>
>> perf record -e
>> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
>> -b ...
>>
>> BTW the biggest problem with autofdo currently is that it is
>> quite bitrotten and supports only several years old perf.
>> So all of this above will only work with old distributions,
>> unless you compile an old perf utility first.
>>
>> -Andi
>>
>> --
>> a...@linux.intel.com -- Speaking for myself only
>


RE: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Aditya K
After patching Linux perf, this script collects a profile and creates a
coverage file (e.g., for linpack) which can be used for FDO.


gcov=linpack-x86.gcov
MAKE='make'


# x86
# Note: $SRC is not set in this script; it is assumed to point at the
# benchmark source tree that contains SingleSource/.
x86() {
CC=/usr/bin/gcc
CXX=/usr/bin/g++

export CFLAGS="-Ofast -g3 -static"
export CPPFLAGS="$CFLAGS"

$MAKE -C "$SRC/SingleSource/Benchmarks/Linpack" clean

# Build the benchmark (one make invocation, continued across lines).
$MAKE -C "$SRC/SingleSource/Benchmarks/Linpack" -k TEST=simple TARGET_LLVMGCC=$CC \
  TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC TARGET_LLVMGXX=$CXX \
  CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= USE_REFERENCE_OUTPUT=1 \
  CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 \
  ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 DISABLE_JIT=1

perfdata=autofdo-linpack/perf-x86.data

# Record branch samples (-b) while running the benchmark.
perf record -b -e branch-instructions -o "$perfdata" \
  "$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple"

# Convert the perf data into a gcov-format profile.
autofdo/usr/bin/create_gcov \
  --binary="$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple" \
  --profile="$perfdata" --gcov="$gcov"

}


hth,
-Aditya


> From: a...@firstfloor.org
> To: i.palac...@samsung.com
> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; hubi...@ucw.cz; 
> seb...@gmail.com; de...@google.com; v.bari...@samsung.com
> Subject: Re: AutoFDO profile toolchain is open-sourced
> Date: Tue, 21 Apr 2015 07:25:10 -0700
>
> Ilya Palachev  writes:
>>
>> But why create_gcov does not inform about that (no branch events were
>> found)? It creates empty gcov file and says nothing :(
>>
>> Moreover, in the mentioned README it is said that perf should also be
>> executed with option -e BR_INST_RETIRED:TAKEN.
>
> Standard perf doesn't have a full event list
> This assumes a perf patched with the libpfm patch.
>
> Also I suspect it really wants to use PEBS events, so pp should be added.
>
> Alternatively you can use ocperf (from
> http://github.com/andikleen/pmu-tools) which is just a wrapper:
>
> ocperf.py record -e br_inst_retired.near_taken:pp -b ...
>
> or specify the event manually (depending on your CPU, like)
>
> perf record -e
> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
> -b ...
>
> BTW the biggest problem with autofdo currently is that it is
> quite bitrotten and supports only several years old perf.
> So all of this above will only work with old distributions,
> unless you compile an old perf utility first.
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
  

Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
On Tue, Apr 21, 2015 at 10:27:49AM -0700, Dehao Chen wrote:
> In that case, we should get quipper fixed upstream to accommodate new
> format. (Maybe they already fixed it, I will do a batch sync to make
> quipper up-to-date).

From a quick look at

http://git.chromium.org/gitweb/?p=chromiumos/platform/chromiumos-wide-profiling.git;a=summary

(I assume that is what you mean with upstream)

it hasn't been updated. Is still stuck in 2013.

I'm attaching what patches I have so far.

-Andi


autofdo-newer-perf-0.tgz
Description: application/gtar-compressed


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
In that case, we should get quipper fixed upstream to accommodate the new
format. (Maybe they already fixed it; I will do a batch sync to bring
quipper up to date.)

Dehao

On Tue, Apr 21, 2015 at 10:24 AM, Andi Kleen  wrote:
>> > BTW the biggest problem with autofdo currently is that it is
>> > quite bitrotten and supports only several years old perf.
>> > So all of this above will only work with old distributions,
>> > unless you compile an old perf utility first.
>>
>> Do you mean newer perf does not support LBR (-b) any more?
>
> No.
>
> perf extended its perf.data output format, and quipper cannot parse
> any of the extensions, so it just bombs out with assertion
> failures.
>
> I have a patch to hack around some of this, but still
> couldn't get it actually to work so far.
>
> -Andi
> --
> a...@linux.intel.com -- Speaking for myself only.


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
I'll get to it soon. When will stage1 close?

OTOH, the most important patch (insn-level discriminator support) is
not in yet. Cary has just retired. Do you know if anyone would be
interested in porting insn-level discriminator support to trunk?

Dehao

On Tue, Apr 21, 2015 at 8:59 AM, Jan Hubicka  wrote:
>> You can use dump_gcov to show a text version of the profile dump and
>> check if the profile data makes sense. If your program is just a very
>> tight single loop, the current implementation in trunk may not yield
>> good results because it does not have discriminator support. Try the
>> google-4_9 branch instead.
>
> Can we possibly merge the remaining patches now when stage1 is open?
>
> Honza
>>
>> Dehao
>>
>> >
>> >
>> > --
>> > Best regards,
>> > Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
> > BTW the biggest problem with autofdo currently is that it is
> > quite bitrotten and supports only several years old perf.
> > So all of this above will only work with old distributions,
> > unless you compile an old perf utility first.
> 
> Do you mean newer perf does not support LBR (-b) any more?

No.

perf extended its perf.data output format, and quipper cannot parse
any of the extensions, so it just bombs out with assertion
failures.

I have a patch to hack around some of this, but still
couldn't get it actually to work so far.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Jan Hubicka
> You can use dump_gcov to show a text version of the profile dump and
> check if the profile data makes sense. If your program is just a very
> tight single loop, the current implementation in trunk may not yield
> good results because it does not have discriminator support. Try the
> google-4_9 branch instead.

Can we possibly merge the remaining patches now when stage1 is open?

Honza
> 
> Dehao
> 
> >
> >
> > --
> > Best regards,
> > Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
On Tue, Apr 21, 2015 at 7:25 AM, Andi Kleen  wrote:
> Ilya Palachev  writes:
>>
>> But why create_gcov does not inform about that (no branch events were
>> found)? It creates empty gcov file and says nothing :(
>>
>> Moreover, in the mentioned README it is said that perf should also be
>> executed with option -e BR_INST_RETIRED:TAKEN.
>
> Standard perf doesn't have a full event list
> This assumes a perf patched with the libpfm patch.
>
> Also I suspect it really wants to use PEBS events, so pp should be added.
>
> Alternatively you can use ocperf (from
> http://github.com/andikleen/pmu-tools) which is just a wrapper:
>
> ocperf.py record -e br_inst_retired.near_taken:pp -b ...
>
> or specify the event manually (depending on your CPU, like)
>
> perf record -e
> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
> -b ...
>
> BTW the biggest problem with autofdo currently is that it is
> quite bitrotten and supports only several years old perf.
> So all of this above will only work with old distributions,
> unless you compile an old perf utility first.

Do you mean newer perf does not support LBR (-b) any more?

Dehao

>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Dehao Chen
On Tue, Apr 21, 2015 at 6:42 AM, Ilya Palachev  wrote:
> On 21.04.2015 14:57, Diego Novillo wrote:
>>
>> From the autofdo page: https://github.com/google/autofdo
>>
>> [ ... ]
>> Inputs:
>>
>> --profile: PERF_PROFILE collected using linux perf (with last branch
>> record).
>> In order to collect this profile, you will need to have an Intel CPU that
>> have last branch record (LBR) support. You also need to have your linux
>> kernel configured with LBR support. To profile:
>> # perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
>> EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some
>> architectures, BR_INST_EXEC:TAKEN also works.
>> [ ... ]
>>
>> The important one for autofdo is -b. It asks perf to use LBR registers
>> for branch tracking (assuming your architecture supports it).
>
>
> Thanks! It worked. Now big programs produce big gcov files. Sorry for this
> confusing message.
>
> But why create_gcov does not inform about that (no branch events were
> found)? It creates empty gcov file and says nothing :(
>
> Moreover, in the mentioned README it is said that perf should also be
> executed with option -e BR_INST_RETIRED:TAKEN.
> I tried to add it but perf said that
>
>    invalid or unsupported event: 'BR_INST_RETIRED:TAKEN'
>    Run 'perf list' for a list of valid events
>
> For my architecture x86_64 the perf list contains
>
>    $ sudo perf list | grep -i br
>      branch-instructions OR branches                   [Hardware event]
>      branch-misses                                     [Hardware event]
>      branch-loads                                      [Hardware cache event]
>      branch-load-misses                                [Hardware cache event]
>      branch-instructions OR cpu/branch-instructions/   [Kernel PMU event]
>      branch-misses OR cpu/branch-misses/               [Kernel PMU event]
>      mem:[:access]                                     [Hardware breakpoint]
>      syscalls:sys_enter_brk                            [Tracepoint event]
>      syscalls:sys_exit_brk                             [Tracepoint event]
>
> There is no BR_INST_RETIRED:TAKEN there. Do you use some specific
> configuration of perf for that?
>
> However, I tried to use option "-e branch-instructions". Before that the
> following error was obtained:
>
>    E0421 15:57:39.308374 11551 perf_parser.cc:210] Mapped 50% of
>    samples, expected at least 95%
>
> and now it disappeared (because of option "-e branch-instructions").
>
> Though, the performance decreases after adding option
> "-fauto-profile=file.gcov" or "-fprofile-use=file.gcov" to the list of
> compiler options.
> The program becomes 10% slower than before.
> Can you explain that? Maybe I should configure perf so that it will be able
> to collect events BR_INST_RETIRED:TAKEN ? How can it be done?

You can use dump_gcov to show a text version of the profile dump and
check if the profile data makes sense. If your program is just a very
tight single loop, the current implementation in trunk may not yield
good results because it does not have discriminator support. Try the
google-4_9 branch instead.

Dehao

>
>
> --
> Best regards,
> Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Andi Kleen
Ilya Palachev  writes:
>
> But why create_gcov does not inform about that (no branch events were
> found)? It creates empty gcov file and says nothing :(
>
> Moreover, in the mentioned README it is said that perf should also be
> executed with option -e BR_INST_RETIRED:TAKEN.

Standard perf doesn't have a full event list
This assumes a perf patched with the libpfm patch.

Also I suspect it really wants to use PEBS events, so pp should be added.

Alternatively you can use ocperf (from
http://github.com/andikleen/pmu-tools) which is just a wrapper:

ocperf.py record -e br_inst_retired.near_taken:pp -b ... 

or specify the event manually (depending on your CPU, like)

perf record -e
cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
-b ...
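
The manual event specification above follows a regular pattern, so it can be generated instead of typed by hand. A minimal sketch in Python, assuming the event code, umask and period from the example (they are CPU-specific, so treat them as placeholders). `raw_pmu_event` and `perf_record_args` are hypothetical helper names, not part of perf:

```python
def raw_pmu_event(event, umask, name, period, precise=2):
    """Build a perf raw-event string such as cpu/event=0xc4,umask=0x20,.../pp."""
    # Each 'p' raises the requested precision level; "pp" asks for PEBS-style
    # sampling with zero skid where the CPU supports it.
    modifier = "p" * precise
    return "cpu/event=0x{:x},umask=0x{:x},name={},period={}/{}".format(
        event, umask, name, period, modifier)


def perf_record_args(event_spec, command):
    """Return the argv for 'perf record' with LBR sampling (-b) enabled."""
    return ["perf", "record", "-e", event_spec, "-b", "--"] + list(command)


if __name__ == "__main__":
    # Values copied from the example above; adjust them for your CPU.
    spec = raw_pmu_event(0xC4, 0x20, "br_inst_retired_near_taken", 49)
    print(" ".join(perf_record_args(spec, ["./ytest"])))
```

Keeping the event as numbers makes it easier to swap in the codes for a different microarchitecture.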

BTW the biggest problem with autofdo currently is that it is
quite bitrotten and supports only several years old perf.
So all of this above will only work with old distributions,
unless you compile an old perf utility first.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Ilya Palachev

On 21.04.2015 14:57, Diego Novillo wrote:

> From the autofdo page: https://github.com/google/autofdo
>
> [ ... ]
> Inputs:
>
> --profile: PERF_PROFILE collected using linux perf (with last branch record).
> In order to collect this profile, you will need to have an Intel CPU that
> have last branch record (LBR) support. You also need to have your linux
> kernel configured with LBR support. To profile:
> # perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
> EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some
> architectures, BR_INST_EXEC:TAKEN also works.
> [ ... ]
>
> The important one for autofdo is -b. It asks perf to use LBR registers
> for branch tracking (assuming your architecture supports it).


Thanks! It worked. Now big programs produce big gcov files. Sorry for 
this confusing message.


But why create_gcov does not inform about that (no branch events were 
found)? It creates empty gcov file and says nothing :(


Moreover, in the mentioned README it is said that perf should also be 
executed with option -e BR_INST_RETIRED:TAKEN.

I tried to add it but perf said that

   invalid or unsupported event: 'BR_INST_RETIRED:TAKEN'
   Run 'perf list' for a list of valid events

For my architecture x86_64 the perf list contains

   $ sudo perf list | grep -i br
     branch-instructions OR branches                   [Hardware event]
     branch-misses                                     [Hardware event]
     branch-loads                                      [Hardware cache event]
     branch-load-misses                                [Hardware cache event]
     branch-instructions OR cpu/branch-instructions/   [Kernel PMU event]
     branch-misses OR cpu/branch-misses/               [Kernel PMU event]
     mem:[:access]                                     [Hardware breakpoint]
     syscalls:sys_enter_brk                            [Tracepoint event]
     syscalls:sys_exit_brk                             [Tracepoint event]

There is no BR_INST_RETIRED:TAKEN there. Do you use some specific 
configuration of perf for that?


However, I tried to use option "-e branch-instructions". Before that the 
following error was obtained:


   E0421 15:57:39.308374 11551 perf_parser.cc:210] Mapped 50% of
   samples, expected at least 95%

and now it disappeared (because of option "-e branch-instructions").
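
To compare runs with different -e options, the mapping ratio can be pulled out of the create_gcov log automatically. A minimal sketch in Python, assuming the message has exactly the wording shown in the log above (other versions of the parser may phrase it differently). `mapped_sample_ratio` is a hypothetical helper:

```python
import re

# Matches the percentage pair in lines such as
# "... perf_parser.cc:210] Mapped 50% of samples, expected at least 95%".
# \s+ tolerates the line wrap that mail clients insert into the message.
MAPPED_RE = re.compile(r"Mapped (\d+)% of\s+samples,\s+expected at least (\d+)%")


def mapped_sample_ratio(log_text):
    """Return (mapped, expected) percentages, or None if no such message exists."""
    match = MAPPED_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    log = ("E0421 15:57:39.308374 11551 perf_parser.cc:210] Mapped 50% of\n"
           "   samples, expected at least 95%")
    print(mapped_sample_ratio(log))
```

A result of None only means the warning was not printed. It says nothing about whether the profile actually contains branch data.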

Though, the performance decreases after adding option 
"-fauto-profile=file.gcov" or "-fprofile-use=file.gcov" to the list of 
compiler options.

The program becomes 10% slower than before.
Can you explain that? Maybe I should configure perf so that it will be 
able to collect events BR_INST_RETIRED:TAKEN ? How can it be done?


--
Best regards,
Ilya Palachev


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Diego Novillo
On Tue, Apr 21, 2015 at 6:33 AM, Ilya Palachev  wrote:
> ping?
>
> On 15.04.2015 10:41, Ilya Palachev wrote:
>>
>> Hi,
>>
>> One more question.
>>
> Does anybody know which options perf should be run with in order to
> collect appropriate data for the autofdo converter?

From the autofdo page: https://github.com/google/autofdo

[ ... ]
Inputs:

--profile: PERF_PROFILE collected using linux perf (with last branch record).
In order to collect this profile, you will need to have an Intel CPU that
have last branch record (LBR) support. You also need to have your linux
kernel configured with LBR support. To profile:
# perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some
architectures, BR_INST_EXEC:TAKEN also works.
[ ... ]

The important one for autofdo is -b. It asks perf to use LBR registers
for branch tracking (assuming your architecture supports it).

The binary you run under perf should also have line table information
(compiled with -gmlt) to produce location support for autofdo.
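
Put together, the steps above form a four-command cycle. A rough sketch in Python that only assembles the command lines and runs nothing. The file names, the period value and the helper name `autofdo_commands` are assumptions for illustration. The create_gcov options and -fauto-profile are taken from elsewhere in this thread, and whether your compiler accepts -gmlt is not checked here:

```python
import shlex


def autofdo_commands(source, binary, event, period, debug_flag="-gmlt"):
    """Return the command lines of a basic AutoFDO cycle as argv lists."""
    profile = "perf.data"
    gcov = binary + ".gcov"
    return [
        # 1. Build with line table information so samples map to source lines.
        ["g++", "-O2", debug_flag, "-o", binary, source],
        # 2. Sample branches with the LBR registers (-b).
        ["perf", "record", "-c", str(period), "-e", event, "-b",
         "-o", profile, "--", "./" + binary],
        # 3. Convert the perf profile into a GCC-readable profile.
        ["create_gcov", "--binary", binary, "--profile", profile,
         "--gcov", gcov],
        # 4. Rebuild using the converted profile.
        ["g++", "-O2", "-fauto-profile=" + gcov, "-o", binary, source],
    ]


if __name__ == "__main__":
    # 100003 is an arbitrary placeholder for PERIOD.
    for argv in autofdo_commands("ytest.c", "ytest",
                                 "BR_INST_RETIRED:TAKEN", 100003):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to check that -b really is on the perf line before spending time on a profiling run.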


Diego.


Re: AutoFDO profile toolchain is open-sourced

2015-04-21 Thread Ilya Palachev

ping?

On 15.04.2015 10:41, Ilya Palachev wrote:

> Hi,
>
> One more question.

Does anybody know which options perf should be run with in order to
collect appropriate data for the autofdo converter?
I obtain the same data for different programs, and it seems to be empty 
(1600 Bytes).

They have the same md5sum for different programs:

   # Data for simple program with 30 lines of code:
   $ md5sum ytest.gcov
   d85481c9154aa606ce4893b64fe109e7  ytest.gcov

   # Data for program of 3D Delaunay triangulation construction of
   100 points.
   $ md5sum experimentCGAL_convexHullDynamic.gcov
   d85481c9154aa606ce4893b64fe109e7 experimentCGAL_convexHullDynamic.gcov
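
The same comparison can be scripted so that identical (and therefore probably empty) profiles are flagged right after conversion. A minimal sketch in Python, assuming the .gcov files are small enough to read into memory. `find_identical_profiles` is a hypothetical helper, not part of the AutoFDO tools:

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_profiles(paths):
    """Group profile files whose contents are byte-for-byte identical.

    Returns a list of groups (each a sorted list of paths) that contain
    more than one file.
    """
    groups = defaultdict(list)
    for path in paths:
        # md5 is used only as a content fingerprint, as with md5sum above.
        digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    import sys
    for group in find_identical_profiles(sys.argv[1:]):
        print("identical profiles:", ", ".join(group))
```

Profiles from unrelated programs ending up in one group, as in the two cases above, suggests the converter saw no usable samples.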


We tried to collect perf data using option --call-graph fp but it does 
not help: the output gcov data is still the same.

Sometimes create_gcov reports the following error:
E0421 13:10:37.125629  8732 perf_parser.cc:209] Mapped 50% of samples, 
expected at least 95%


But it does not mean that there are not enough samples collected in the 
profile, because 99% of samples are mapped in the case of very simple 
program (with 1 function).

I have been trying to find a working case for more than a week but have not succeeded.

Can anybody show me that create_gcov works at least for one case?

--
Best regards,
Ilya Palachev




Re: AutoFDO profile toolchain is open-sourced

2015-04-15 Thread Ilya Palachev

Hi,

One more question.

On 10.04.2015 23:39, Jan Hubicka wrote:

> I must say I did not even try running AutoFDO myself (so I am happy to hear
> it works).


I tried to use executable create_gcov built from AutoFDO repository at 
github.
The problem is that the data generated by this program has size 1600 
bytes not depending on the profile data given to it.

Steps to reproduce the issue:

1. Build AutoFDO under x86_64

2. Build, for example, the benchmark ytest.c (see attachment):

   g++ -O2 -o ytest ytest.c -g2

(I used g++ that was built just now from gcc-5-branch branch from 
git://gcc.gnu.org/git/gcc.git)


3. Run it under perf to collect the profile data:

   sudo perf record ./ytest


The perf reports no error and says that

   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.125 MB perf.data (~5442 samples) ]


Perf generates perf.data.

4. Run create_gcov on the obtained data:

   create_gcov --binary ytest --profile perf.data --gcov ytest.gcov
   --debug_dump

It creates 2 files:
* ytest.gcov which is 1600 bytes of size
* ytest.gcov.imports which is empty

Also there is no debug output from the program.
If I run create_llvm_prof on the data

   create_llvm_prof --binary ytest --profile perf.data --out ytest.out
   --debug_dump

It reports the following log:

   Length of symbol map: 1
   Number of functions:  0

and creates an empty file ytest.out.

Which is not true: all functions in the benchmark are marked with 
__attribute__((noinline)) and readelf says that they stay in the binary:


   readelf -s ytest | grep px_cycle
       56: 00400640   111 FUNC    GLOBAL DEFAULT   12 _Z8px_cyclei
   readelf -s ytest | grep py_cycle
       60: 004006b0    36 FUNC    GLOBAL DEFAULT   12 _Z8py_cyclev

The size of resulting gcov data is the same (1600 bytes) for different 
levels of debug information (-g0, -g1, -g2) and for different input 
sources files.


What am I doing wrong?

--
Best regards,
Ilya Palachev

#define DX (480*4)

#define DY (640*4)

int* src = new int[DX*DY];
int* dst = new int[DX*DY];
int pxm = DX;
int pym = DY;

void px_cycle(int py) __attribute__((noinline));
void px_cycle(int py) {
    int *p1 = dst + (py*pxm);
    int *p2 = src + (pym - py - 1);
    for (int px = 0; px < pxm; px++) {
        if (px < pym && py < pxm) {
            *p1 = *p2;
        }
        p1++;
        p2 += pym;
    }
}

void py_cycle() __attribute__((noinline));
void py_cycle() {
    for (int py = 0; py < pym; py++) {
        px_cycle(py);
    }
}

int main() {
    int i;
    for (i = 0; i < 100; i++) {
        py_cycle();
    }
    return 0;
}


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka  wrote:
>> LBR is used for both cfg edge profiling and indirect call Target value
>> profiling.
> I see, that makes sense ;)  I guess if we want to support profile collection
> on targets w/o this feature we could still use one of the algorithms that
> try to guess edge profile from BB profile.

Our experience with sampling cycles or retired instructions to guess
BB profile has not been great -- the profile quality is significantly
worse than LBR (which can almost match instrumentation based profile).

David

>
> Honza


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
On Tue, Apr 7, 2015 at 7:45 AM, Ilya Palachev  wrote:
> Hi,
>
> Here are some questions about AutoFDO.
>
> On 08.05.2014 02:55, Dehao Chen wrote:
>>
>> We have open-sourced AutoFDO profile toolchain in:
>>
>> https://github.com/google/autofdo
>>
>> For GCC developers, the most important tool is create_gcov, which
>> converts sampling based profile to GCC-readable profile. Please refer
>> to the readme file
>> (https://raw.githubusercontent.com/google/autofdo/master/README) for
>> more details.
>
>
> In the mentioned README file it is said that " In order to collect this
> profile, you will need to have an Intel CPU that have last branch record
> (LBR) support." Is this information obsolete? Chrome Canary builds use
> AutoFDO for ARMv7l
> (https://code.google.com/p/chromium/issues/detail?id=434587)
>
> What about Aarch64 support? Is it supported?

As mentioned by Sebastian, the current solution is to collect profile
on Intel platform (with LBR support) and cross optimize arm/aarch64
target.

AutoFDO support with other PMU events (cycles, retired instructions
etc) still needs more tuning to match FDO performance.

>
>> To use the profile, one need to checkout
>> https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on
>> porting AutoFDO to trunk
>> (http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).
>
>
> For now AutoFDO was merged into gcc-5.0 (trunk) branch.
> Is it possible to backport it to 4.9 branch? Can you estimate required
> efforts for that?

The google gcc49 branch has the autofdo support.

David
>
>>
>> We have limited doc inside the open-sourced package, and we are
>> planning to add more content to the wiki page
>> (https://github.com/google/autofdo/wiki). Feel free to send me emails
>> or discuss on github if you have any questions.
>>
>> Cheers,
>> Dehao
>
>
> --
> Best regards,
> Ilya


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Jan Hubicka
> LBR is used for both cfg edge profiling and indirect call Target value
> profiling.
I see, that makes sense ;)  I guess if we want to support profile collection
on targets w/o this feature we could still use one of the algorithms that
try to guess edge profile from BB profile.
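
To illustrate the idea only (this is not what GCC or AutoFDO implements): when each block count is known, an edge count is determined wherever a block has exactly one unknown incoming or outgoing edge, because the count of a block equals the sum over its incoming edges and also over its outgoing edges. A minimal sketch in Python, assuming exact counts and a CFG where the entry block has no incoming edges and the exit block has no outgoing ones. `infer_edge_counts` is a made-up name:

```python
from collections import defaultdict


def infer_edge_counts(block_counts, edges):
    """Derive edge counts from basic-block counts by flow conservation.

    block_counts maps block name -> execution count; edges is a list of
    (source, destination) tuples. Only uniquely determined edges are returned.
    """
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for edge in edges:
        src, dst = edge
        outgoing[src].append(edge)
        incoming[dst].append(edge)

    known = {}
    changed = True
    while changed:
        changed = False
        for block, count in block_counts.items():
            for incident in (outgoing[block], incoming[block]):
                unknown = [e for e in incident if e not in known]
                if len(unknown) == 1:
                    # A single missing edge is the block count minus the
                    # edges on the same side that are already solved.
                    solved = sum(known[e] for e in incident if e in known)
                    known[unknown[0]] = count - solved
                    changed = True
    return known


if __name__ == "__main__":
    # Diamond: A branches to B or C, both join in D.
    counts = {"A": 100, "B": 70, "C": 30, "D": 100}
    cfg = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    print(infer_edge_counts(counts, cfg))
```

With sampled counts the block numbers are noisy, so the remainders can come out inconsistent or negative, and a real implementation would need some smoothing on top.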

Honza


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Xinliang David Li
LBR is used for both cfg edge profiling and indirect call Target value
profiling.

David

On Fri, Apr 10, 2015 at 3:26 PM, Xinliang David Li  wrote:
> LBR is used for both cfg edge profiling and indirect call Target value
> profiling.
>
> David
>
> On Apr 10, 2015 10:39 AM, "Jan Hubicka"  wrote:
>>
>> > On Tue, Apr 7, 2015 at 9:45 AM, Ilya Palachev 
>> > wrote:
>> > > In the mentioned README file it is said that " In order to collect
>> > > this
>> > > profile, you will need to have an Intel CPU that have last branch
>> > > record
>> > > (LBR) support." Is this information obsolete? Chrome Canary builds use
>> > > AutoFDO for ARMv7l
>> > > (https://code.google.com/p/chromium/issues/detail?id=434587)
>> >
>> > It does not mean that the profile was recorded on an ARM system: they
>> > can gather perf.data on x86 and then produce a coverage file that is
>> > then used in ARM compiles.  I tried it and seems to work well.
>>
>> I must say I did not even try running AutoFDO myself (so I am happy to
>> hear
>> it works). My understanding is that you need LBR only to get indirect
>> call profiling working (i.e. you want to know from where the indirect
>> function is called).
>>
>> Depending on your application this may not be the most important thing to
>> record (either you don't have indirect calls in hot paths or they are
>> handled
>> reasonably by speculative devirtualization)
>>
>> Some ARMs also have support for tracing jump pairs, right?
>> Honza
>> >
>> > Sebastian


Re: AutoFDO profile toolchain is open-sourced

2015-04-10 Thread Jan Hubicka
> On Tue, Apr 7, 2015 at 9:45 AM, Ilya Palachev  wrote:
> > In the mentioned README file it is said that " In order to collect this
> > profile, you will need to have an Intel CPU that have last branch record
> > (LBR) support." Is this information obsolete? Chrome Canary builds use
> > AutoFDO for ARMv7l
> > (https://code.google.com/p/chromium/issues/detail?id=434587)
> 
> It does not mean that the profile was recorded on an ARM system: they
> can gather perf.data on x86 and then produce a coverage file that is
> then used in ARM compiles.  I tried it and seems to work well.

I must say I did not even try running AutoFDO myself (so I am happy to hear
it works). My understanding is that you need LBR only to get indirect
call profiling working (i.e. you want to know from where the indirect
function is called).

Depending on your application this may not be the most important thing to
record (either you don't have indirect calls in hot paths or they are handled
reasonably by speculative devirtualization)

Some ARMs also have support for tracing jump pairs, right?
Honza
> 
> Sebastian


Re: AutoFDO profile toolchain is open-sourced

2015-04-07 Thread Ilya Palachev

Hi,

Here are some questions about AutoFDO.

On 08.05.2014 02:55, Dehao Chen wrote:

> We have open-sourced AutoFDO profile toolchain in:
>
> https://github.com/google/autofdo
>
> For GCC developers, the most important tool is create_gcov, which
> converts sampling based profile to GCC-readable profile. Please refer
> to the readme file
> (https://raw.githubusercontent.com/google/autofdo/master/README) for
> more details.


In the mentioned README file it is said that " In order to collect this 
profile, you will need to have an Intel CPU that have last branch record 
(LBR) support." Is this information obsolete? Chrome Canary builds use 
AutoFDO for ARMv7l 
(https://code.google.com/p/chromium/issues/detail?id=434587)


What about Aarch64 support? Is it supported?


> To use the profile, one need to checkout
> https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on
> porting AutoFDO to trunk
> (http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).


For now AutoFDO was merged into gcc-5.0 (trunk) branch.
Is it possible to backport it to 4.9 branch? Can you estimate required 
efforts for that?




> We have limited doc inside the open-sourced package, and we are
> planning to add more content to the wiki page
> (https://github.com/google/autofdo/wiki). Feel free to send me emails
> or discuss on github if you have any questions.
>
> Cheers,
> Dehao


--
Best regards,
Ilya