RE: AutoFDO profile toolchain is open-sourced
Recently we found an ICE while compiling a program with AutoFDO (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972). The ICE occurred because SSA was not in a valid state when the early inliner ran. The fix was to call update_ssa before running the early inliner (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65972#c4). However, we have not yet found which pass left the SSA in that state; fixing the problem there might be more appropriate.

-Aditya

> Date: Sat, 9 May 2015 16:33:02 +0200
> From: hubi...@ucw.cz
> To: hiradi...@msn.com
> CC: de...@google.com; i.palac...@samsung.com; davi...@google.com;
> hubi...@ucw.cz; gcc@gcc.gnu.org; v.bari...@samsung.com; dnovi...@google.com;
> seb...@gmail.com
> Subject: Re: AutoFDO profile toolchain is open-sourced
>
>>> Yes, it will. But it's not well tuned at all. I will start tuning it
>>> if I have free cycles. It would be great if opensource community can
>>> also contribute to this tuning effort.
>>
>> If you could outline portions of code which needs tuning, rewriting, that
>> will help get started in this effort.
>
> Optimization passes in GCC are generally designed to work with any kind of
> edge profile they get.
> There are only few cases where they do care about what profile is around.
>
> At the moment we consider two types of profiles - static (guessed) and FDO.
> For static one we shut down use of profile info for some heuristics - for
> example we do not expect loop trip counts to be reliable in the profiles
> because they are not. You can look for code checking profile_status_for_fn.
>
> Auto-FDO does not have special value for profile_status_for_fn and it goes
> with same code paths for FDO. Dehao has some patches for Auto-FDO tuning but
> my impression is that he mostly got around by just making optimizer bit more
> robust for nonsensical profiles that is always good, since even FDO profiles
> can get wrong. BTW, Dehao, do you think you can submit these changes for this
> stage1?
>
> I suppose in this case we have yet another kind of profile that is less
> reliable than FDO and we need to start by simply benchmarking and looking for
> cases where this profile gets worse and handle them one by one :)
>
> Honza
Re: AutoFDO profile toolchain is open-sourced
> > Yes, it will. But it's not well tuned at all. I will start tuning it
> > if I have free cycles. It would be great if opensource community can
> > also contribute to this tuning effort.
>
> If you could outline portions of code which needs tuning, rewriting, that
> will help get started in this effort.

Optimization passes in GCC are generally designed to work with any kind of edge profile they get. There are only a few cases where they care about what kind of profile is around.

At the moment we consider two types of profiles: static (guessed) and FDO. For the static one we shut down use of profile info for some heuristics. For example, we do not expect loop trip counts to be reliable in static profiles, because they are not. You can look for code checking profile_status_for_fn.

Auto-FDO does not have a special value for profile_status_for_fn, so it goes through the same code paths as FDO. Dehao has some patches for Auto-FDO tuning, but my impression is that he mostly got around by just making the optimizer a bit more robust against nonsensical profiles, which is always good, since even FDO profiles can be wrong. BTW, Dehao, do you think you can submit these changes for this stage1?

I suppose in this case we have yet another kind of profile that is less reliable than FDO, and we need to start by simply benchmarking, looking for cases where this profile gives worse results, and handling them one by one :)

Honza
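As a starting point for the search suggested above, a small shell helper can list the files that mention profile_status_for_fn. This is only a sketch: the function name find_profile_checks is hypothetical, and pointing it at the gcc/ directory of a local GCC checkout is an assumption for illustration, not something stated in this thread.

```shell
# List files under a directory that mention a given identifier.
# Usage: find_profile_checks DIR [IDENTIFIER]
# (find_profile_checks is a hypothetical helper name; the default
# identifier is the one named in the message above.)
find_profile_checks() {
    dir=$1
    ident=${2:-profile_status_for_fn}
    # -r: recurse, -l: print file names only, -F: match a fixed string
    grep -rlF -- "$ident" "$dir" | sort
}
```

For example, `find_profile_checks path/to/gcc-checkout/gcc` would print one file name per line; each hit is a candidate place where the optimizer treats guessed and read profiles differently.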
RE: AutoFDO profile toolchain is open-sourced
> Date: Fri, 8 May 2015 11:19:12 -0700 > Subject: Re: AutoFDO profile toolchain is open-sourced > From: de...@google.com > To: i.palac...@samsung.com > CC: davi...@google.com; hubi...@ucw.cz; gcc@gcc.gnu.org; > v.bari...@samsung.com; dnovi...@google.com; seb...@gmail.com > > On Fri, May 8, 2015 at 2:00 AM, Ilya Palachev wrote: >> On 11.04.2015 01:49, Xinliang David Li wrote: >>> >>> On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka wrote: >>>>> >>>>> LBR is used for both cfg edge profiling and indirect call Target value >>>>> profiling. >>>> >>>> I see, that makes sense ;) I guess if we want to support profile >>>> collection >>>> on targets w/o this feature we could still use one of the algorithms that >>>> try to guess edge profile from BB profile. >>> >>> Our experience with sampling cycles or retired instructions to guess >>> BB profile has not been great -- the profile quality is significantly >>> worse than LBR (which can almost match instrumentation based profile). >> >> Suppose that I have no opportunity to collect profile on x86 architecture >> with LBR support and the only available architecture is arm/aarch64 (since >> the application code is significantly different when compiled for different >> architectures because of manual optimizations and different function names >> and structure). > > If it's already manually tuned towards architecture (or even > hand-written inlined-assembly), then I don't think FDO/AutoFDO can > help much. > >> >> Honza has mentioned that it's possible to guess edge profile from BB >> profile. How do you think, can this help in the above described situation? >> Yes, this will be much worse than LBR, but can it give any performance >> benefit compared with no edge profile at all? > > Yes, it will. But it's not well tuned at all. I will start tuning it > if I have free cycles. It would be great if opensource community can > also contribute to this tuning effort. 
If you could outline portions of code which needs tuning, rewriting, that will help get started in this effort. Thanks, -Aditya > > Cheers, > Dehao > >> >> -- >> Ilya
Re: AutoFDO profile toolchain is open-sourced
On Fri, May 8, 2015 at 2:00 AM, Ilya Palachev wrote:
> On 11.04.2015 01:49, Xinliang David Li wrote:
>>
>> On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka wrote:
>>>>
>>>> LBR is used for both cfg edge profiling and indirect call Target value
>>>> profiling.
>>>
>>> I see, that makes sense ;) I guess if we want to support profile
>>> collection
>>> on targets w/o this feature we could still use one of the algorithms that
>>> try to guess edge profile from BB profile.
>>
>> Our experience with sampling cycles or retired instructions to guess
>> BB profile has not been great -- the profile quality is significantly
>> worse than LBR (which can almost match instrumentation based profile).
>
> Suppose that I have no opportunity to collect profile on x86 architecture
> with LBR support and the only available architecture is arm/aarch64 (since
> the application code is significantly different when compiled for different
> architectures because of manual optimizations and different function names
> and structure).

If it's already manually tuned towards the architecture (or even uses hand-written inline assembly), then I don't think FDO/AutoFDO can help much.

>
> Honza has mentioned that it's possible to guess edge profile from BB
> profile. How do you think, can this help in the above described situation?
> Yes, this will be much worse than LBR, but can it give any performance
> benefit compared with no edge profile at all?

Yes, it will. But it's not well tuned at all. I will start tuning it if I have free cycles. It would be great if the open-source community could also contribute to this tuning effort.

Cheers,
Dehao

>
> --
> Ilya
Re: AutoFDO profile toolchain is open-sourced
On 11.04.2015 01:49, Xinliang David Li wrote:
> On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka wrote:
>>> LBR is used for both cfg edge profiling and indirect call Target value
>>> profiling.
>>
>> I see, that makes sense ;) I guess if we want to support profile collection
>> on targets w/o this feature we could still use one of the algorithms that
>> try to guess edge profile from BB profile.
>
> Our experience with sampling cycles or retired instructions to guess
> BB profile has not been great -- the profile quality is significantly
> worse than LBR (which can almost match instrumentation based profile).

Suppose that I have no opportunity to collect a profile on an x86 architecture with LBR support, and the only available architecture is arm/aarch64 (since the application code is significantly different when compiled for different architectures because of manual optimizations and different function names and structure).

Honza has mentioned that it's possible to guess the edge profile from the BB profile. What do you think, can this help in the situation described above? Yes, this will be much worse than LBR, but can it give any performance benefit compared with no edge profile at all?

--
Ilya
Re: AutoFDO profile toolchain is open-sourced
On Mon, Apr 27, 2015 at 7:37 AM, Ilya Palachev wrote:
> Hi,
>
> On 21.04.2015 20:25, Dehao Chen wrote:
>>
>> OTOH, the most important patch (insn-level discriminator support) is
>> not in yet. Cary has just retired. Do you know if anyone would be
>> interested in porting insn-level discriminator support to trunk?
>
>
> Do you mean r210338, r210397, r210523, r214745 ?

Yes.

> Can you explain why these patches are important for autofdo?

Instruction-level discriminator support is important to AutoFDO because a basic-block-level discriminator is not enough when instructions are moved to other basic blocks by code motion. Additionally, GCC's backend optimizations do not maintain BB-level discriminators well. We need to encode the discriminator as part of the LOC so that once a discriminator is assigned to an IR, it goes all the way to codegen without being modified.

> What work should be done to port them to current 5 branch?

I think we just need to have these patches in. Or, even better, reimplement this the same way as my lexical block patch (https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=191494).

> Do you expect them to be applied to 6 branch?

This should go into trunk and be there for all later GCC branches.

Dehao

>
> --
> Ilya
Re: AutoFDO profile toolchain is open-sourced
On Thu, Apr 23, 2015 at 10:31 PM, Jan Hubicka wrote:
> > > It converts with the attached patches, but there's still some problem
> > > parsing the data:
> > >
> > > % ./create_gcov -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e
> > > % gcc50 -O2 -fprofile-use loop.c
> > > loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ', expected version '500e'
> > > %
> >
> > You need to use -fauto-profile=loop.gcda instead of "-fprofile-use",
> > which is only for instrumentation based FDO.
>
> This is indeed not very intuitive. I wonder why it uses the same suffix
> suggesting that sample based and FDO based files are the same?

An AutoFDO profile does not need to have any specific suffix. I'll update the toolchain to make the default output profile "fbdata.afdo" instead of "fbdata.gcda".

>
> Would it be possible to at least have this well documented in invoke.texi and
> perhaps we can fix the warning above to say something like "loop.gcda is
> autofdo profile, use -fauto-profile instead of -fprofile-use"?

Sounds good to me. I will send a patch to update invoke.texi. Could you help fix the warning for -fprofile-use?

Thanks,
Dehao

>
> Honza
> >
> > Dehao
> >
> > >
> > > -Andi
Re: AutoFDO profile toolchain is open-sourced
Hi,

On 21.04.2015 20:25, Dehao Chen wrote:
> OTOH, the most important patch (insn-level discriminator support) is
> not in yet. Cary has just retired. Do you know if anyone would be
> interested in porting insn-level discriminator support to trunk?

Do you mean r210338, r210397, r210523, r214745?

Can you explain why these patches are important for autofdo?

What work should be done to port them to the current 5 branch?

Do you expect them to be applied to the 6 branch?

--
Ilya
Re: AutoFDO profile toolchain is open-sourced
> > It converts with the attached patches, but there's still some problem
> > parsing the data:
> >
> > % ./create_gcov -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e
> > % gcc50 -O2 -fprofile-use loop.c
> > loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ', expected version '500e'
> > %
>
> You need to use -fauto-profile=loop.gcda instead of "-fprofile-use",
> which is only for instrumentation based FDO.

This is indeed not very intuitive. I wonder why it uses the same suffix, which suggests that sample-based and FDO-based files are the same?

Would it be possible to at least have this well documented in invoke.texi? Perhaps we can also fix the warning above to say something like "loop.gcda is autofdo profile, use -fauto-profile instead of -fprofile-use".

Honza

>
> Dehao
>
> >
> > -Andi
Re: AutoFDO profile toolchain is open-sourced
Thanks, I'll forward the patches to the quipper team.

On Tue, Apr 21, 2015 at 8:47 PM, Andi Kleen wrote:
> On Wed, Apr 22, 2015 at 05:15:47AM +0200, Andi Kleen wrote:
>> On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote:
>> > Andi,
>> >
>> > Thanks for the patches. Turns out that the first 3 patches are already
>> > in, the correct upstream quipper repository is:
>> >
>> > https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/
>> >
>> > The last 3 patches seem to be local hacks. Do you want any of them in?
>> >
>> > I just did a batch sync with quipper head. Please let me know if this
>> > solves the perf problem.
>>
>> Still outdated:
>>
>> F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= sizeof(perf_event_attr) (104 vs. 96)
>
> It converts with the attached patches, but there's still some problem
> parsing the data:
>
> % ./create_gcov -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e
> % gcc50 -O2 -fprofile-use loop.c
> loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ', expected version '500e'
> %

You need to use -fauto-profile=loop.gcda instead of "-fprofile-use", which is only for instrumentation-based FDO.

Dehao

>
> -Andi
>
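The distinction between the two flags can be sketched as a small dry-run helper that only prints the compiler command for each kind of profile and runs nothing. The function name profile_use_cmd and the "auto"/"instr" labels are made up for illustration; only the -fauto-profile= and -fprofile-use options themselves are GCC's.

```shell
# Print (do not run) the GCC command line for a given kind of profile.
# Usage: profile_use_cmd KIND PROFILE SOURCE
#   KIND "auto"  : sample-based profile produced by create_gcov
#   KIND "instr" : instrumentation-based FDO profile
# (profile_use_cmd and the KIND labels are hypothetical names.)
profile_use_cmd() {
    kind=$1
    profile=$2
    src=$3
    case $kind in
        auto)  echo "gcc -O2 -fauto-profile=$profile $src" ;;
        instr) echo "gcc -O2 -fprofile-use=$profile $src" ;;
        *)     echo "unknown profile kind: $kind" >&2; return 1 ;;
    esac
}
```

With the file from the report above, `profile_use_cmd auto loop.gcda loop.c` prints the -fauto-profile form, which is the one that matches a create_gcov output regardless of the file's suffix.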
Re: AutoFDO profile toolchain is open-sourced
On Wed, Apr 22, 2015 at 05:15:47AM +0200, Andi Kleen wrote: > On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote: > > Andi, > > > > Thanks for the patches. Turns out that the first 3 patches are already > > in, the correct upstream quipper repository is: > > > > https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/ > > > > The last 3 patches seem to be local hacks. Do you want any of them in? > > > > I just did a batch sync with quipper head. Please let me know if this > > solves the perf problem. > > Still outdated: > > F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= > sizeof(perf_event_attr) (104 vs. 96) It converts with the attached patches, but there's still some problem parsing the data: % ./create_gcov -binary loop -gcov_version 1 -gcov loop.gcda -gcov_version 0x500e % gcc50 -O2 -fprofile-use loop.c loop.c:1:0: warning: '/home/andi/src/autofdo/loop.gcda' is version ', expected version '500e' % -Andi autofdo-patches-2.tgz Description: application/gtar-compressed
Re: AutoFDO profile toolchain is open-sourced
On Tue, Apr 21, 2015 at 01:52:18PM -0700, Dehao Chen wrote: > Andi, > > Thanks for the patches. Turns out that the first 3 patches are already > in, the correct upstream quipper repository is: > > https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/ > > The last 3 patches seem to be local hacks. Do you want any of them in? > > I just did a batch sync with quipper head. Please let me know if this > solves the perf problem. Still outdated: F0421 20:13:16.221422 22297 perf_reader.cc:1614] Check failed: attr_size <= sizeof(perf_event_attr) (104 vs. 96) -Andi
Re: AutoFDO profile toolchain is open-sourced
Ok, thanks for the tip of the flag. You would also need to pass "-use_lbr=false" to create a gcov file for a device that does not have LBR support. We tried this on ARM collected profiles and we got the same speedup as x86 collected profiles on linpack. Sebastian On Tue, Apr 21, 2015 at 3:53 PM, Dehao Chen wrote: > That's correct. For trunk, gcov_version is 0x1. We defined this as a > flag so that you can actually change it via --gcov_version=0x1 instead > of changing the code. > > Dehao > > On Tue, Apr 21, 2015 at 1:47 PM, Sebastian Pop wrote: >> We also needed to adjust the gcov_version in autofdo/gcov.cc to read >> 0x1 for dev branches of gcc (instead of the current 0x3430372a for >> some released version of GCC): >> >> -DEFINE_uint64(gcov_version, 0x3430372a, >> +DEFINE_uint64(gcov_version, 0x1, >> >> Sebastian >> >> On Tue, Apr 21, 2015 at 3:33 PM, Aditya K wrote: >>> After patching linux perf. This script collects creates a coverage file >>> (e.g., for linpack) which can be used for fdo. 
>>> >>> >>> gcov=linpack-x86.gcov >>> MAKE='make' >>> >>> >>> # x86 >>> x86() { >>> CC=/usr/bin/gcc >>> CXX=/usr/bin/g++ >>> >>> export CFLAGS="-Ofast -g3 -static" >>> export CPPFLAGS=$CFLAGS >>> >>> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack clean >>> >>> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack -k TEST=simple >>> TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC >>> TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= >>> USE_REFERENCE_OUTPUT=1CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= >>> LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 >>> DISABLE_JIT=1 >>> >>> perfdata=autofdo-linpack/perf-x86.data >>> >>> perf record -b -e branch-instructions -o $perfdata >>> $SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple >>> >>> autofdo/usr/bin/create_gcov >>> --binary=$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple >>> --profile=$perfdata --gcov=$gcov >>> >>> } >>> >>> >>> hth, >>> -Aditya >>> >>> >>>> From: a...@firstfloor.org >>>> To: i.palac...@samsung.com >>>> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; >>>> hubi...@ucw.cz; seb...@gmail.com; de...@google.com; v.bari...@samsung.com >>>> Subject: Re: AutoFDO profile toolchain is open-sourced >>>> Date: Tue, 21 Apr 2015 07:25:10 -0700 >>>> >>>> Ilya Palachev writes: >>>>> >>>>> But why create_gcov does not inform about that (no branch events were >>>>> found)? It creates empty gcov file and says nothing :( >>>>> >>>>> Moreover, in the mentioned README it is said that perf should also be >>>>> executed with option -e BR_INST_RETIRED:TAKEN. >>>> >>>> Standard perf doesn't have a full event list >>>> This assumes a perf patched with the libpfm patch. >>>> >>>> Also I suspect it really wants to use PEBS events, so pp should be added. >>>> >>>> Alternatively you can use ocperf (from >>>> http://github.com/andikleen/pmu-tools) which is just a wrapper: >>>> >>>> ocperf.py record -e br_inst_retired.near_taken:pp -b ... 
>>>> >>>> or specify the event manually (depending on your CPU, like) >>>> >>>> perf record -e >>>> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp >>>> -b ... >>>> >>>> BTW the biggest problem with autofdo currently is that it is >>>> quite bitrotten and supports only several years old perf. >>>> So all of this above will only work with old distributions, >>>> unless you compile an old perf utility first. >>>> >>>> -Andi >>>> >>>> -- >>>> a...@linux.intel.com -- Speaking for myself only >>>
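The two create_gcov flags discussed in this message (the gcov version and LBR use) can be combined in one invocation. The sketch below only assembles and prints such a command line; the helper name create_gcov_cmd and its argument order are hypothetical, while the flag spellings are copied from the messages in this thread.

```shell
# Print (do not run) a create_gcov command line.
# Usage: create_gcov_cmd BINARY PERF_DATA OUTPUT [GCOV_VERSION] [USE_LBR]
# (create_gcov_cmd is a hypothetical helper; the flags are as quoted
# in this thread.)
create_gcov_cmd() {
    binary=$1
    perfdata=$2
    out=$3
    version=${4:-0x1}    # 0x1 is the value given in this thread for GCC trunk
    use_lbr=${5:-true}   # pass "false" for profiles collected without LBR
    cmd="create_gcov --binary=$binary --profile=$perfdata --gcov=$out --gcov_version=$version"
    if [ "$use_lbr" = false ]; then
        cmd="$cmd -use_lbr=false"
    fi
    echo "$cmd"
}
```

For an ARM-collected profile one might call `create_gcov_cmd ./linpack perf-arm.data linpack-arm.gcov 0x1 false`; the file names here are placeholders.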
Re: AutoFDO profile toolchain is open-sourced
That's correct. For trunk, gcov_version is 0x1. We defined this as a flag so that you can actually change it via --gcov_version=0x1 instead of changing the code. Dehao On Tue, Apr 21, 2015 at 1:47 PM, Sebastian Pop wrote: > We also needed to adjust the gcov_version in autofdo/gcov.cc to read > 0x1 for dev branches of gcc (instead of the current 0x3430372a for > some released version of GCC): > > -DEFINE_uint64(gcov_version, 0x3430372a, > +DEFINE_uint64(gcov_version, 0x1, > > Sebastian > > On Tue, Apr 21, 2015 at 3:33 PM, Aditya K wrote: >> After patching linux perf. This script collects creates a coverage file >> (e.g., for linpack) which can be used for fdo. >> >> >> gcov=linpack-x86.gcov >> MAKE='make' >> >> >> # x86 >> x86() { >> CC=/usr/bin/gcc >> CXX=/usr/bin/g++ >> >> export CFLAGS="-Ofast -g3 -static" >> export CPPFLAGS=$CFLAGS >> >> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack clean >> >> $MAKE -C $SRC/SingleSource/Benchmarks/Linpack -k TEST=simple >> TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC >> TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= >> USE_REFERENCE_OUTPUT=1CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= >> LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 >> DISABLE_JIT=1 >> >> perfdata=autofdo-linpack/perf-x86.data >> >> perf record -b -e branch-instructions -o $perfdata >> $SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple >> >> autofdo/usr/bin/create_gcov >> --binary=$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple >> --profile=$perfdata --gcov=$gcov >> >> } >> >> >> hth, >> -Aditya >> >> >>> From: a...@firstfloor.org >>> To: i.palac...@samsung.com >>> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; >>> hubi...@ucw.cz; seb...@gmail.com; de...@google.com; v.bari...@samsung.com >>> Subject: Re: AutoFDO profile toolchain is open-sourced >>> Date: Tue, 21 Apr 2015 07:25:10 -0700 >>> >>> Ilya Palachev writes: >>>> >>>> But why create_gcov does not inform 
about that (no branch events were >>>> found)? It creates empty gcov file and says nothing :( >>>> >>>> Moreover, in the mentioned README it is said that perf should also be >>>> executed with option -e BR_INST_RETIRED:TAKEN. >>> >>> Standard perf doesn't have a full event list >>> This assumes a perf patched with the libpfm patch. >>> >>> Also I suspect it really wants to use PEBS events, so pp should be added. >>> >>> Alternatively you can use ocperf (from >>> http://github.com/andikleen/pmu-tools) which is just a wrapper: >>> >>> ocperf.py record -e br_inst_retired.near_taken:pp -b ... >>> >>> or specify the event manually (depending on your CPU, like) >>> >>> perf record -e >>> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp >>> -b ... >>> >>> BTW the biggest problem with autofdo currently is that it is >>> quite bitrotten and supports only several years old perf. >>> So all of this above will only work with old distributions, >>> unless you compile an old perf utility first. >>> >>> -Andi >>> >>> -- >>> a...@linux.intel.com -- Speaking for myself only >>
Re: AutoFDO profile toolchain is open-sourced
Andi, Thanks for the patches. Turns out that the first 3 patches are already in, the correct upstream quipper repository is: https://chromium.googlesource.com/chromiumos/platform2/+/master/chromiumos-wide-profiling/ The last 3 patches seem to be local hacks. Do you want any of them in? I just did a batch sync with quipper head. Please let me know if this solves the perf problem. Thanks, Dehao On Tue, Apr 21, 2015 at 10:36 AM, Andi Kleen wrote: > On Tue, Apr 21, 2015 at 10:27:49AM -0700, Dehao Chen wrote: >> In that case, we should get quipper fixed upstream to accommodate new >> format. (Maybe they already fixed it, I will do a batch sync to make >> quipper up-to-date). > > From a quick look at > > http://git.chromium.org/gitweb/?p=chromiumos/platform/chromiumos-wide-profiling.git;a=summary > > (I assume that is what you mean with upstream) > > it hasn't been updated. Is still stuck in 2013. > > I'm attaching what patches I have so far. > > -Andi
Re: AutoFDO profile toolchain is open-sourced
We also needed to adjust the gcov_version in autofdo/gcov.cc to read 0x1 for dev branches of gcc (instead of the current 0x3430372a for some released version of GCC): -DEFINE_uint64(gcov_version, 0x3430372a, +DEFINE_uint64(gcov_version, 0x1, Sebastian On Tue, Apr 21, 2015 at 3:33 PM, Aditya K wrote: > After patching linux perf. This script collects creates a coverage file > (e.g., for linpack) which can be used for fdo. > > > gcov=linpack-x86.gcov > MAKE='make' > > > # x86 > x86() { > CC=/usr/bin/gcc > CXX=/usr/bin/g++ > > export CFLAGS="-Ofast -g3 -static" > export CPPFLAGS=$CFLAGS > > $MAKE -C $SRC/SingleSource/Benchmarks/Linpack clean > > $MAKE -C $SRC/SingleSource/Benchmarks/Linpack -k TEST=simple > TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC > TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= > USE_REFERENCE_OUTPUT=1CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= > LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 > DISABLE_JIT=1 > > perfdata=autofdo-linpack/perf-x86.data > > perf record -b -e branch-instructions -o $perfdata > $SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple > > autofdo/usr/bin/create_gcov > --binary=$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple > --profile=$perfdata --gcov=$gcov > > } > > > hth, > -Aditya > > >> From: a...@firstfloor.org >> To: i.palac...@samsung.com >> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; >> hubi...@ucw.cz; seb...@gmail.com; de...@google.com; v.bari...@samsung.com >> Subject: Re: AutoFDO profile toolchain is open-sourced >> Date: Tue, 21 Apr 2015 07:25:10 -0700 >> >> Ilya Palachev writes: >>> >>> But why create_gcov does not inform about that (no branch events were >>> found)? It creates empty gcov file and says nothing :( >>> >>> Moreover, in the mentioned README it is said that perf should also be >>> executed with option -e BR_INST_RETIRED:TAKEN. 
>> >> Standard perf doesn't have a full event list >> This assumes a perf patched with the libpfm patch. >> >> Also I suspect it really wants to use PEBS events, so pp should be added. >> >> Alternatively you can use ocperf (from >> http://github.com/andikleen/pmu-tools) which is just a wrapper: >> >> ocperf.py record -e br_inst_retired.near_taken:pp -b ... >> >> or specify the event manually (depending on your CPU, like) >> >> perf record -e >> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp >> -b ... >> >> BTW the biggest problem with autofdo currently is that it is >> quite bitrotten and supports only several years old perf. >> So all of this above will only work with old distributions, >> unless you compile an old perf utility first. >> >> -Andi >> >> -- >> a...@linux.intel.com -- Speaking for myself only >
RE: AutoFDO profile toolchain is open-sourced
After patching Linux perf, this script collects a profile and creates a coverage file (e.g., for linpack) which can be used for FDO.

gcov=linpack-x86.gcov
MAKE='make'

# x86
# Note: SRC is not set in this script; presumably it points at the
# benchmark source tree.
x86() {
    CC=/usr/bin/gcc
    CXX=/usr/bin/g++

    export CFLAGS="-Ofast -g3 -static"
    export CPPFLAGS=$CFLAGS

    # Rebuild the Linpack benchmark from scratch.
    $MAKE -C "$SRC/SingleSource/Benchmarks/Linpack" clean

    $MAKE -C "$SRC/SingleSource/Benchmarks/Linpack" -k TEST=simple \
        TARGET_LLVMGCC=$CC TARGET_CXX=$CXX LLI_OPTFLAGS= TARGET_CC=$CC \
        TARGET_LLVMGXX=$CXX CC_UNDER_TEST_IS_GCC=1 TARGET_FLAGS= \
        USE_REFERENCE_OUTPUT=1 CC_UNDER_TEST_TARGET_IS_AARCH64=1 OPTFLAGS= \
        LLC_OPTFLAGS= ENABLE_OPTIMIZED=1 ARCH=x86_64 ENABLE_HASHED_PROGRAM_OUTPUT=1 \
        DISABLE_JIT=1

    perfdata=autofdo-linpack/perf-x86.data

    # Record branch samples with LBR (-b) while running the benchmark.
    perf record -b -e branch-instructions -o "$perfdata" \
        "$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple"

    # Convert the perf data into a gcov-format AutoFDO profile.
    autofdo/usr/bin/create_gcov \
        --binary="$SRC/SingleSource/Benchmarks/Linpack/Output/linpack-pc.simple" \
        --profile="$perfdata" --gcov="$gcov"
}

hth,
-Aditya

> From: a...@firstfloor.org
> To: i.palac...@samsung.com
> CC: dnovi...@google.com; gcc@gcc.gnu.org; davi...@google.com; hubi...@ucw.cz;
> seb...@gmail.com; de...@google.com; v.bari...@samsung.com
> Subject: Re: AutoFDO profile toolchain is open-sourced
> Date: Tue, 21 Apr 2015 07:25:10 -0700
>
> Ilya Palachev writes:
>>
>> But why create_gcov does not inform about that (no branch events were
>> found)? It creates empty gcov file and says nothing :(
>>
>> Moreover, in the mentioned README it is said that perf should also be
>> executed with option -e BR_INST_RETIRED:TAKEN.
>
> Standard perf doesn't have a full event list
> This assumes a perf patched with the libpfm patch.
>
> Also I suspect it really wants to use PEBS events, so pp should be added.
>
> Alternatively you can use ocperf (from
> http://github.com/andikleen/pmu-tools) which is just a wrapper:
>
> ocperf.py record -e br_inst_retired.near_taken:pp -b ...
>
> or specify the event manually (depending on your CPU, like)
>
> perf record -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp -b ...
>
> BTW the biggest problem with autofdo currently is that it is
> quite bitrotten and supports only several years old perf.
> So all of this above will only work with old distributions,
> unless you compile an old perf utility first.
>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
Re: AutoFDO profile toolchain is open-sourced
On Tue, Apr 21, 2015 at 10:27:49AM -0700, Dehao Chen wrote:
> In that case, we should get quipper fixed upstream to accommodate new
> format. (Maybe they already fixed it, I will do a batch sync to make
> quipper up-to-date).

From a quick look at

http://git.chromium.org/gitweb/?p=chromiumos/platform/chromiumos-wide-profiling.git;a=summary

(I assume that is what you mean by upstream)

it hasn't been updated. It is still stuck in 2013.

I'm attaching what patches I have so far.

-Andi

autofdo-newer-perf-0.tgz
Description: application/gtar-compressed
Re: AutoFDO profile toolchain is open-sourced
In that case, we should get quipper fixed upstream to accommodate new format. (Maybe they already fixed it, I will do a batch sync to make quipper up-to-date). Dehao On Tue, Apr 21, 2015 at 10:24 AM, Andi Kleen wrote: >> > BTW the biggest problem with autofdo currently is that it is >> > quite bitrotten and supports only several years old perf. >> > So all of this above will only work with old distributions, >> > unless you compile an old perf utility first. >> >> Do you mean newer perf does not support LBR (-b) any more? > > No. > > perf extended its perf.data output format, and quipper cannot parse > any of the extensions, so it just bombs out with assertation > failures. > > I have a patch to hack around some of this, but still > couldn't get it actually to work so far. > > -Andi > -- > a...@linux.intel.com -- Speaking for myself only.
Re: AutoFDO profile toolchain is open-sourced
I'll get to it soon. When will stage1 close?

OTOH, the most important patch (insn-level discriminator support) is not in yet. Cary has just retired. Do you know if anyone would be interested in porting insn-level discriminator support to trunk?

Dehao

On Tue, Apr 21, 2015 at 8:59 AM, Jan Hubicka wrote:
>> You can use dump_gcov to show a text version of the profile dump and
>> check if the profile data makes sense. If your program is just a very
>> tight single loop, the current implementation in trunk may not yield
>> good results because it does not have discriminator support. Try the
>> google-4_9 branch instead.
>
> Can we possibly merge the remaining patches now when stage1 is open?
>
> Honza
>>
>> Dehao
>>
>> >
>> >
>> > --
>> > Best regards,
>> > Ilya Palachev
Re: AutoFDO profile toolchain is open-sourced
> > BTW the biggest problem with autofdo currently is that it is
> > quite bitrotten and supports only several years old perf.
> > So all of this above will only work with old distributions,
> > unless you compile an old perf utility first.
>
> Do you mean newer perf does not support LBR (-b) any more?

No.

perf extended its perf.data output format, and quipper cannot parse any of the extensions, so it just bombs out with assertion failures.

I have a patch to hack around some of this, but I still couldn't get it to actually work so far.

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
Re: AutoFDO profile toolchain is open-sourced
> You can use dump_gcov to show a text version of the profile dump and
> check if the profile data makes sense. If your program is just a very
> tight single loop, the current implementation in trunk may not yield
> good results because it does not have discriminator support. Try the
> google-4_9 branch instead.

Can we possibly merge the remaining patches now that stage1 is open?

Honza

>
> Dehao
>
> >
> >
> > --
> > Best regards,
> > Ilya Palachev
Re: AutoFDO profile toolchain is open-sourced
On Tue, Apr 21, 2015 at 7:25 AM, Andi Kleen wrote:
> Ilya Palachev writes:
>>
>> But why create_gcov does not inform about that (no branch events were
>> found)? It creates empty gcov file and says nothing :(
>>
>> Moreover, in the mentioned README it is said that perf should also be
>> executed with option -e BR_INST_RETIRED:TAKEN.
>
> Standard perf doesn't have a full event list
> This assumes a perf patched with the libpfm patch.
>
> Also I suspect it really wants to use PEBS events, so pp should be added.
>
> Alternatively you can use ocperf (from
> http://github.com/andikleen/pmu-tools) which is just a wrapper:
>
> ocperf.py record -e br_inst_retired.near_taken:pp -b ...
>
> or specify the event manually (depending on your CPU, like)
>
> perf record -e
> cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp
> -b ...
>
> BTW the biggest problem with autofdo currently is that it is
> quite bitrotten and supports only several years old perf.
> So all of this above will only work with old distributions,
> unless you compile an old perf utility first.

Do you mean newer perf does not support LBR (-b) any more?

Dehao

>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
Re: AutoFDO profile toolchain is open-sourced
On Tue, Apr 21, 2015 at 6:42 AM, Ilya Palachev wrote:
> On 21.04.2015 14:57, Diego Novillo wrote:
>>
>> From the autofdo page: https://github.com/google/autofdo
>>
>> [ ... ]
>> Inputs:
>>
>> --profile: PERF_PROFILE collected using linux perf (with last branch
>> record).
>> In order to collect this profile, you will need to have an Intel CPU that
>> have last branch record (LBR) support. You also need to have your linux
>> kernel configured with LBR support. To profile:
>> # perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
>> EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some
>> architectures, BR_INST_EXEC:TAKEN also works.
>> [ ... ]
>>
>> The important one for autofdo is -b. It asks perf to use LBR registers
>> for branch tracking (assuming your architecture supports it).
>
>
> Thanks! It worked. Now big programs produce big gcov files. Sorry for this
> confusing message.
>
> But why create_gcov does not inform about that (no branch events were
> found)? It creates empty gcov file and says nothing :(
>
> Moreover, in the mentioned README it is said that perf should also be
> executed with option -e BR_INST_RETIRED:TAKEN.
> I tried to add it but perf said that
>
>     invalid or unsupported event: 'BR_INST_RETIRED:TAKEN'
>     Run 'perf list' for a list of valid events
>
> For my architecture x86_64 the perf list contains
>
>     $ sudo perf list | grep -i br
>       branch-instructions OR branches                    [Hardware event]
>       branch-misses                                      [Hardware event]
>       branch-loads                                       [Hardware cache event]
>       branch-load-misses                                 [Hardware cache event]
>       branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
>       branch-misses OR cpu/branch-misses/                [Kernel PMU event]
>       mem:[:access]                                      [Hardware breakpoint]
>       syscalls:sys_enter_brk                             [Tracepoint event]
>       syscalls:sys_exit_brk                              [Tracepoint event]
>
> There is no BR_INST_RETIRED:TAKEN there. Do you use some specific
> configuration of perf for that?
>
> However, I tried to use option "-e branch-instructions". Before that the
> following error was obtained:
>
>     E0421 15:57:39.308374 11551 perf_parser.cc:210] Mapped 50% of
>     samples, expected at least 95%
>
> and now it disappeared (because of option "-e branch-instructions").
>
> Though, the performance decreases after adding option
> "-fauto-profile=file.gcov" or "-fprofile-use=file.gcov" to the list of
> compiler options.
> The program becomes 10% slower than before.
> Can you explain that? Maybe I should configure perf so that it will be able
> to collect events BR_INST_RETIRED:TAKEN ? How can it be done?

You can use dump_gcov to show a text version of the profile dump and check if the profile data makes sense. If your program is just a very tight single loop, the current implementation in trunk may not yield good results because it does not have discriminator support. Try the google-4_9 branch instead.

Dehao

>
>
> --
> Best regards,
> Ilya Palachev
Re: AutoFDO profile toolchain is open-sourced
Ilya Palachev writes:
>
> But why create_gcov does not inform about that (no branch events were
> found)? It creates empty gcov file and says nothing :(
>
> Moreover, in the mentioned README it is said that perf should also be
> executed with option -e BR_INST_RETIRED:TAKEN.

Standard perf doesn't have a full event list.
This assumes a perf patched with the libpfm patch.

Also I suspect it really wants to use PEBS events, so pp should be added.

Alternatively you can use ocperf (from http://github.com/andikleen/pmu-tools) which is just a wrapper:

    ocperf.py record -e br_inst_retired.near_taken:pp -b ...

or specify the event manually (depending on your CPU, like)

    perf record -e cpu/event=0xc4,umask=0x20,name=br_inst_retired_near_taken,period=49/pp -b ...

BTW the biggest problem with autofdo currently is that it is quite bitrotten and supports only several years old perf. So all of this above will only work with old distributions, unless you compile an old perf utility first.

-Andi

--
a...@linux.intel.com -- Speaking for myself only
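Whether a given event name is known to the local perf can be checked before recording by searching the `perf list` output. The following is only a minimal sketch for a POSIX shell; the function name `event_listed` is made up for illustration and is not part of perf or autofdo.

```shell
# Hypothetical helper (not part of perf or autofdo): succeed if the event
# name given as $1 appears in the `perf list` text supplied on stdin.
event_listed() {
    # -F: match a fixed string (event names contain ':' and '.'),
    # -i: ignore case, -q: print nothing, only the exit status matters.
    grep -qiF -- "$1"
}

# Intended use on a real machine (left as a comment so that sourcing this
# file never runs perf):
#   perf list | event_listed branch-instructions || echo "event not listed"
```

An event that is not listed may still be usable through a raw cpu/event=.../ specification like the one above, so a failed lookup only says that the symbolic name is unknown.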
Re: AutoFDO profile toolchain is open-sourced
On 21.04.2015 14:57, Diego Novillo wrote:
> From the autofdo page: https://github.com/google/autofdo
>
> [ ... ]
> Inputs:
>
> --profile: PERF_PROFILE collected using linux perf (with last branch
> record).
> In order to collect this profile, you will need to have an Intel CPU that
> have last branch record (LBR) support. You also need to have your linux
> kernel configured with LBR support. To profile:
> # perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
> EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some
> architectures, BR_INST_EXEC:TAKEN also works.
> [ ... ]
>
> The important one for autofdo is -b. It asks perf to use LBR registers
> for branch tracking (assuming your architecture supports it).

Thanks! It worked. Now big programs produce big gcov files. Sorry for this confusing message.

But why create_gcov does not inform about that (no branch events were found)? It creates empty gcov file and says nothing :(

Moreover, in the mentioned README it is said that perf should also be executed with option -e BR_INST_RETIRED:TAKEN.
I tried to add it but perf said that

    invalid or unsupported event: 'BR_INST_RETIRED:TAKEN'
    Run 'perf list' for a list of valid events

For my architecture x86_64 the perf list contains

    $ sudo perf list | grep -i br
      branch-instructions OR branches                    [Hardware event]
      branch-misses                                      [Hardware event]
      branch-loads                                       [Hardware cache event]
      branch-load-misses                                 [Hardware cache event]
      branch-instructions OR cpu/branch-instructions/    [Kernel PMU event]
      branch-misses OR cpu/branch-misses/                [Kernel PMU event]
      mem:[:access]                                      [Hardware breakpoint]
      syscalls:sys_enter_brk                             [Tracepoint event]
      syscalls:sys_exit_brk                              [Tracepoint event]

There is no BR_INST_RETIRED:TAKEN there. Do you use some specific configuration of perf for that?

However, I tried to use option "-e branch-instructions". Before that the following error was obtained:

    E0421 15:57:39.308374 11551 perf_parser.cc:210] Mapped 50% of samples, expected at least 95%

and now it disappeared (because of option "-e branch-instructions").

Though, the performance decreases after adding option "-fauto-profile=file.gcov" or "-fprofile-use=file.gcov" to the list of compiler options.
The program becomes 10% slower than before.
Can you explain that? Maybe I should configure perf so that it will be able to collect events BR_INST_RETIRED:TAKEN ? How can it be done?

--
Best regards,
Ilya Palachev
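The "Mapped N% of samples" line quoted in this message can be pulled out of a saved log so that different perf invocations are easier to compare. This is a minimal sketch; `mapped_percent` is a hypothetical name, and it relies only on the message format shown in the thread, which may differ in other quipper versions.

```shell
# Hypothetical helper: read log text on stdin and print the percentage from
# every line of the form "... Mapped N% of samples ...", one number per line.
mapped_percent() {
    # -n with the trailing 'p' prints only lines where the substitution
    # matched, so unrelated log lines produce no output.
    sed -n 's/.*Mapped \([0-9][0-9]*\)% of samples.*/\1/p'
}

# Possible use, assuming the create_gcov output was saved to a file
# (create_gcov.log is an illustrative name):
#   mapped_percent < create_gcov.log
```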
Re: AutoFDO profile toolchain is open-sourced
On Tue, Apr 21, 2015 at 6:33 AM, Ilya Palachev wrote:
> ping?
>
> On 15.04.2015 10:41, Ilya Palachev wrote:
>>
>> Hi,
>>
>> One more question.
>>
> Does anybody know with which options should the perf be executed so that to
> collect appropriate data for the autofdo converter?

From the autofdo page: https://github.com/google/autofdo

[ ... ]
Inputs:

--profile: PERF_PROFILE collected using linux perf (with last branch record).
In order to collect this profile, you will need to have an Intel CPU that have last branch record (LBR) support. You also need to have your linux kernel configured with LBR support. To profile:
# perf record -c PERIOD -e EVENT -b -o perf.data -- ./command
EVENT is referring to BR_INST_RETIRED:TAKEN if available. For some architectures, BR_INST_EXEC:TAKEN also works.
[ ... ]

The important one for autofdo is -b. It asks perf to use LBR registers for branch tracking (assuming your architecture supports it).

The binary you run under perf should also have line table information (compiled with -gmlt) to produce location support for autofdo.

Diego.
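Put together, the steps mentioned in this thread form one cycle: build with line tables, record with -b, convert with create_gcov, rebuild with -fauto-profile. The sketch below only prints the command lines and runs nothing. The function name `autofdo_commands`, the `.c`, `.gcov` and `.afdo` file names and the use of g++ are assumptions for illustration; the flags themselves are the ones quoted in the thread, and -gmlt may not be accepted by every GCC build.

```shell
# Print (do not run) the command lines for one AutoFDO cycle.
# $1 = program name, $2 = perf event name, $3 = sampling period.
# All output file names are illustrative assumptions.
autofdo_commands() {
    prog=$1 event=$2 period=$3
    # 1. Build with line table information, as suggested for location support.
    echo "g++ -O2 -gmlt -o $prog $prog.c"
    # 2. Record with LBR branch sampling (-b).
    echo "perf record -c $period -e $event -b -o perf.data -- ./$prog"
    # 3. Convert the perf data into a GCC-readable profile.
    echo "create_gcov --binary $prog --profile perf.data --gcov $prog.gcov"
    # 4. Rebuild using the profile.
    echo "g++ -O2 -fauto-profile=$prog.gcov -o $prog.afdo $prog.c"
}
```

For example, `autofdo_commands ytest branch-instructions 100003` prints four lines that can be reviewed and then run by hand; the period 100003 is an arbitrary placeholder, not a recommended value.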
Re: AutoFDO profile toolchain is open-sourced
ping?

On 15.04.2015 10:41, Ilya Palachev wrote:
> Hi,
>
> One more question.

Does anybody know with which options perf should be executed in order to collect appropriate data for the autofdo converter?

I obtain the same data for different programs, and it seems to be empty (1600 bytes). The files have the same md5sum for different programs:

    # Data for simple program with 30 lines of code:
    $ md5sum ytest.gcov
    d85481c9154aa606ce4893b64fe109e7  ytest.gcov

    # Data for program of 3D Delaunay triangulation construction of 100 points.
    $ md5sum experimentCGAL_convexHullDynamic.gcov
    d85481c9154aa606ce4893b64fe109e7  experimentCGAL_convexHullDynamic.gcov

We tried to collect perf data using option --call-graph fp but it does not help: the output gcov data is still the same.

Sometimes create_gcov reports the following error:

    E0421 13:10:37.125629 8732 perf_parser.cc:209] Mapped 50% of samples, expected at least 95%

But it does not mean that there are not enough samples collected in the profile, because 99% of samples are mapped in the case of a very simple program (with 1 function).

I have been trying to find a working case for more than a week but have not succeeded.
Can anybody show me that create_gcov works at least for one case?

--
Best regards,
Ilya Palachev
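Because create_gcov stays silent in this situation, a build script can at least flag a suspiciously small profile itself. This is a minimal sketch; `gcov_looks_empty` is a hypothetical name, and the 1600-byte default is just the size observed in this message, not a documented property of the file format.

```shell
# Hypothetical check: succeed if the file named by $1 is no larger than
# $2 bytes. The default of 1600 is the size reported in this thread for
# apparently empty profiles; it is an observation, not a specification.
gcov_looks_empty() {
    [ -f "$1" ] || return 2          # missing file: report a distinct status
    size=$(( $(wc -c < "$1") ))      # arithmetic strips any padding from wc
    [ "$size" -le "${2:-1600}" ]
}

# Possible use after running create_gcov (ytest.gcov as in the message):
#   gcov_looks_empty ytest.gcov && echo "profile is probably empty" >&2
```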
Re: AutoFDO profile toolchain is open-sourced
Hi,

One more question.

On 10.04.2015 23:39, Jan Hubicka wrote:
> I must say I did not even try running AutoFDO myself (so I am happy to hear
> it works).

I tried to use executable create_gcov built from AutoFDO repository at github.
The problem is that the data generated by this program has size 1600 bytes not depending on the profile data given to it.
Steps to reproduce the issue:

1. Build AutoFDO under x86_64

2. Build, for example, the benchmark ytest.c (see attachment):

       g++ -O2 -o ytest ytest.c -g2

   (I used g++ that was built just now from gcc-5-branch branch from git://gcc.gnu.org/git/gcc.git)

3. Run it under perf to collect the profile data:

       sudo perf record ./ytest

   The perf reports no error and says that

       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.125 MB perf.data (~5442 samples) ]

   Perf generates perf.data.

4. Run create_gcov on the obtained data:

       create_gcov --binary ytest --profile perf.data --gcov ytest.gcov --debug_dump

   It creates 2 files:
   * ytest.gcov which is 1600 bytes of size
   * ytest.gcov.imports which is empty

   Also there is no debug output from the program.

If I run create_llvm_prof on the data

    create_llvm_prof --binary ytest --profile perf.data --out ytest.out --debug_dump

It reports the following log:

    Length of symbol map: 1
    Number of functions: 0

and creates an empty file ytest.out.

Which is not true: all functions in the benchmark are marked with __attribute__((noinline)) and readelf says that they stay in the binary:

    readelf -s ytest | grep px_cycle
        56: 00400640   111 FUNC    GLOBAL DEFAULT   12 _Z8px_cyclei
    readelf -s ytest | grep py_cycle
        60: 004006b0    36 FUNC    GLOBAL DEFAULT   12 _Z8py_cyclev

The size of resulting gcov data is the same (1600 bytes) for different levels of debug information (-g0, -g1, -g2) and for different input sources files.

What am I doing wrong?

--
Best regards,
Ilya Palachev

#define DX (480*4)
#define DY (640*4)

int* src = new int[DX*DY];
int* dst = new int[DX*DY];

int pxm = DX;
int pym = DY;

void px_cycle(int py) __attribute__((noinline));
void px_cycle(int py) {
    int *p1 = dst + (py*pxm);
    int *p2 = src + (pym - py - 1);
    for (int px = 0; px < pxm; px++) {
        if (px < pym && py < pxm) {
            *p1 = *p2;
        }
        p1++;
        p2 += pym;
    }
}

void py_cycle() __attribute__((noinline));
void py_cycle() {
    for (int py = 0; py < pym; py++) {
        px_cycle(py);
    }
}

int main() {
    int i;
    for (i = 0; i < 100; i++) {
        py_cycle();
    }
    return 0;
}
Re: AutoFDO profile toolchain is open-sourced
On Fri, Apr 10, 2015 at 3:43 PM, Jan Hubicka wrote:
>> LBR is used for both cfg edge profiling and indirect call Target value
>> profiling.
> I see, that makes sense ;) I guess if we want to support profile collection
> on targets w/o this feature we could still use one of the algorithms that
> try to guess edge profile from BB profile.

Our experience with sampling cycles or retired instructions to guess BB profile has not been great -- the profile quality is significantly worse than LBR (which can almost match instrumentation based profile).

David

>
> Honza
Re: AutoFDO profile toolchain is open-sourced
On Tue, Apr 7, 2015 at 7:45 AM, Ilya Palachev wrote:
> Hi,
>
> Here are some questions about AutoFDO.
>
> On 08.05.2014 02:55, Dehao Chen wrote:
>>
>> We have open-sourced AutoFDO profile toolchain in:
>>
>> https://github.com/google/autofdo
>>
>> For GCC developers, the most important tool is create_gcov, which
>> converts sampling based profile to GCC-readable profile. Please refer
>> to the readme file
>> (https://raw.githubusercontent.com/google/autofdo/master/README) for
>> more details.
>
>
> In the mentioned README file it is said that " In order to collect this
> profile, you will need to have an Intel CPU that have last branch record
> (LBR) support." Is this information obsolete? Chrome Canary builds use
> AutoFDO for ARMv7l
> (https://code.google.com/p/chromium/issues/detail?id=434587)
>
> What about Aarch64 support? Is it supported?

As mentioned by Sebastian, the current solution is to collect profile on Intel platform (with LBR support) and cross optimize arm/aarch64 target. AutoFDO support with other PMU events (cycles, retired instructions etc) still needs more tuning to match FDO performance.

>
>> To use the profile, one need to checkout
>> https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on
>> porting AutoFDO to trunk
>> (http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).
>
>
> For now AutoFDO was merged into gcc-5.0 (trunk) branch.
> Is it possible to backport it to 4.9 branch? Can you estimate required
> efforts for that?

The google gcc49 branch has the autofdo support.

David

>
>>
>> We have limited doc inside the open-sourced package, and we are
>> planning to add more content to the wiki page
>> (https://github.com/google/autofdo/wiki). Feel free to send me emails
>> or discuss on github if you have any questions.
>>
>> Cheers,
>> Dehao
>
>
> --
> Best regards,
> Ilya
Re: AutoFDO profile toolchain is open-sourced
> LBR is used for both cfg edge profiling and indirect call Target value
> profiling.

I see, that makes sense ;) I guess if we want to support profile collection on targets w/o this feature we could still use one of the algorithms that try to guess edge profile from BB profile.

Honza
Re: AutoFDO profile toolchain is open-sourced
LBR is used for both cfg edge profiling and indirect call Target value profiling.

David

On Fri, Apr 10, 2015 at 3:26 PM, Xinliang David Li wrote:
> LBR is used for both cfg edge profiling and indirect call Target value
> profiling.
>
> David
>
> On Apr 10, 2015 10:39 AM, "Jan Hubicka" wrote:
>>
>> > On Tue, Apr 7, 2015 at 9:45 AM, Ilya Palachev
>> > wrote:
>> > > In the mentioned README file it is said that " In order to collect
>> > > this
>> > > profile, you will need to have an Intel CPU that have last branch
>> > > record
>> > > (LBR) support." Is this information obsolete? Chrome Canary builds use
>> > > AutoFDO for ARMv7l
>> > > (https://code.google.com/p/chromium/issues/detail?id=434587)
>> >
>> > It does not mean that the profile was recorded on an ARM system: they
>> > can gather perf.data on x86 and then produce a coverage file that is
>> > then used in ARM compiles. I tried it and seems to work well.
>>
>> I must say I did not even try running AutoFDO myself (so I am happy to
>> hear
>> it works). My understanding is that you need LBR only to get indirect
>> call profiling working (i.e. you want to know from where the indirect
>> function is called).
>>
>> Depending on your application this may not be the most important thing to
>> record (either you don't have indirect calls in hot paths or they are
>> handled
>> reasonably by speculative devirtualization)
>>
>> Some ARMs also has support for tracing jump pairs, right?
>> Honza
>> >
>> > Sebastian
Re: AutoFDO profile toolchain is open-sourced
> On Tue, Apr 7, 2015 at 9:45 AM, Ilya Palachev wrote:
> > In the mentioned README file it is said that " In order to collect this
> > profile, you will need to have an Intel CPU that have last branch record
> > (LBR) support." Is this information obsolete? Chrome Canary builds use
> > AutoFDO for ARMv7l
> > (https://code.google.com/p/chromium/issues/detail?id=434587)
>
> It does not mean that the profile was recorded on an ARM system: they
> can gather perf.data on x86 and then produce a coverage file that is
> then used in ARM compiles. I tried it and seems to work well.

I must say I did not even try running AutoFDO myself (so I am happy to hear it works). My understanding is that you need LBR only to get indirect call profiling working (i.e. you want to know from where the indirect function is called).

Depending on your application this may not be the most important thing to record (either you don't have indirect calls in hot paths or they are handled reasonably by speculative devirtualization).

Some ARMs also have support for tracing jump pairs, right?

Honza

>
> Sebastian
Re: AutoFDO profile toolchain is open-sourced
Hi,

Here are some questions about AutoFDO.

On 08.05.2014 02:55, Dehao Chen wrote:
> We have open-sourced AutoFDO profile toolchain in:
>
> https://github.com/google/autofdo
>
> For GCC developers, the most important tool is create_gcov, which
> converts sampling based profile to GCC-readable profile. Please refer
> to the readme file
> (https://raw.githubusercontent.com/google/autofdo/master/README) for
> more details.

In the mentioned README file it is said that " In order to collect this profile, you will need to have an Intel CPU that have last branch record (LBR) support." Is this information obsolete? Chrome Canary builds use AutoFDO for ARMv7l (https://code.google.com/p/chromium/issues/detail?id=434587)

What about Aarch64 support? Is it supported?

> To use the profile, one need to checkout
> https://gcc.gnu.org/svn/gcc/branches/google/gcc-4_8. We are working on
> porting AutoFDO to trunk
> (http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00438.html).

For now AutoFDO was merged into gcc-5.0 (trunk) branch.
Is it possible to backport it to 4.9 branch? Can you estimate required efforts for that?

> We have limited doc inside the open-sourced package, and we are
> planning to add more content to the wiki page
> (https://github.com/google/autofdo/wiki). Feel free to send me emails
> or discuss on github if you have any questions.
>
> Cheers,
> Dehao

--
Best regards,
Ilya