Hi,

I did a quick analysis of the instrumentation overhead. It is caused by a
number of profile initialization functions (__profile_init and
__profile_pu_init) in the hot loop body.

*Analysis:*
The instrumentation is inserted before VHO by default. Each PU has its
own __profile_init() and __profile_pu_init(). For FTensor, after intensive
inlining, many of these _init() functions end up in the hot loop body and
thus introduce high overhead.

for i = 1...100000
   foo(); bar(); zoo();
->
for i = 1...100000
   __profile_init(); __profile_pu_init(); ...;   (from foo)
   __profile_init(); __profile_pu_init(); ...;   (from bar)
   __profile_init(); __profile_pu_init(); ...;   (from zoo)

You can use -fb_phase=1 to instrument *before LNO, which is after
inlining*. The instrumented run is then about 10 times slower, but the
performance gain is only 2% on an old Opteron machine.
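For reference, a typical PGO cycle with the late instrumentation phase
looks roughly like this (file names are illustrative; -fb_create, -fb_opt,
and -fb_phase are the option names discussed in this thread):

```shell
# instrumented (training) build; -fb_phase=1 moves instrumentation
# to before LNO, i.e. after inlining
opencc -O3 -fb_phase=1 -fb_create fbdata -o ftensor.train ftensor.cxx
./ftensor.train              # training run produces feedback data

# feedback-optimized build consuming the collected profile
opencc -O3 -fb_opt fbdata -o ftensor.opt ftensor.cxx
```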

The SSE instructions generated in the loop body are key to the performance
and need further investigation for tuning.

*Thought:*
For C++ programs, many function bodies are very small, and instrumentation
overhead is high when these small functions are called in a hot region.
Based on my experience with another compiler, I suggest performing "simple"
inlining before instrumentation to reduce the overhead.

I also disabled value profiling for this evaluation; it is lightweight for
this case.

BTW: has open64 removed the option -fb_type=N? I want to disable value
profiling with this option, but opencc complains that libcginstr.so cannot
be found. I skimmed through the code and found that value profiling is
always enabled for WN_Instrument. libcginstr is not handled by *configure*
and is not in osprey/targdir_lib2. That is, CG_* profiling cannot work due
to the lack of libcginstr.so. Is the library deprecated? Maybe I'm out of
date...

Please correct me if I'm wrong.

==

I’d like to take this opportunity to ask a question.
Open64 supports instrumentation in four phases (VHO, LNO, WOPT, CG). What
was the motivation and driving force behind this design? Could you share
any knowledge or experience on their pros and cons in practice? And why is
BEFORE_VHO the default fb_phase?

Thanks a ton!

On Tue, Jun 26, 2012 at 6:20 PM, Sun Chan <sun.c...@gmail.com> wrote:

> someone must be doing value profile (memory op profile) to get to
> these kind of slow down. Of course, it could be something really
> stupid. My recollection is, it should be no more that 5 times slower
> back then
> Sun
>
> On Tue, Jun 26, 2012 at 5:33 PM, Jian-Xin Lai <laij...@gmail.com> wrote:
> > Yes, you are right. I measured PGO on both "-O3 -OPT:Ofast" and
> > "-Ofast" and found the PGO for "-Ofast" is much slower than 20x.
> >
> > For Tensor 3:
> > fb_create run:
> > real    52m39.138s
> > user    52m36.841s
> > sys     0m0.232s
> > fb_opt run:
> > real    0m7.622s
> > user    0m7.572s
> > sys     0m0.000s
> >
> > I haven't check why the overhead is so high.
> >
> > 2012/6/21 Walter Landry <wlan...@caltech.edu>:
> >> Jian-Xin Lai <laij...@gmail.com> wrote:
> >>> I tried the Open64 PGO on these benchmarks. Basically, the training
> >>> executable runs about 20 times slower. I guess the overhead of open64
> >>> PGO is comparable as ICC.
> >>
> >> What are the exact options you used when trying PGO?  I found that the
> >> C-tran code was about 20 times slower, but the expression template
> >> code was much worse than that.
> >>
> >>> But there is not much performance gain from Open64 PGO. Since all
> >>> test cases are single file, "-O3 -OPT:Ofast" may works better.
> >>
> >> That what I would have thought, but the FTensor results for the Intel
> >> compiler were much, much improved with PGO.
> >>
> >> Thanks,
> >> Walter Landry
> >>
> >>
> ------------------------------------------------------------------------------
> >> Live Security Virtual Conference
> >> Exclusive live event will cover all the ways today's security and
> >> threat landscape has changed and how IT managers can respond.
> Discussions
> >> will include endpoint security, mobile security and the latest in
> malware
> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> _______________________________________________
> >> Open64-devel mailing list
> >> Open64-devel@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/open64-devel
> >
> >
> >
> > --
> > Regards,
> > Lai Jian-Xin
> >
> >
>
>
>



-- 
Regards,
Peng Yuan (袁鹏)
<http://www.wugu123.com>
