On Wednesday, August 30, 2017 at 7:28:41 PM UTC-5, Jakob Kummerow wrote:
>
> On Wed, Aug 30, 2017 at 2:04 PM, <[email protected]> wrote:
>
>> valgrind might be more accurate, as it measures the actual number of
>> executed instructions, whereas perf counts samples.
>>
>
> Well, that means it measures something different. Ultimately the number of
> executed instructions doesn't matter; what matters is how long it took.
>
Agreed; ultimately what matters is how long the program takes to run.
Valgrind is simply a tool that helps me find the most time-consuming parts
of the code.
>
>
>> Are you aware of any --prof options that will give me more detail about
>> the parsing in V8?
>>
>
> No. What kind of detail are you looking for? --prof gives you
> per-function tick counts, is that not detailed enough?
>
Yes, per-function tick counts are great. But the benchmark that I am using
(Octane) does not really represent real-world scenarios. Ideally, I would
look at per-function tick counts while running Speedometer 2.0, but I have
not yet gotten Speedometer 2.0 running on the command line.
I also think --runtime-call-stats only measures time for the functions that
V8 "wants" to measure. For example, when I run Octane/typescript-compiler,
it reports ParseFunctionLiteral as the "hottest" function, taking 270ms
(34.3%) and being called 1385 times. It looks like the measurement is based
on counters (RuntimeCallStats) placed at the entry of the functions V8
chooses to instrument, e.g. ParseFunctionLiteral. If a function is not
instrumented (say, ParseAssignmentExpression), we will never see that it is
hot; ParseAssignmentExpression never shows up in --runtime-call-stats
output. Maybe the V8 developers have already done this analysis and
concluded that ParseAssignmentExpression is not hot enough to be worth
instrumenting.
Here's the output that I get from --prof --runtime-call-stats for
Octane/typescript-compiler:

Runtime Function/C++ Builtin            Time              Count
========================================================================
ParseFunctionLiteral                    270.44ms  34.30%  1385  15.15%
StoreIC_Premonomorphic                  130.69ms  16.58%  1224  13.39%
API_ScriptCompiler_CompileUnbound        74.92ms   9.50%     3   0.03%
PreParseWithVariableResolution           46.62ms   5.91%   393   4.30%
GC_Custom_SlowAllocateRaw                36.87ms   4.68%    14   0.15%
However, tools/linux-tick-processor does not show ParseFunctionLiteral as
being that "hot". If it were really taking 34% of the execution time, it
would accumulate far more ticks.
Here's the output from tools/linux-tick-processor:

Statistical profiling result from v8.log, (801 ticks, 224 unaccounted, 0
excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
       7   0.9%          /lib/x86_64-linux-gnu/libc-2.23.so
       1   0.1%          [vdso]

 [JavaScript]:
   ticks  total  nonlib   name
       1   0.1%    0.1%  Builtin: LoadIC_Uninitialized

 [C++]:
   ticks  total  nonlib   name
     341  42.6%   43.0%  v8::Shell::Load(v8::FunctionCallbackInfo<v8::Value> const&)
      13   1.6%    1.6%  _init
       9   1.1%    1.1%  v8::internal::MemoryAllocator::ZapBlock(unsigned char*, unsigned long)
       9   1.1%    1.1%  std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >* v8::base::CheckEQImpl<bool, bool>(bool, bool, char const*)
       8   1.0%    1.0%  v8::base::Acquire_Load(long const volatile*)
       7   0.9%    0.9%  v8::internal::SmiTagging<8ul>::SmiToInt(v8::internal::Object const*)
So I am a little confused about how to correlate performance data from the
same run between --runtime-call-stats and linux-tick-processor. Is there
anything I am missing?
Because these data were inconsistent, I went back to measuring V8 with
valgrind and took the hottest functions from there. All I did was this:
valgrind --tool=callgrind ./d8 run.js
kcachegrind callgrind.out.XXXX
It does show that ParseFunctionLiteral is hot, but the hottest function is
ParseAssignmentExpression. I understand you have not used valgrind to
measure performance and have not seen this data. So now I don't know which
is the hottest function in parsing, but I do know that parsing in V8 is
very expensive. Hence, I wanted to know if there are specific options that
measure parsing in a little more detail.
Sirish
>
>
>>
>> Thanks.
>>
>> Sirish
>>
>>
>> On Monday, August 14, 2017 at 1:29:20 PM UTC-5, Jakob Kummerow wrote:
>>>
>>> Measuring and investigating performance is indeed difficult, and there
>>> is no single answer to how best to do it. I haven't heard of valgrind being
>>> used for this purpose, and don't know how to make sense of its output. I
>>> mostly use V8's builtin --prof and tools/linux-tick-processor, or when
>>> that's too coarse, the Linux perf tool (for the latter, see
>>> instructions in V8's wiki). --runtime-call-stats can also be highly
>>> useful for investigating certain situations.
>>>
>>> On Mon, Aug 14, 2017 at 9:16 AM, <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to look at the performance of V8. At this time, this is all
>>>> on x86. To build V8, I am using the NDK compiler that comes with the V8
>>>> download; that is a pretty straightforward process, as documented. Once I
>>>> built a "release" or "debug" V8, I ran the mandreel and codeload Octane
>>>> benchmarks under valgrind.
>>>>
>>>
>>> Be aware that Release and Debug mode have vastly different performance
>>> profiles. Only Release mode is representative of real-world performance.
>>>
>>>
>>>> Then I use cg_annotate to look at Instruction count, data read/write,
>>>> branches etc. I am just curious where the bottlenecks are. I get the
>>>> following numbers for mandreel.
>>>>
>>>
>>> Also, note that "the bottlenecks" can be *very* different depending on
>>> what test/benchmark you run.
>>>
>>>
>>>>
>>>>
>>>> --------------------------------------------------------------------------------
>>>>            Ir            Dr            Dw         Bi          Bc
>>>> --------------------------------------------------------------------------------
>>>> 6,737,556,200 1,975,238,610 1,015,483,012 65,020,470 941,081,112  PROGRAM TOTALS
>>>>
>>>> --------------------------------------------------------------------------------
>>>>            Ir          Dr          Dw         Bi          Bc  file:function
>>>> --------------------------------------------------------------------------------
>>>> 2,750,360,898 769,151,058 269,297,660 45,068,095 410,353,595  ???:???
>>>>   165,387,529  59,780,937  25,440,838      9,340  27,302,136  ???:v8::internal::Scanner::ScanIdentifierOrKeyword()
>>>>   156,426,896  43,240,550  25,588,801          0  17,909,648  ???:v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >::Accumulate(v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >*, unsigned int, bool)
>>>>   150,476,281  44,181,365  28,594,154  3,145,871   9,250,952  ???:v8::internal::Scanner::Scan()
>>>> ...
>>>>
>>>>
>>>> It shows that there are 2.7 billion instructions under ???:???. I am
>>>> guessing that these are the instructions responsible for hidden classes
>>>> (loading/creating/deleting/accessing them, etc.). If my guess is not
>>>> correct, please let me know. And then there are the usual 165 million for
>>>> v8::internal::Scanner::ScanIdentifierOrKeyword(), 150 million for
>>>> Scanner::Scan(), etc.
>>>>
>>>> However, what is interesting is this - on the subsequent run on the
>>>> same machine, I see the following numbers:
>>>>
>>>> --------------------------------------------------------------------------------
>>>>             Ir            Dr            Dw          Bi            Bc
>>>> --------------------------------------------------------------------------------
>>>> 12,369,840,202 3,844,013,654 1,701,470,472 248,336,756 1,605,561,615  PROGRAM TOTALS
>>>>
>>>> --------------------------------------------------------------------------------
>>>>            Ir            Dr          Dw          Bi           Bc  file:function
>>>> --------------------------------------------------------------------------------
>>>> 6,361,147,238 2,029,054,548 684,679,708 228,306,958  762,170,252  ???:???
>>>>   690,157,426   260,436,764 130,218,388           0   91,152,866  ???:v8::internal::Runtime_TryInstallOptimizedCode(int, v8::internal::Object**, v8::internal::Isolate*)
>>>>   470,287,848   104,517,164  26,109,566           0   91,422,886  /build/glibc-bfm8X4/glibc-2.23/nptl/../nptl/pthread_mutex_lock.c:pthread_mutex_lock
>>>>   365,612,760   104,464,484  13,067,997           0  104,438,144  /build/glibc-bfm8X4/glibc-2.23/nptl/pthread_mutex_unlock.c:pthread_mutex_unlock
>>>>   364,611,492    91,152,878 104,174,716           0   13,021,838  ???:v8::internal::StackGuard::CheckAndClearInterrupt(v8::internal::StackGuard::InterruptFlag)
>>>>   165,387,529    59,780,937  25,440,838       9,340   27,302,136  ???:v8::internal::Scanner::ScanIdentifierOrKeyword()
>>>>   156,426,896    43,240,550  25,588,801           0   17,909,648  ???:v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >::Accumulate(v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >*, unsigned int, bool)
>>>>   150,476,281    44,181,365  28,594,154   3,145,871    9,250,952  ???:v8::internal::Scanner::Scan()
>>>>
>>>>
>>>> All the numbers for the V8 functions remained the same; however, all the
>>>> hidden-class numbers and calls into the library went off the chart. For
>>>> example, instructions dealing with hidden classes went from 2.7 billion
>>>> to 6.3 billion. I don't see how running the same benchmark could make
>>>> such a huge difference in the numbers. Can anyone please explain?
>>>>
>>>> Also, how does the V8 community look at performance numbers, if not with
>>>> standard performance-monitoring tools like valgrind? Is there any other
>>>> way to look at performance numbers?
>>>>
>>>> Thanks.
>>>> Sirish
>>>>
>>>> --
>>>> --
>>>> v8-dev mailing list
>>>> [email protected]
>>>> http://groups.google.com/group/v8-dev
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "v8-dev" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>