Thanks Jakob. As per your instructions, I had to build perf in order to use perf inject. 
That does combine both jitted and non-jitted instructions, which is good 
to look at. However, in my opinion valgrind might be more accurate, since it 
measures the actual number of executed instructions, whereas perf counts samples, 
especially with perf record. Also, I could not find a way to look at per-function 
samples in perf; I can see annotation on a per-instruction basis. Maybe 
there is none. 
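
For completeness, the perf workflow I ended up with looks roughly like the following (the d8 path and script name are placeholders for my local setup):

```shell
# Record with the monotonic clock so perf can correlate samples
# with the jitdump that V8 emits when run with --perf-prof
perf record -k mono -o perf.data out/x64.release/d8 --perf-prof mandreel.js

# Fold the jitted-code symbol information into the profile
perf inject --jit -i perf.data -o perf.data.jitted

# Symbol-level view of the merged profile (perf annotate for instructions)
perf report -i perf.data.jitted
```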

I appreciate you telling me about tools/linux-tick-processor and 
--runtime-call-stats. Those are quite helpful. A quick follow-up question 
though - at this time I am interested in the parsing side of V8, before it gets 
into actual object creation/bytecode/hidden classes. Are you aware of any 
--prof options that will give me more detail about parsing in V8? 
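
For reference, these are the invocations I am trying (d8 path and script name are placeholders, so please correct me if I am holding the tools wrong):

```shell
# Tick-based profile: d8 writes v8.log, which the tick processor
# summarizes into per-function tick counts
out/x64.release/d8 --prof codeload.js
tools/linux-tick-processor v8.log

# Per-category runtime counters, printed on exit; if I read the
# output right, the Parse-related rows break down the parser's share
out/x64.release/d8 --runtime-call-stats codeload.js
```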

Thanks.

Sirish 


On Monday, August 14, 2017 at 1:29:20 PM UTC-5, Jakob Kummerow wrote:
>
> Measuring and investigating performance is indeed difficult, and there is 
> no single answer to how best to do it. I haven't heard of valgrind being 
> used for this purpose, and don't know how to make sense of its output. I 
> mostly use V8's builtin --prof and tools/linux-tick-processor, or when 
> that's too coarse, the Linux perf tool (for the latter, see instructions 
> in V8's wiki). --runtime-call-stats can also be highly useful for 
> investigating certain situations.
>
> On Mon, Aug 14, 2017 at 9:16 AM, <[email protected] <javascript:>> wrote:
>
>> Hi, 
>>
>> I am trying to look at the performance of V8. At this time, this is all on 
>> x86. To build V8, I am using the NDK compiler that comes along when I download V8. 
>> That is a pretty straightforward process, as documented. Once I build the "release" 
>> or "debug" V8, I run the Mandreel and CodeLoad Octane benchmarks under 
>> valgrind. 
>>
>
> Be aware that Release and Debug mode have vastly different performance 
> profiles. Only Release mode is representative of real-world performance.
>  
>
>> Then I use cg_annotate to look at instruction counts, data reads/writes, 
>> branches, etc. I am just curious where the bottlenecks are. I get the 
>> following numbers for Mandreel. 
>>
>
> Also, note that "the bottlenecks" can be *very* different depending on 
> what test/benchmark you run.
>  
>
>>
>>
>> --------------------------------------------------------------------------------
>>            Ir            Dr            Dw         Bi          Bc
>> --------------------------------------------------------------------------------
>> 6,737,556,200 1,975,238,610 1,015,483,012 65,020,470 941,081,112  PROGRAM TOTALS
>>
>> --------------------------------------------------------------------------------
>>            Ir          Dr          Dw         Bi          Bc  file:function
>> --------------------------------------------------------------------------------
>> 2,750,360,898 769,151,058 269,297,660 45,068,095 410,353,595  ???:???
>>   165,387,529  59,780,937  25,440,838      9,340  27,302,136  ???:v8::internal::Scanner::ScanIdentifierOrKeyword()
>>   156,426,896  43,240,550  25,588,801          0  17,909,648  ???:v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >::Accumulate(v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >*, unsigned int, bool)
>>   150,476,281  44,181,365  28,594,154  3,145,871   9,250,952  ???:v8::internal::Scanner::Scan()
>> ..
>> ..
>>
>>
>> It shows that there are 2.7 billion instructions for ???:???. I am 
>> guessing that these are the instructions responsible for hidden 
>> classes: loading/creating/deleting/accessing, etc. If my guess is not 
>> correct, please let me know. And then there are the usual 165 million for 
>> v8::internal::Scanner::ScanIdentifierOrKeyword(), 150 million for Scan(), etc.
>>
>> However, what is interesting is this - on the subsequent run on the same 
>> machine, I see the following numbers:
>>
>> --------------------------------------------------------------------------------
>>             Ir            Dr            Dw          Bi            Bc
>> --------------------------------------------------------------------------------
>> 12,369,840,202 3,844,013,654 1,701,470,472 248,336,756 1,605,561,615  PROGRAM TOTALS
>>
>> --------------------------------------------------------------------------------
>>            Ir            Dr          Dw          Bi          Bc  file:function
>> --------------------------------------------------------------------------------
>> 6,361,147,238 2,029,054,548 684,679,708 228,306,958 762,170,252  ???:???
>>   690,157,426   260,436,764 130,218,388           0  91,152,866  ???:v8::internal::Runtime_TryInstallOptimizedCode(int, v8::internal::Object**, v8::internal::Isolate*)
>>   470,287,848   104,517,164  26,109,566           0  91,422,886  /build/glibc-bfm8X4/glibc-2.23/nptl/../nptl/pthread_mutex_lock.c:pthread_mutex_lock
>>   365,612,760   104,464,484  13,067,997           0 104,438,144  /build/glibc-bfm8X4/glibc-2.23/nptl/pthread_mutex_unlock.c:pthread_mutex_unlock
>>   364,611,492    91,152,878 104,174,716           0  13,021,838  ???:v8::internal::StackGuard::CheckAndClearInterrupt(v8::internal::StackGuard::InterruptFlag)
>>   165,387,529    59,780,937  25,440,838       9,340  27,302,136  ???:v8::internal::Scanner::ScanIdentifierOrKeyword()
>>   156,426,896    43,240,550  25,588,801           0  17,909,648  ???:v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >::Accumulate(v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >*, unsigned int, bool)
>>   150,476,281    44,181,365  28,594,154   3,145,871   9,250,952  ???:v8::internal::Scanner::Scan()
>>
>>
>> All the numbers for the V8 functions remained the same; however, all the hidden-class 
>> numbers and the calls into the library went off the chart. For example, 
>> the instructions dealing with hidden classes went from 2.7 billion to 6.3 
>> billion. I don't see how running the same benchmark could make such a huge 
>> difference in the numbers. Can anyone please explain? 
>>
>> Also, how does the V8 community look at performance numbers, if not by using 
>> standard performance-monitoring tools like valgrind? Is there any other way 
>> to look at performance numbers?
>>
>> Thanks.
>> Sirish
>>
>> -- 
>> -- 
>> v8-dev mailing list
>> [email protected] <javascript:>
>> http://groups.google.com/group/v8-dev
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "v8-dev" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
