Measuring and investigating performance is indeed difficult, and there is no single answer to how best to do it. I haven't heard of valgrind being used for this purpose, and don't know how to make sense of its output. I mostly use V8's built-in --prof together with tools/linux-tick-processor, or, when that's too coarse, the Linux perf tool (for the latter, see the instructions in V8's wiki; a rough sketch of that workflow is at the end of this mail). --runtime-call-stats can also be highly useful for investigating certain situations.
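For concreteness, the --prof workflow looks roughly like this. Treat it as a sketch: out/x64.release and bench.js are placeholders for your own build directory and test script, and the exact name/location of the log file can vary between V8 versions.

  # Run the script with V8's built-in sampling profiler enabled;
  # by default this writes a tick log (v8.log) in the current directory.
  out/x64.release/d8 --prof bench.js

  # Turn the tick log into a human-readable profile
  # (JS vs. C++ vs. GC ticks, hottest functions, etc.).
  tools/linux-tick-processor v8.log

  # Per-runtime-function counters and timers, dumped when the isolate
  # shuts down (exact output format depends on the V8 version).
  out/x64.release/d8 --runtime-call-stats bench.js

Which of the two is more useful depends on whether the time goes to JITted code or to V8's C++ runtime, and the tick processor's breakdown is usually the quickest way to find that out.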
On Mon, Aug 14, 2017 at 9:16 AM, <[email protected]> wrote:

> Hi,
>
> I am trying to look at the performance of V8. At this time, this is
> all on X86. To build V8, I am using the NDK compiler that comes with
> the V8 download; that is a pretty straightforward process, as
> described. Once I build a "release" or "debug" V8, I run the mandreel
> and codeload Octane benchmarks under valgrind.

Be aware that Release and Debug mode have vastly different performance
profiles. Only Release mode is representative of real-world performance.

> Then I use cg_annotate to look at instruction counts, data
> reads/writes, branches, etc. I am just curious where the bottlenecks
> are. I get the following numbers for mandreel.

Also, note that "the bottlenecks" can be *very* different depending on
what test/benchmark you run.

> --------------------------------------------------------------------------------
>            Ir            Dr            Dw         Bi          Bc
> --------------------------------------------------------------------------------
> 6,737,556,200 1,975,238,610 1,015,483,012 65,020,470 941,081,112  PROGRAM TOTALS
>
> --------------------------------------------------------------------------------
>            Ir          Dr          Dw         Bi          Bc  file:function
> --------------------------------------------------------------------------------
> 2,750,360,898 769,151,058 269,297,660 45,068,095 410,353,595  ???:???
>   165,387,529  59,780,937  25,440,838      9,340  27,302,136  ???:v8::internal::Scanner::ScanIdentifierOrKeyword()
>   156,426,896  43,240,550  25,588,801          0  17,909,648  ???:v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >::Accumulate(v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >*, unsigned int, bool)
>   150,476,281  44,181,365  28,594,154  3,145,871   9,250,952  ???:v8::internal::Scanner::Scan()
> ..
> ..
>
> It shows that there are 2.7 billion instructions for ???:???. I am
> guessing that these are the instructions responsible for hidden
> classes (loading/creating/deleting/accessing, etc.). If my guess is
> not correct, please let me know. And then there are the usual 165
> million for v8::internal::Scanner::ScanIdentifierOrKeyword(), 150
> million for Scan(), etc.
>
> However, what is interesting is this: on a subsequent run on the same
> machine, I see the following numbers:
>
> --------------------------------------------------------------------------------
>             Ir            Dr            Dw          Bi            Bc
> --------------------------------------------------------------------------------
> 12,369,840,202 3,844,013,654 1,701,470,472 248,336,756 1,605,561,615  PROGRAM TOTALS
>
> --------------------------------------------------------------------------------
>            Ir            Dr          Dw          Bi          Bc  file:function
> --------------------------------------------------------------------------------
> 6,361,147,238 2,029,054,548 684,679,708 228,306,958 762,170,252  ???:???
>   690,157,426   260,436,764 130,218,388           0  91,152,866  ???:v8::internal::Runtime_TryInstallOptimizedCode(int, v8::internal::Object**, v8::internal::Isolate*)
>   470,287,848   104,517,164  26,109,566           0  91,422,886  /build/glibc-bfm8X4/glibc-2.23/nptl/../nptl/pthread_mutex_lock.c:pthread_mutex_lock
>   365,612,760   104,464,484  13,067,997           0 104,438,144  /build/glibc-bfm8X4/glibc-2.23/nptl/pthread_mutex_unlock.c:pthread_mutex_unlock
>   364,611,492    91,152,878 104,174,716           0  13,021,838  ???:v8::internal::StackGuard::CheckAndClearInterrupt(v8::internal::StackGuard::InterruptFlag)
>   165,387,529    59,780,937  25,440,838       9,340  27,302,136  ???:v8::internal::Scanner::ScanIdentifierOrKeyword()
>   156,426,896    43,240,550  25,588,801           0  17,909,648  ???:v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >::Accumulate(v8::internal::ExpressionClassifier<v8::internal::ParserTypes<v8::internal::Parser> >*, unsigned int, bool)
>   150,476,281    44,181,365  28,594,154   3,145,871   9,250,952  ???:v8::internal::Scanner::Scan()
>
> All the numbers for the V8 functions remained the same; however, all
> the hidden-class numbers and the calls to the library went off the
> chart. For example, instructions dealing with hidden classes went
> from 2.7 billion to 6.3 billion. I don't see how using the same
> benchmark would make such a huge difference in the numbers. Can
> anyone please explain?
>
> Also, how does the V8 community look at performance numbers, if not
> with standard performance monitoring tools like valgrind? Is there
> any other way to look at performance numbers?
>
> Thanks.
> Sirish
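To expand on the perf suggestion at the top of this mail, that workflow is roughly the following. This is a sketch from memory, so please treat the flags and paths as illustrative (out/x64.release and bench.js are placeholders) and follow the V8 wiki for the authoritative instructions.

  # --perf-basic-prof makes d8 emit /tmp/perf-<pid>.map so that perf can
  # symbolize JIT-compiled code; record user-space cycles with call stacks.
  perf record -e cycles:u -g -- out/x64.release/d8 --perf-basic-prof bench.js

  # Browse the recorded profile.
  perf report

Unlike cachegrind's simulated instruction and cache counts, this samples real hardware cycles on your machine, which tends to be closer to the performance users actually see.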
