On Thu, 2006-07-13 at 18:11 -0500, Federico Mena Quintero wrote:
> On Thu, 2006-07-13 at 20:16 +0100, Richard Purdie wrote:
>
> > Also of note is that the time spent in no-vmlinux increased
> > significantly (11%). These tests were done against a hardfloat image
> > so floating point instructions cause exceptions and get handled in
> > kernel space. That jump is extremely likely to be the consequence of
> > a significant increase in floating point instruction usage.
>
> But is that really floating point handlers in the kernel, or is it
> something else?

To prove it, I asked Jorn to run the tests again, this time resolving
the kernel symbols. The results are at:

http://www.o-hand.com/~jorn/pango-benchmarks/28/full-report.txt
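(For anyone wanting to reproduce this: opreport only shows kernel
symbols once the daemon has been pointed at an uncompressed vmlinux
matching the running kernel. Roughly, with illustrative paths:

  # point oprofile at the unstripped kernel image from the build tree
  opcontrol --vmlinux=/path/to/kernel-build/vmlinux
  opcontrol --reset
  opcontrol --start
  # ... run the benchmark ...
  opcontrol --stop
  # per-symbol breakdown, kernel and userspace together
  opreport --symbols

Without --vmlinux, kernel samples all get lumped into the "no-vmlinux"
bucket that showed up in the earlier runs.)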
These functions are all part of the floating point emulator:

 4255  1.7651  vmlinux  do_fpe
 3316  1.3756  vmlinux  PerformLDF
 3120  1.2943  vmlinux  EmulateAll
 2500  1.0371  vmlinux  EmulateCPRT
 2058  0.8537  vmlinux  addFloat64Sigs
 2002  0.8305  vmlinux  EmulateCPDO
 1937  0.8035  vmlinux  DoubleCPDO
 1863  0.7728  vmlinux  EmulateCPDT
 1515  0.6285  vmlinux  roundAndPackFloat64
 1296  0.5376  vmlinux  checkCondition
 1247  0.5173  vmlinux  float64_mul
 1214  0.5036  vmlinux  PerformSTF
 1175  0.4874  vmlinux  emulate
  865  0.3588  vmlinux  float64_to_int32
  839  0.3481  vmlinux  SetRoundingMode
  625  0.2593  vmlinux  float64_add
  620  0.2572  vmlinux  PerformFIX
  515  0.2136  vmlinux  roundAndPackInt32
  451  0.1871  vmlinux  SetRoundingPrecision
  377  0.1564  vmlinux  float64_is_nan
  372  0.1543  vmlinux  PerformSFM
  369  0.1531  vmlinux  PerformLFM
  355  0.1473  vmlinux  nwfpe_enter
  336  0.1394  vmlinux  ret_from_exception
  299  0.1240  vmlinux  PerformFLT
  284  0.1178  vmlinux  float_raise

At least some of the processor load in the following is also related to
the above, since the context switches and user-space data transfers
involved would trigger these paths:

17374  7.2074  vmlinux  xscale_mc_clear_user_page
 1813  0.7521  vmlinux  __flush_whole_cache
  815  0.3381  vmlinux  __dabt_usr
  701  0.2908  vmlinux  unmap_vmas
  785  0.3256  vmlinux  mc_copy_user_page
  747  0.3099  vmlinux  free_hot_cold_page
  658  0.2730  vmlinux  update_mmu_cache
  631  0.2618  vmlinux  get_page_from_freelist
  535  0.2219  vmlinux  xscale_flush_user_cache_range
  593  0.2460  vmlinux  cpu_xscale_switch_mm
  573  0.2377  vmlinux  schedule
  443  0.1838  vmlinux  __handle_mm_fault
  392  0.1626  vmlinux  __get_user_4
  313  0.1298  vmlinux  __arch_copy_to_user
  293  0.1215  vmlinux  find_vma

So of the 30% of the time spent in the kernel, a significant fraction
goes to the floating point code, as I suspected.

> Does oprofile give you stack traces for where each function is called?
> Sysprof gives you that and it is fantastic.

oprofile does give stack traces, in both user and kernel space,
although someone has broken the kernel support for it on ARM in recent
kernels. I'm looking into fixing it.

> [Someone should port Sysprof to the ARM; it can't be hard.]

Technically, oprofile can do everything sysprof can and a lot more
besides (plus it already works on ARM), so as a profiler it is the
better piece of software. sysprof has a pretty GUI, though, which seems
to attract people more than the capabilities do. The real task is for
someone to write a nice GUI for viewing oprofile traces. The problem is
that sysprof's UI can't handle all the extra data oprofile can provide,
and oprofile's power has been a disadvantage whenever people have tried
to design a good GUI for it.

For reference, oprofile's data collection and analysis are two totally
separate programs, and they can run on different machines (of different
architectures).
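(As a rough sketch of that split, assuming a recent oprofile release
that ships oparchive; the paths are illustrative:

  # on the ARM target: flush the daemon's buffers, then bundle the
  # samples together with the binaries they reference
  opcontrol --dump
  oparchive -o /tmp/pango-profile

  # copy /tmp/pango-profile to any host, of any architecture, then:
  opreport --symbols archive:/tmp/pango-profile

oparchive gathers everything the analysis side needs, so the host does
not have to share a filesystem or an architecture with the target.)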
Richard