Hi,
here are my latest results:
I built xeno_nucleus and xeno_native into the kernel (see the first oprofile
result below).
(I also added debug information to the kernel, which was of no additional help.)
The profile shows that most of the time is spent in __ipipe_hard_cpuid.
I looked at that routine and split it up using two additional helper functions
that "monitor" the apic_read() and the GET_APIC_ID calls respectively (see the
patch below).
The results of this experiment can be found under "Second oprofile result" below.
If I interpret the results correctly, it looks as if apic_read() is what is
actually eating up the performance. As this call "leaves" the CPU-internal
"area" and accesses the external APIC, this sounds plausible.
I think this is done here to detect the current CPU.
Is it possible to detect the CPU differently, or to store that information
somehow with a thread (TLS), to avoid requesting it so frequently?
I hope this helps a little to identify the issue and (perhaps) to find a
faster solution.
Thanks for all feedback on this!
Regards
Mathias
------------ Begin of patch -----------------
--- ipipe.c.orig 2007-05-09 16:16:32.000000000 +0200
+++ ipipe.c 2007-05-11 08:47:41.000000000 +0200
@@ -72,13 +72,30 @@
int (*__ipipe_logical_cpuid)(void) = &__ipipe_boot_cpuid;
+
+unsigned long __ipipe_hard_cpuid_apic_read(void)
+{
+ return apic_read(APIC_ID);
+}
+
+unsigned __ipipe_hard_cpuid_get_apic_id(unsigned long apic)
+{
+ return GET_APIC_ID(apic);
+}
+
+
static notrace int __ipipe_hard_cpuid(void)
{
unsigned long flags;
int cpu;
+ unsigned long apic;
+ unsigned apic_id;
local_irq_save_hw_notrace(flags);
- cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))];
+ // cpu = __ipipe_apicid_2_cpu[GET_APIC_ID(apic_read(APIC_ID))];
+ apic = __ipipe_hard_cpuid_apic_read();
+ apic_id = __ipipe_hard_cpuid_get_apic_id(apic);
+ cpu = __ipipe_apicid_2_cpu[apic_id];
local_irq_restore_hw_notrace(flags);
return cpu;
}
--------- End of patch
------------- First oprofile result: -----------------------------
Using default event: GLOBAL_POWER_EVENTS:100000:1:1:1
Daemon started.
Profiler running.
delta is 50404054895 per step: 5040
Stopping profiling.
CPU: P4 / Xeon, speed 3192.16 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name          app name            symbol name
6273     56.8928  vmlinux             vmlinux             __ipipe_hard_cpuid
900      8.1625   vmlinux             vmlinux             rt_sem_v
694      6.2942   vmlinux             vmlinux             xnregistry_fetch
321      2.9113   vmlinux             vmlinux             __ipipe_dispatch_event
293      2.6574   bash                bash                (no symbols)
250      2.2674   vmlinux             vmlinux             hrtimer_run_queues
227      2.0588   libc-2.3.6.so       libc-2.3.6.so       (no symbols)
140      1.2697   vmlinux             vmlinux             delay_tsc
123      1.1155   libnative.so.0.0.0  libnative.so.0.0.0  rt_sem_p
103      0.9342   vmlinux             vmlinux             __ipipe_stall_root
78       0.7074   vmlinux             vmlinux             __ipipe_test_and_stall_root
67       0.6077   vmlinux             vmlinux             apic_timer_interrupt
66       0.5986   vmlinux             vmlinux             sysenter_past_esp
60       0.5442   vmlinux             vmlinux             __ipipe_restore_pipeline_head
56       0.5079   vmlinux             vmlinux             do_wp_page
56       0.5079   vmlinux             vmlinux             search_by_key
54       0.4898   oprofiled           oprofiled           (no symbols)
41       0.3718   vmlinux             vmlinux             __ipipe_sync_stage
37       0.3356   ld-2.3.6.so         ld-2.3.6.so         do_lookup_x
37       0.3356   vmlinux             vmlinux             __handle_mm_fault
25       0.2267   vmlinux             vmlinux             __ipipe_handle_exception
25       0.2267   vmlinux             vmlinux             find_get_page
25       0.2267   vmlinux             vmlinux             get_page_from_freelist
25       0.2267   vmlinux             vmlinux             run_timer_softirq
25       0.2267   vmlinux             vmlinux             scheduler_tick
24       0.2177   vmlinux             vmlinux             sysenter_exit
23       0.2086   vmlinux             vmlinux             __ipipe_syscall_root
23       0.2086   vmlinux             vmlinux             __ipipe_unstall_root
22       0.1995   libnative.so.0.0.0  libnative.so.0.0.0  rt_sem_v
21       0.1905   vmlinux             vmlinux             __ipipe_test_root
21       0.1905   vmlinux             vmlinux             ata_bmdma_start
19       0.1723   ld-2.3.6.so         ld-2.3.6.so         strcmp
19       0.1723   vmlinux             vmlinux             ata_altstatus
19       0.1723   vmlinux             vmlinux             ata_bmdma_irq_clear
18       0.1633   oprofile            oprofile            (no symbols)
18       0.1633   vmlinux             vmlinux             find_vma
18       0.1633   vmlinux             vmlinux             flush_tlb_page
18       0.1633   vmlinux             vmlinux             release_pages
18       0.1633   vmlinux             vmlinux             unmap_vmas
17       0.1542   ld-2.3.6.so         ld-2.3.6.so         _dl_relocate_object
17       0.1542   vmlinux             vmlinux             __ipipe_unstall_iret_root
---------------------------------------------
------------- Second oprofile result: -----------------------------
Using default event: GLOBAL_POWER_EVENTS:100000:1:1:1
Daemon started.
Profiler running.
delta is 51846556098 per step: 5184
Stopping profiling.
CPU: P4 / Xeon, speed 3192.33 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name          app name            symbol name
4350     39.4307  vmlinux             vmlinux             __ipipe_hard_cpuid_apic_read
2336     21.1748  vmlinux             vmlinux             __ipipe_hard_cpuid
306      2.7737   bash                bash                (no symbols)
287      2.6015   vmlinux             vmlinux             __ipipe_dispatch_event
276      2.5018   vmlinux             vmlinux             sysenter_past_esp
269      2.4384   vmlinux             vmlinux             hrtimer_run_queues
264      2.3930   vmlinux             vmlinux             __ipipe_syscall_root
245      2.2208   vmlinux             vmlinux             xnregistry_fetch
209      1.8945   libc-2.3.6.so       libc-2.3.6.so       (no symbols)
173      1.5682   vmlinux             vmlinux             rt_sem_v
154      1.3959   vmlinux             vmlinux             __ipipe_restore_pipeline_head
128      1.1603   vmlinux             vmlinux             __copy_from_user_ll_nozero
102      0.9246   vmlinux             vmlinux             delay_tsc
100      0.9065   vmlinux             vmlinux             __ipipe_stall_root
100      0.9065   vmlinux             vmlinux             hisyscall_event
90       0.8158   vmlinux             vmlinux             apic_timer_interrupt
89       0.8067   vmlinux             vmlinux             __ipipe_test_and_stall_root
80       0.7252   vmlinux             vmlinux             rt_sem_p
74       0.6708   vmlinux             vmlinux             sysenter_exit
58       0.5257   vmlinux             vmlinux             do_wp_page
53       0.4804   oprofiled           oprofiled           (no symbols)
53       0.4804   vmlinux             vmlinux             search_by_key
50       0.4532   vmlinux             vmlinux             __ipipe_sync_stage
40       0.3626   vmlinux             vmlinux             __rt_sem_v
39       0.3535   vmlinux             vmlinux             __rt_sem_p
31       0.2810   ld-2.3.6.so         ld-2.3.6.so         do_lookup_x
30       0.2719   vmlinux             vmlinux             find_get_page
27       0.2447   vmlinux             vmlinux             ata_bmdma_start
24       0.2175   vmlinux             vmlinux             __ipipe_unstall_root
24       0.2175   vmlinux             vmlinux             run_timer_softirq
23       0.2085   vmlinux             vmlinux             unmap_vmas
22       0.1994   vmlinux             vmlinux             flush_tlb_page
21       0.1904   ld-2.3.6.so         ld-2.3.6.so         strcmp
21       0.1904   vmlinux             vmlinux             __handle_mm_fault
19       0.1722   vmlinux             vmlinux             __ipipe_test_root
18       0.1632   vmlinux             vmlinux             do_page_fault
17       0.1541   oprofile            oprofile            (no symbols)
17       0.1541   vmlinux             vmlinux             __ipipe_handle_exception
16       0.1450   vmlinux             vmlinux             __d_lookup
15       0.1360   vmlinux             vmlinux             filemap_nopage
15       0.1360   vmlinux             vmlinux             get_page_from_freelist
15       0.1360   vmlinux             vmlinux             page_remove_rmap
15       0.1360   vmlinux             vmlinux             restore_nocheck_notrace
14       0.1269   vmlinux             vmlinux             _atomic_dec_and_lock
14       0.1269   vmlinux             vmlinux             copy_page_range
14       0.1269   vmlinux             vmlinux             scheduler_tick
12       0.1088   ld-2.3.6.so         ld-2.3.6.so         _dl_relocate_object
12       0.1088   vmlinux             vmlinux             __find_get_block
12       0.1088   vmlinux             vmlinux             __ipipe_unstall_iret_root
--------------------------------------------------------------
> define CONFIG_XENO_OPT_DEBUG and CONFIG_DEBUG_KERNEL/CONFIG_DEBUG_INFO
> to have symbols and all. OProfile is only able to look up virtual
> address when debug symbols are present in file. You may have to pass
> the --vmlinux option to opcontrol. Compiling Xenomai like suggested by
> Jan will make your life easier, and will bring you an extra mini
> speedup if you're in that kind of business.
>
> --
> Stephane
>
--
Mathias Koehrer
[EMAIL PROTECTED]
_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help