On Tue, Mar 4, 2014 at 5:00 AM, Kay Sievers <k...@vrfy.org> wrote:
> On Mon, Mar 3, 2014 at 11:06 PM, Kay Sievers <k...@vrfy.org> wrote:
>> On Mon, Mar 3, 2014 at 10:35 PM, Stefan Westerfeld <ste...@space.twc.de> wrote:
>>> First of all: I'd really like to see kdbus being used as a general-purpose
>>> IPC layer, so that developers working on client/server software no longer
>>> need to create their own homemade IPC from primitives like sockets or
>>> similar.
>>>
>>> Now kdbus is advertised as a high-performance IPC solution, and compared
>>> to the traditional D-Bus approach this may well be true. But are the
>>> numbers that
>>>
>>>   $ test-bus-kernel-benchmark chart
>>>
>>> produces impressive? Or to put it another way: will developers working on
>>> client/server software happily accept kdbus because it performs as well
>>> as a homemade IPC solution would? Or does kdbus add overhead to a degree
>>> that some applications can't accept?
>>>
>>> To answer this, I wrote a program called "ibench" which passes messages
>>> between a client and a server, but instead of using kdbus it uses
>>> traditional pipes. To simulate main loop integration, it uses poll() in
>>> the places where a normal client or server application would go into the
>>> main loop and wait to be woken up by file descriptor activity.
>>>
>>> Here are the results I obtained using
>>>
>>> - an AMD Phenom(tm) 9850 Quad-Core Processor
>>> - running Fedora 20 64-bit with systemd+kdbus from git
>>> - a system booted with the "kdbus" and "single" kernel arguments
>>>
>>> ============================================================================
>>> *** single cpu performance:
>>>
>>>     SIZE   COPY  MEMFD  KDBUS-MAX  IBENCH  SPEEDUP
>>>        1  32580  16390      32580  192007     5.89
>>>        2  40870  16960      40870  191730     4.69
>>>        4  40750  16870      40750  190938     4.69
>>>        8  40930  16950      40930  191234     4.67
>>>       16  40290  17150      40290  192041     4.77
>>>       32  40220  18050      40220  191963     4.77
>>>       64  40280  16930      40280  192183     4.77
>>>      128  40530  17440      40530  191649     4.73
>>>      256  40610  17610      40610  190405     4.69
>>>      512  40770  16690      40770  188671     4.63
>>>     1024  40670  17840      40670  185819     4.57
>>>     2048  40510  17780      40510  181050     4.47
>>>     4096  39610  17330      39610  154303     3.90
>>>     8192  38000  16540      38000  121710     3.20
>>>    16384  35900  15050      35900   80921     2.25
>>>    32768  31300  13020      31300   54062     1.73
>>>    65536  24300   9940      24300   27574     1.13
>>>   131072  16730   6820      16730   14886     0.89
>>>   262144   4420   4080       4420    6888     1.56
>>>   524288   1660   2040       2040    2781     1.36
>>>  1048576    800    950        950    1231     1.30
>>>  2097152    310    490        490     475     0.97
>>>  4194304    150    240        240     227     0.95
>>>
>>> *** dual cpu performance:
>>>
>>>     SIZE   COPY  MEMFD  KDBUS-MAX  IBENCH  SPEEDUP
>>>        1  31680  14000      31680  104664     3.30
>>>        2  34960  14290      34960  104926     3.00
>>>        4  34930  14050      34930  104659     3.00
>>>        8  24610  13300      24610  104058     4.23
>>>       16  33840  14740      33840  103800     3.07
>>>       32  33880  14400      33880  103917     3.07
>>>       64  34180  14220      34180  103349     3.02
>>>      128  34540  14260      34540  102622     2.97
>>>      256  37820  14240      37820  102076     2.70
>>>      512  37570  14270      37570   99105     2.64
>>>     1024  37570  14780      37570   96010     2.56
>>>     2048  21640  13330      21640   89602     4.14
>>>     4096  23430  13120      23430   73682     3.14
>>>     8192  34350  12300      34350   59827     1.74
>>>    16384  25180  10560      25180   43808     1.74
>>>    32768  20210   9700      20210   21112     1.04
>>>    65536  15440   7820      15440   10771     0.70
>>>   131072  11630   5670      11630    5775     0.50
>>>   262144   4080   3730       4080    3012     0.74
>>>   524288   1830   2040       2040    1421     0.70
>>>  1048576    810    950        950     631     0.66
>>>  2097152    310    490        490     269     0.55
>>>  4194304    150    240        240     133     0.55
>>> ============================================================================
>>>
>>> I ran the tests twice - once using the same cpu for client and server
>>> (via cpu affinity) and once using a different cpu for client and server.
>>>
>>> The SIZE, COPY and MEMFD columns are produced by "test-bus-kernel-benchmark
>>> chart"; the KDBUS-MAX column is the maximum of the COPY and MEMFD columns,
>>> so it is the effective number of roundtrips that kdbus is able to do at
>>> that SIZE. The IBENCH column is the effective number of roundtrips that
>>> ibench can do at that SIZE.
>>>
>>> For many relevant cases, ibench outperforms kdbus (a lot). The SPEEDUP
>>> factor indicates how much faster ibench is than kdbus. For small to medium
>>> array sizes, ibench always wins (sometimes by a lot). For instance, when
>>> passing a 4 KiB array from client to server and back, ibench is 3.90 times
>>> faster if client and server live on the same cpu, and 3.14 times faster if
>>> they live on different cpus.
>>>
>>> I'm bringing this up now because it would be sad if kdbus became part of
>>> the kernel and universally available, yet application developers still
>>> built their own protocols for performance reasons. And some of the changes
>>> needed to make kdbus run as fast as ibench may be backward incompatible at
>>> some level, so it may be better to make them now rather than later.
>>>
>>> The program "ibench" I wrote to provide a performance comparison for the
>>> "test-bus-kernel-benchmark" program can be downloaded at
>>>
>>>   http://space.twc.de/~stefan/download/ibench.c
>>>
>>> As a final note, ibench also supports using a socketpair() for
>>> communication between client and server via a #define at the top, but
>>> pipe() communication was faster in my test setup.
>>
>> Pipes are not interesting for general-purpose D-Bus IPC; with a pipe the
>> memory can "move" from one client to the other, because it is no longer
>> needed in the process that fills the pipe.
>>
>> Pipes are a model outside the focus of kdbus; using pipes where pipes are
>> the appropriate IPC mechanism is just fine, there is no competition, and
>> being 5 times slower than simple pipes is a very good number for kdbus.
>>
>> Kdbus is a low-level implementation for D-Bus, not much else; it will not
>> try to cover all sorts of specialized IPC use cases.
>
> There is also a benchmark in the kdbus repo:
>
>   ./test/test-kdbus-benchmark
>
> It is probably better to compare against that, as it does not include any
> of the higher-level D-Bus overhead from the userspace library; it operates
> on the raw kernel kdbus interface and is quite a lot faster than the test
> in the systemd repo.
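
For illustration, the pipe()+poll() roundtrip scheme described in the quoted
message can be sketched in a few lines of C. This is only a rough sketch under
assumed parameters, not code taken from ibench.c: the message size, iteration
count and error handling are arbitrary, and a forked child stands in for the
server.

#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MSG_SIZE   8192        /* example payload size, not from ibench */
#define ROUNDTRIPS 100000      /* example iteration count */

/* Block in poll() before reading, to mimic main-loop integration,
 * then read exactly len bytes. */
static void read_full(int fd, char *buf, size_t len)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        size_t done = 0;

        while (done < len) {
                ssize_t r;

                poll(&pfd, 1, -1);
                r = read(fd, buf + done, len - done);
                if (r <= 0)
                        exit(r == 0 ? 0 : 1);
                done += (size_t) r;
        }
}

int main(void)
{
        int to_server[2], to_client[2];
        char buf[MSG_SIZE];
        int i;

        memset(buf, 'x', sizeof(buf));
        if (pipe(to_server) < 0 || pipe(to_client) < 0)
                return 1;

        if (fork() == 0) {
                /* "server": echo every message back to the client */
                close(to_server[1]);
                close(to_client[0]);
                for (;;) {
                        read_full(to_server[0], buf, sizeof(buf));
                        if (write(to_client[1], buf, sizeof(buf)) != (ssize_t) sizeof(buf))
                                return 1;
                }
        }

        /* "client": send a message, then wait for the echoed reply */
        close(to_server[0]);
        close(to_client[1]);
        for (i = 0; i < ROUNDTRIPS; i++) {
                if (write(to_server[1], buf, sizeof(buf)) != (ssize_t) sizeof(buf))
                        return 1;
                read_full(to_client[0], buf, sizeof(buf));
        }
        return 0;
}

The poll() before each read is what makes this roughly comparable to a client
with main-loop integration rather than a plain blocking read() loop.
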
With fixed 8k message sizes in all three tools and a concurrent-CPU setup,
an Intel i7 at 2.90 GHz produces:

  ibench:                     55,036 - 128,807 transactions/sec
  test-kdbus-benchmark:       73,356 -  82,654 transactions/sec
  test-bus-kernel-benchmark:  23,290 -  27,580 transactions/sec

test-kdbus-benchmark runs the full-featured kdbus, including
reliability/integrity checks, header parsing, user accounting, priority
queue handling, and message/connection metadata handling.

Perf output for all three tools is attached; it shows that
test-bus-kernel-benchmark has to do a lot of work that is not directly
related to raw memory-copy performance, so it should not be compared
directly.

Kay
 2.05%  test-bus-kernel  libc-2.19.90.so            [.] _int_malloc
 1.84%  test-bus-kernel  libc-2.19.90.so            [.] vfprintf
 1.64%  test-bus-kernel  test-bus-kernel-benchmark  [.] bus_message_parse_fields
 1.51%  test-bus-kernel  libc-2.19.90.so            [.] memset
 1.40%  test-bus-kernel  libc-2.19.90.so            [.] _int_free
 1.17%  test-bus-kernel  [kernel.kallsyms]          [k] copy_user_enhanced_fast_string
 1.12%  test-bus-kernel  libc-2.19.90.so            [.] malloc_consolidate
 1.11%  test-bus-kernel  [kernel.kallsyms]          [k] mutex_lock
 1.05%  test-bus-kernel  test-bus-kernel-benchmark  [.] bus_kernel_make_message
 1.04%  test-bus-kernel  [kernel.kallsyms]          [k] kfree
 0.94%  test-bus-kernel  libc-2.19.90.so            [.] free
 0.90%  test-bus-kernel  libc-2.19.90.so            [.] __GI___strcmp_ssse3
 0.88%  test-bus-kernel  test-bus-kernel-benchmark  [.] message_extend_fields
 0.83%  test-bus-kernel  [kdbus]                    [k] kdbus_handle_ioctl
 0.83%  test-bus-kernel  libc-2.19.90.so            [.] malloc
 0.79%  test-bus-kernel  [kernel.kallsyms]          [k] mutex_unlock
 0.76%  test-bus-kernel  test-bus-kernel-benchmark  [.] BUS_MESSAGE_IS_GVARIANT
 0.73%  test-bus-kernel  libc-2.19.90.so            [.] __libc_calloc
 0.72%  test-bus-kernel  libc-2.19.90.so            [.] memchr
 0.71%  test-bus-kernel  [kdbus]                    [k] kdbus_conn_kmsg_send
 0.67%  test-bus-kernel  test-bus-kernel-benchmark  [.] buffer_peek
 0.65%  test-bus-kernel  [kernel.kallsyms]          [k] update_cfs_shares
 0.58%  test-bus-kernel  [kernel.kallsyms]          [k] system_call_after_swapgs
 0.57%  test-bus-kernel  test-bus-kernel-benchmark  [.] service_name_is_valid
 0.56%  test-bus-kernel  test-bus-kernel-benchmark  [.] build_struct_offsets
 0.55%  test-bus-kernel  [kdbus]                    [k] kdbus_pool_copy

 3.95%  test-kdbus-benc  [kernel.kallsyms]      [k] copy_user_enhanced_fast_string
 2.25%  test-kdbus-benc  [kernel.kallsyms]      [k] clear_page_c_e
 2.14%  test-kdbus-benc  [kernel.kallsyms]      [k] _raw_spin_lock
 1.78%  test-kdbus-benc  [kernel.kallsyms]      [k] kfree
 1.65%  test-kdbus-benc  [kernel.kallsyms]      [k] mutex_lock
 1.55%  test-kdbus-benc  [kernel.kallsyms]      [k] get_page_from_freelist
 1.46%  test-kdbus-benc  [kernel.kallsyms]      [k] page_fault
 1.40%  test-kdbus-benc  [kernel.kallsyms]      [k] mutex_unlock
 1.33%  test-kdbus-benc  [kernel.kallsyms]      [k] memset
 1.27%  test-kdbus-benc  [kernel.kallsyms]      [k] shmem_getpage_gfp
 1.16%  test-kdbus-benc  [kernel.kallsyms]      [k] find_get_page
 1.05%  test-kdbus-benc  [kernel.kallsyms]      [k] memcpy
 1.03%  test-kdbus-benc  [kernel.kallsyms]      [k] set_page_dirty
 1.00%  test-kdbus-benc  [kernel.kallsyms]      [k] system_call
 0.94%  test-kdbus-benc  [kernel.kallsyms]      [k] system_call_after_swapgs
 0.93%  test-kdbus-benc  [kernel.kallsyms]      [k] kmem_cache_alloc
 0.93%  test-kdbus-benc  test-kdbus-benchmark   [.] timeval_diff
 0.90%  test-kdbus-benc  [kernel.kallsyms]      [k] page_waitqueue
 0.86%  test-kdbus-benc  libpthread-2.19.90.so  [.] __libc_close
 0.83%  test-kdbus-benc  [kernel.kallsyms]      [k] __call_rcu.constprop.63
 0.81%  test-kdbus-benc  [kdbus]                [k] kdbus_pool_copy
 0.78%  test-kdbus-benc  [kernel.kallsyms]      [k] strlen
 0.77%  test-kdbus-benc  [kernel.kallsyms]      [k] unlock_page
 0.77%  test-kdbus-benc  [kdbus]                [k] kdbus_meta_append
 0.76%  test-kdbus-benc  [kernel.kallsyms]      [k] find_lock_page
 0.71%  test-kdbus-benc  test-kdbus-benchmark   [.] handle_echo_reply
 0.71%  test-kdbus-benc  [kernel.kallsyms]      [k] __kmalloc
 0.67%  test-kdbus-benc  [kernel.kallsyms]      [k] unmap_single_vma
 0.67%  test-kdbus-benc  [kernel.kallsyms]      [k] flush_tlb_mm_range
 0.65%  test-kdbus-benc  [kernel.kallsyms]      [k] __fget_light
 0.63%  test-kdbus-benc  [kernel.kallsyms]      [k] fput
 0.63%  test-kdbus-benc  [kdbus]                [k] kdbus_handle_ioctl

16.09%  ibench  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
 4.76%  ibench  ibench             [.] main
 2.85%  ibench  [kernel.kallsyms]  [k] pipe_read
 2.81%  ibench  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
 2.31%  ibench  [kernel.kallsyms]  [k] update_cfs_shares
 2.19%  ibench  [kernel.kallsyms]  [k] native_write_msr_safe
 2.03%  ibench  [kernel.kallsyms]  [k] mutex_unlock
 2.02%  ibench  [kernel.kallsyms]  [k] resched_task
 1.74%  ibench  [kernel.kallsyms]  [k] __schedule
 1.67%  ibench  [kernel.kallsyms]  [k] mutex_lock
 1.61%  ibench  [kernel.kallsyms]  [k] do_sys_poll
 1.57%  ibench  [kernel.kallsyms]  [k] get_page_from_freelist
 1.46%  ibench  [kernel.kallsyms]  [k] __fget_light
 1.34%  ibench  [kernel.kallsyms]  [k] update_rq_clock.part.83
 1.28%  ibench  [kernel.kallsyms]  [k] fsnotify
 1.28%  ibench  [kernel.kallsyms]  [k] enqueue_entity
 1.28%  ibench  [kernel.kallsyms]  [k] update_curr
 1.25%  ibench  [kernel.kallsyms]  [k] system_call
 1.25%  ibench  [kernel.kallsyms]  [k] __list_del_entry
 1.22%  ibench  [kernel.kallsyms]  [k] _raw_spin_lock
 1.22%  ibench  [kernel.kallsyms]  [k] system_call_after_swapgs
 1.21%  ibench  [kernel.kallsyms]  [k] task_waking_fair
 1.18%  ibench  [kernel.kallsyms]  [k] poll_schedule_timeout
 1.17%  ibench  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
 1.14%  ibench  [kernel.kallsyms]  [k] __alloc_pages_nodemask
 1.06%  ibench  [kernel.kallsyms]  [k] enqueue_task_fair
 0.95%  ibench  [kernel.kallsyms]  [k] pipe_write
 0.88%  ibench  [kernel.kallsyms]  [k] do_sync_read
 0.87%  ibench  [kernel.kallsyms]  [k] dequeue_entity
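
As an aside, transactions/sec figures like the ones quoted above are normally
derived by counting completed roundtrips over a measured wall-clock interval.
A minimal, self-contained sketch of such a measurement loop follows; the
do_roundtrip() stub is hypothetical and merely stands in for the real IPC
request/reply - it is not taken from ibench, test-kdbus-benchmark or
test-bus-kernel-benchmark.

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical placeholder: a real benchmark would perform one full
 * request/reply (e.g. an 8k message and its echo) here. */
static void do_roundtrip(void)
{
        (void) getpid();
}

static double now_sec(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
        const double interval = 1.0;    /* measure for roughly one second */
        double start = now_sec(), elapsed;
        unsigned long transactions = 0;

        do {
                do_roundtrip();
                transactions++;
                elapsed = now_sec() - start;
        } while (elapsed < interval);

        printf("%.0f transactions/sec\n", transactions / elapsed);
        return 0;
}
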