Also, I ran the 2-CPU example with all tracepoints on, and here is what I got:
./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 -m 64M -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -r4096' -H --trace \*

# In another terminal:
./scripts/trace.py extract
./scripts/trace.py summary
Collected 38141 samples spanning 100.38 ms

Time ranges:

  CPU 0x01: 0.000000000 - 0.100380272 = 100.38 ms
  CPU 0x00: 0.083725677 - 0.100295947 =  16.57 ms

Tracepoint statistics:

  name                          count
  ----                          -----
  access_scanner                 5145
  async_worker_started              1
  clear_pte                       256
  condvar_wait                      8
  condvar_wake_all                 12
  memory_free                      64
  memory_malloc                    68
  memory_malloc_large               9
  memory_malloc_mempool            38
  memory_malloc_page                3
  memory_page_alloc                 9
  memory_page_free                262
  mutex_lock                     5367
  mutex_lock_wait                  28
  mutex_lock_wake                  30
  mutex_receive_lock                8
  mutex_send_lock                   8
  mutex_unlock                   5377
  pcpu_worker_sheriff_started       1
  pool_alloc                       38
  pool_free                        52
  pool_free_same_cpu               52
  sched_idle                       13
  sched_idle_ret                   13
  sched_ipi                         7
  sched_load                      118
  sched_migrate                     1
  sched_preempt                    23
  sched_queue                      71
  sched_sched                     101
  sched_switch                     70
  sched_wait                       46
  sched_wait_ret                   43
  sched_wake                     5197
  thread_create                     4
  timer_cancel                   5209
  timer_fired                    5150
  timer_set                      5211
  vfs_pwritev                      13
  vfs_pwritev_ret                  13
  waitqueue_wake_all                1
  waitqueue_wake_one                1

./scripts/trace.py cpu-load
0.000000000 1
0.000000000 1
0.000000000 1
0.000002133 0
0.000002546 1
0.000002987 1
0.000030307 2
0.000030768 2
0.000032967 1
0.000040996 2
0.000041268 2
0.000041831 1
0.000043297 2
0.000043585 2
0.000045945 1
0.000046650 0
0.000290645 1
0.000291750 1
0.000294524 2
0.000295683 1
0.000297979 0
0.000304896 1
0.000305348 1
0.000306794 2
0.000307488 1
0.000309413 0
0.000316847 1
0.000317216 1
0.000318711 2
0.000319370 1
0.000321079 0
0.000327622 1
0.000328009 1
0.000531069 2
0.000532382 1
0.000539432 0
0.000573914 1
0.000574651 1
0.000576728 0
0.000584365 1
0.000584997 1
0.000587286 0
0.000591755 1
0.000592399 1
0.000594461 0
0.000598470 1
0.000599040 1
0.000611236 0
0.000835164 1
0.000836416 1
0.000843416 2
0.000843890 2
0.000845046 1
0.000856800 2
0.000857064 2
0.000858037 1
0.000862489 0
0.086250040 2 0
0.086252051 3 0
0.086253257 2 0
0.086254377 3 0
0.086296669 2 0
0.086297441 3 0
0.086336375 2 0
0.086337328 3 0
0.086337723 2 0
0.086338657 3 0
0.087719001 2 0
0.087720113 3 0
0.089164101 2 0
0.089165836 3 0
0.089166234 2 0
0.089167249 3 0

Is my understanding correct that the load was not spread evenly across both CPUs?
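To sanity-check that reading, here is a rough post-processing sketch (saved as, say, cpu_load_busy.py; both the file name and the assumed output format are mine, not trace.py's). It treats each cpu-load line as a timestamp in seconds followed by one run-queue length per CPU, and adds up how long each CPU's queue was non-empty:

import sys

def busy_times(lines):
    # Assumed format: "<timestamp-sec> <runqueue-len-cpu0> [<runqueue-len-cpu1> ...]"
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            ts = float(parts[0])
            loads = [int(p) for p in parts[1:]]
        except ValueError:
            continue  # skip headers or anything that is not a sample
        rows.append((ts, loads))
    if not rows:
        return []
    ncpus = max(len(loads) for _, loads in rows)
    busy = [0.0] * ncpus
    # Credit the gap between two consecutive samples to every CPU whose
    # run queue was non-empty at the start of the gap.
    for (t0, loads), (t1, _) in zip(rows, rows[1:]):
        dt = t1 - t0
        for cpu, load in enumerate(loads):
            if load > 0:
                busy[cpu] += dt
    return busy

if __name__ == '__main__':
    for cpu, t in enumerate(busy_times(sys.stdin)):
        print("CPU %d: run queue non-empty for ~%.2f ms" % (cpu, t * 1000))

Used as: ./scripts/trace.py cpu-load | python3 cpu_load_busy.py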
On Tuesday, February 25, 2020 at 1:09:08 PM UTC-5, Waldek Kozaczuk wrote:
> So I did try to build and run the radix test (please note my Ubuntu laptop
> has only 4 cores and hyper-threading disabled). BTW it seems that
> particular benchmark does not need a read-write FS, so I used ROFS:
>
> ./scripts/manifest_from_host.sh -w ../splash2-posix/kernels/radix/radix && ./scripts/build fs=rofs --append-manifest -j4
>
> Linux host 1 cpu:
> ./radix -p 1 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 1 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         7335      2568      4765
>
> TIMING INFORMATION
> Start time                        : 1582652832386234
> Initialization finish time        : 1582652832444092
> Overall finish time               : 1582652832451427
> Total time with initialization    : 65193
> Total time without initialization : 7335
>
> Linux host 2 cpus:
> ./radix -p 2 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 2 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         4325      1571      2704
>
> TIMING INFORMATION
> Start time                        : 1582652821496771
> Initialization finish time        : 1582652821531279
> Overall finish time               : 1582652821535604
> Total time with initialization    : 38833
> Total time without initialization : 4325
>
> Linux host 4 cpus:
> ./radix -p 4 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 4 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         2599      1077      1470
>
> TIMING INFORMATION
> Start time                        : 1582653906150199
> Initialization finish time        : 1582653906171932
> Overall finish time               : 1582653906174531
> Total time with initialization    : 24332
> Total time without initialization : 2599
>
> OSv 1 CPU:
> ./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 1 --block-device-cache writeback,aio=threads -e '/radix -p 1 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 3.75 ms
> Cmdline: /radix -p 1 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 1 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         6060      2002      4049
>
> TIMING INFORMATION
> Start time                        : 1582652845450708
> Initialization finish time        : 1582652845500348
> Overall finish time               : 1582652845506408
> Total time with initialization    : 55700
> Total time without initialization : 6060
>
> OSv 2 CPUs:
> ./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 4.81 ms
> Cmdline: /radix -p 2 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 2 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         5797      1702      4089
>
> TIMING INFORMATION
> Start time                        : 1582653305076852
> Initialization finish time        : 1582653305129792
> Overall finish time               : 1582653305135589
> Total time with initialization    : 58737
> Total time without initialization : 5797
>
> OSv 4 CPUs:
> ./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 4 --block-device-cache writeback,aio=threads -e '/radix -p 4 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 5.26 ms
> Cmdline: /radix -p 4 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 4 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         6498      2393      4099
>
> TIMING INFORMATION
> Start time                        : 1582653946823458
> Initialization finish time        : 1582653946875522
> Overall finish time               : 1582653946882020
> Total time with initialization    : 58562
> Total time without initialization : 6498
>
> As you can see, with a single CPU the benchmark seems to be 10-15% faster on
> OSv. But with two and four CPUs OSv barely sees any improvement, whereas on
> the host the app runs 40% faster. So OSv does not seem to scale at all
> (somebody mentioned it used to), and it would be nice to understand why. OSv
> has many sophisticated tracing tools that can help here:
> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
>
> Waldek
>
> BTW1. I tried to bump the size of the matrix to something higher, but with
> -r8192 the app crashes on both Linux and OSv.
> BTW2. It would be interesting to compare OSv with a Linux guest (vs the host).
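To put the scaling quoted above into numbers, here is a quick back-of-the-envelope sketch over the "Total time without initialization" figures (in microseconds) from the runs above; the script is just an illustration, not part of the benchmark:

times = {
    'linux host': {1: 7335, 2: 4325, 4: 2599},
    'OSv':        {1: 6060, 2: 5797, 4: 6498},
}

for system, per_cpu in times.items():
    base = per_cpu[1]  # the 1-CPU run is the baseline
    for cpus in sorted(per_cpu):
        t = per_cpu[cpus]
        print("%-10s %d cpu(s): %5d us  speedup vs 1 cpu: %.2fx"
              % (system, cpus, t, base / t))

That works out to roughly 1.70x (2 CPUs) and 2.82x (4 CPUs) on the Linux host, versus about 1.05x and 0.93x on OSv.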
>
> On Tuesday, February 25, 2020 at 10:05:08 AM UTC-5, [email protected] wrote:
>>
>> Thanks for the response! I will get this information to you after work
>> with the few modifications you recommended! The application is essentially
>> just testing CPU performance using multiprocessing. Nothing too fancy about
>> it! The code I am using can be found at:
>>
>> https://www.github.com/ProfessorWest/splash2-posix
>>
>> It is inside the kernels folder, at radix.c, and I change the problem size
>> to 16,777,206.
>>
>> If you happen to examine the code, do ignore the lack of cleanliness of the
>> code... we just smashed everything into one file for simplicity on our end
>> (running the same code across all platforms being benchmarked).
>>
>> On Tuesday, February 25, 2020 at 8:52:48 AM UTC-5, Waldek Kozaczuk wrote:
>>>
>>> Hi,
>>>
>>> I am quite late to the party :-) Could you run OSv on a single CPU with
>>> verbose on (add -V to run.py) and send us the output so we can see a
>>> little more of what is happening? To disable networking you need to add
>>> '--nics=0' (for all 50 options run.py supports, run it with '--help'). I
>>> am not familiar with that benchmark, but I wonder if it needs a read-write
>>> FS (ZFS in OSv's case); if not, you can build OSv images with a read-only
>>> FS (./scripts/build fs=rofs). Lastly, you can improve boot time by running
>>> OSv on Firecracker
>>> (https://github.com/cloudius-systems/osv/wiki/Running-OSv-on-Firecracker)
>>> or on QEMU microvm (-p qemu_microvm - requires QEMU >= 4.1); with a
>>> read-only FS OSv should boot within 5 ms on both (with ZFS within 40 ms).
>>> One last thing: writing to the console on OSv can be quite slow, and I
>>> wonder how much this benchmark does that.
>>>
>>> While I definitely agree with my colleague Nadav, who essentially says do
>>> not use OSv if raw performance matters (a database, for example) and Linux
>>> will beat it no matter what, OSv may have advantages in use cases where
>>> pure performance does not matter (it still needs to be reasonable). I
>>> think the best use cases for OSv are serverless or stateless apps
>>> (microservices or WebAssembly) running on a single CPU, where all state
>>> management is delegated to a remote persistent store (most custom-built
>>> business apps are like that) and where high isolation matters.
>>>
>>> Relatedly, I think it might be more useful to think of OSv (and other
>>> unikernels) as highly isolated processes. To that end, we still need to
>>> optimize memory overhead (stacks, for example) and improve virtio-fs
>>> support (in that case you do not need a full image to run a Linux app on
>>> OSv, just the kernel).
>>>
>>> Also, I think the lack of good tooling in the unikernel space affects
>>> their adoption. Compare it with Docker - build, push, pull, run. OSv has
>>> its equivalent - capstan - but at this point we do not really have a
>>> registry where one can pull the latest OSv kernel or push and pull images.
>>> Trying to run an app on OSv is still quite painful for a business app
>>> developer - it probably takes at least 30 minutes or so.
>>>
>>> Lastly, I think one of the main reasons for Docker's adoption was
>>> repeatability (besides its fantastic ease of use), where one can create an
>>> image and expect it to run almost the same way in production. Imagine you
>>> could achieve that with OSv.
>>>
>>> Waldek
>>>
>>> On Tuesday, February 25, 2020 at 7:00:16 AM UTC-5, [email protected] wrote:
>>>>
>>>> Very well explained. Thank you for that. That does make perfect sense as
>>>> well.
