Also, I ran the 2-CPU example with all tracepoints on, and here is what I got:
./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 -m 64M -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -r4096' -H --trace \*

# In another terminal:
./scripts/trace.py extract
./scripts/trace.py summary
Collected 38141 samples spanning 100.38 ms

Time ranges:

  CPU 0x01: 0.000000000 - 0.100380272 = 100.38 ms
  CPU 0x00: 0.083725677 - 0.100295947 =  16.57 ms

Tracepoint statistics:

  name                          count
  ----                          -----
  access_scanner                 5145
  async_worker_started              1
  clear_pte                       256
  condvar_wait                      8
  condvar_wake_all                 12
  memory_free                      64
  memory_malloc                    68
  memory_malloc_large               9
  memory_malloc_mempool            38
  memory_malloc_page                3
  memory_page_alloc                 9
  memory_page_free                262
  mutex_lock                     5367
  mutex_lock_wait                  28
  mutex_lock_wake                  30
  mutex_receive_lock                8
  mutex_send_lock                   8
  mutex_unlock                   5377
  pcpu_worker_sheriff_started       1
  pool_alloc                       38
  pool_free                        52
  pool_free_same_cpu               52
  sched_idle                       13
  sched_idle_ret                   13
  sched_ipi                         7
  sched_load                      118
  sched_migrate                     1
  sched_preempt                    23
  sched_queue                      71
  sched_sched                     101
  sched_switch                     70
  sched_wait                       46
  sched_wait_ret                   43
  sched_wake                     5197
  thread_create                     4
  timer_cancel                   5209
  timer_fired                    5150
  timer_set                      5211
  vfs_pwritev                      13
  vfs_pwritev_ret                  13
  waitqueue_wake_all                1
  waitqueue_wake_one                1

./scripts/trace.py cpu-load
0.000000000 1
0.000000000 1
0.000000000 1
0.000002133 0
0.000002546 1
0.000002987 1
0.000030307 2
0.000030768 2
0.000032967 1
0.000040996 2
0.000041268 2
0.000041831 1
0.000043297 2
0.000043585 2
0.000045945 1
0.000046650 0
0.000290645 1
0.000291750 1
0.000294524 2
0.000295683 1
0.000297979 0
0.000304896 1
0.000305348 1
0.000306794 2
0.000307488 1
0.000309413 0
0.000316847 1
0.000317216 1
0.000318711 2
0.000319370 1
0.000321079 0
0.000327622 1
0.000328009 1
0.000531069 2
0.000532382 1
0.000539432 0
0.000573914 1
0.000574651 1
0.000576728 0
0.000584365 1
0.000584997 1
0.000587286 0
0.000591755 1
0.000592399 1
0.000594461 0
0.000598470 1
0.000599040 1
0.000611236 0
0.000835164 1
0.000836416 1
0.000843416 2
0.000843890 2
0.000845046 1
0.000856800 2
0.000857064 2
0.000858037 1
0.000862489 0
0.086250040 2 0
0.086252051 3 0
0.086253257 2 0
0.086254377 3 0
0.086296669 2 0
0.086297441 3 0
0.086336375 2 0
0.086337328 3 0
0.086337723 2 0
0.086338657 3 0
0.087719001 2 0
0.087720113 3 0
0.089164101 2 0
0.089165836 3 0
0.089166234 2 0
0.089167249 3 0

Is my understanding correct that the load was not spread evenly across both CPUs?
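To sanity-check that reading, here is a rough post-processing sketch (saved as, say, cpu_load_busy.py; both the file name and the assumed output format are mine, not trace.py's). It treats each cpu-load line as a timestamp in seconds followed by one run-queue length per CPU, and adds up how long each CPU's queue was non-empty:

import sys

def busy_times(lines):
    # Assumed format: "<timestamp-sec> <runqueue-len-cpu0> [<runqueue-len-cpu1> ...]"
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            ts = float(parts[0])
            loads = [int(p) for p in parts[1:]]
        except ValueError:
            continue  # skip headers or anything that is not a sample
        rows.append((ts, loads))
    if not rows:
        return []
    ncpus = max(len(loads) for _, loads in rows)
    busy = [0.0] * ncpus
    # Credit the gap between two consecutive samples to every CPU whose
    # run queue was non-empty at the start of the gap.
    for (t0, loads), (t1, _) in zip(rows, rows[1:]):
        dt = t1 - t0
        for cpu, load in enumerate(loads):
            if load > 0:
                busy[cpu] += dt
    return busy

if __name__ == '__main__':
    for cpu, t in enumerate(busy_times(sys.stdin)):
        print("CPU %d: run queue non-empty for ~%.2f ms" % (cpu, t * 1000))

Used as: ./scripts/trace.py cpu-load | python3 cpu_load_busy.py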
On Tuesday, February 25, 2020 at 1:09:08 PM UTC-5, Waldek Kozaczuk wrote:
> So I did try to build and run the radix test (please note my Ubuntu laptop
> has only 4 cores and hyper-threading disabled). BTW it seems that
> particular benchmark does not need a read-write FS, so I used ROFS:
>
> ./scripts/manifest_from_host.sh -w ../splash2-posix/kernels/radix/radix && ./scripts/build fs=rofs --append-manifest -j4
>
> Linux host 1 cpu:
> ./radix -p 1 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 1 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         7335      2568      4765
>
> TIMING INFORMATION
> Start time                        : 1582652832386234
> Initialization finish time        : 1582652832444092
> Overall finish time               : 1582652832451427
> Total time with initialization    : 65193
> Total time without initialization : 7335
>
> Linux host 2 cpus:
> ./radix -p 2 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 2 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         4325      1571      2704
>
> TIMING INFORMATION
> Start time                        : 1582652821496771
> Initialization finish time        : 1582652821531279
> Overall finish time               : 1582652821535604
> Total time with initialization    : 38833
> Total time without initialization : 4325
>
> Linux host 4 cpus:
> ./radix -p 4 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 4 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         2599      1077      1470
>
> TIMING INFORMATION
> Start time                        : 1582653906150199
> Initialization finish time        : 1582653906171932
> Overall finish time               : 1582653906174531
> Total time with initialization    : 24332
> Total time without initialization : 2599
>
> OSv 1 CPU:
> ./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 1 --block-device-cache writeback,aio=threads -e '/radix -p 1 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 3.75 ms
> Cmdline: /radix -p 1 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 1 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         6060      2002      4049
>
> TIMING INFORMATION
> Start time                        : 1582652845450708
> Initialization finish time        : 1582652845500348
> Overall finish time               : 1582652845506408
> Total time with initialization    : 55700
> Total time without initialization : 6060
>
> OSv 2 CPUs:
> ./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 2 --block-device-cache writeback,aio=threads -e '/radix -p 2 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 4.81 ms
> Cmdline: /radix -p 2 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 2 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         5797      1702      4089
>
> TIMING INFORMATION
> Start time                        : 1582653305076852
> Initialization finish time        : 1582653305129792
> Overall finish time               : 1582653305135589
> Total time with initialization    : 58737
> Total time without initialization : 5797
>
> OSv 4 CPUs:
> ./scripts/run.py -p qemu_microvm --qemu-path /home/wkozaczuk/projects/qemu/bin/release/native/x86_64-softmmu/qemu-system-x86_64 --nics 0 --nogdb -m 64M -c 4 --block-device-cache writeback,aio=threads -e '/radix -p 4 -r4096'
> OSv v0.54.0-119-g4ee4b788
> Booted up in 5.26 ms
> Cmdline: /radix -p 4 -r4096
>
> Integer Radix Sort
> 262144 Keys
> 4 Processors
> Radix = 4096
> Max key = 524288
>
> PROCESS STATISTICS
>               Total     Rank      Sort
>  Proc         Time      Time      Time
>     0         6498      2393      4099
>
> TIMING INFORMATION
> Start time                        : 1582653946823458
> Initialization finish time        : 1582653946875522
> Overall finish time               : 1582653946882020
> Total time with initialization    : 58562
> Total time without initialization : 6498
>
> As you can see, with a single CPU the benchmark seems to be 10-15% faster on
> OSv. But with two and four CPUs OSv barely sees any improvement, whereas on
> the host the app runs 40% faster. So OSv does not seem to scale at all
> (somebody mentioned it used to), and it would be nice to understand why. OSv
> has many sophisticated tracing tools that can help here:
> https://github.com/cloudius-systems/osv/wiki/Trace-analysis-using-trace.py
>
> Waldek
>
> BTW1. I tried to bump the size of the matrix to something higher, but with
> -r8192 the app crashes on both Linux and OSv.
> BTW2. It would be interesting to compare OSv with a Linux guest (vs the host).
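To put the scaling quoted above into numbers, here is a quick back-of-the-envelope sketch over the "Total time without initialization" figures (in microseconds) from the runs above; the script is just an illustration, not part of the benchmark:

times = {
    'linux host': {1: 7335, 2: 4325, 4: 2599},
    'OSv':        {1: 6060, 2: 5797, 4: 6498},
}

for system, per_cpu in times.items():
    base = per_cpu[1]  # the 1-CPU run is the baseline
    for cpus in sorted(per_cpu):
        t = per_cpu[cpus]
        print("%-10s %d cpu(s): %5d us  speedup vs 1 cpu: %.2fx"
              % (system, cpus, t, base / t))

That works out to roughly 1.70x (2 CPUs) and 2.82x (4 CPUs) on the Linux host, versus about 1.05x and 0.93x on OSv.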
>
> On Tuesday, February 25, 2020 at 10:05:08 AM UTC-5, [email protected] wrote:
>>
>> Thanks for the response! I will get this information to you after work
>> with the few modifications you recommended! The application is essentially
>> just testing CPU performance using multiprocessing. Nothing too fancy about
>> it! The code I am using can be found at:
>>
>> https://www.github.com/ProfessorWest/splash2-posix
>>
>> It is inside the kernels folder, at radix.c, and I change the problem size
>> to 16,777,206.
>>
>> If you happen to examine the code, do ignore the lack of cleanliness of the
>> code... we just smashed everything into one file for simplicity on our end
>> (running the same code across all platforms being benchmarked).
>>
>> On Tuesday, February 25, 2020 at 8:52:48 AM UTC-5, Waldek Kozaczuk wrote:
>>>
>>> Hi,
>>>
>>> I am quite late to the party :-) Could you run OSv on a single CPU with
>>> verbose on (add -V to run.py) and send us the output so we can see a
>>> little more of what is happening? To disable networking you need to add
>>> '--nics=0' (for all 50 options run.py supports, run it with '--help'). I
>>> am not familiar with that benchmark, but I wonder if it needs a read-write
>>> FS (ZFS in OSv's case); if not, you can build OSv images with a read-only
>>> FS (./scripts/build fs=rofs). Lastly, you can improve boot time by running
>>> OSv on Firecracker
>>> (https://github.com/cloudius-systems/osv/wiki/Running-OSv-on-Firecracker)
>>> or on QEMU microvm (-p qemu_microvm - requires QEMU >= 4.1); with a
>>> read-only FS OSv should boot within 5 ms on both (with ZFS within 40 ms).
>>> One last thing: writing to the console on OSv can be quite slow, and I
>>> wonder how much this benchmark does that.
>>>
>>> While I definitely agree with my colleague Nadav, who essentially says do
>>> not use OSv if raw performance matters (a database, for example) and Linux
>>> will beat it no matter what, OSv may have advantages in use cases where
>>> pure performance does not matter (it still needs to be reasonable). I
>>> think the best use cases for OSv are serverless or stateless apps
>>> (microservices or WebAssembly) running on a single CPU, where all state
>>> management is delegated to a remote persistent store (most custom-built
>>> business apps are like that) and where high isolation matters.
>>>
>>> Relatedly, I think it might be more useful to think of OSv (and other
>>> unikernels) as highly isolated processes. To that end, we still need to
>>> optimize memory overhead (stacks, for example) and improve virtio-fs
>>> support (in that case you do not need a full image to run a Linux app on
>>> OSv, just the kernel).
>>>
>>> Also, I think the lack of good tooling in the unikernel space affects
>>> their adoption. Compare it with Docker - build, push, pull, run. OSv has
>>> its equivalent - capstan - but at this point we do not really have a
>>> registry where one can pull the latest OSv kernel or push and pull images.
>>> Trying to run an app on OSv is still quite painful for a business app
>>> developer - it probably takes at least 30 minutes or so.
>>>
>>> Lastly, I think one of the main reasons for Docker's adoption was
>>> repeatability (besides its fantastic ease of use), where one can create an
>>> image and expect it to run almost the same way in production. Imagine you
>>> could achieve that with OSv.
>>>
>>> Waldek
>>>
>>> On Tuesday, February 25, 2020 at 7:00:16 AM UTC-5, [email protected] wrote:
>>>>
>>>> Very well explained. Thank you for that. That does make perfect sense as
>>>> well.
