Re: OSv runs on Docker's Hyperkit under 100ms

2018-04-22 Thread Nadav Har'El
On Fri, Apr 20, 2018 at 4:07 AM, Waldek Kozaczuk wrote:

> To make SMP work I had to hack OSv to pass fee00900 when
> enabling the APIC for the first CPU and fee00800 for all other CPUs. It
> looks like (based on the hyperkit source code) the APIC register memory
> area base address passed in when enabling the APIC needs to be the same
> as the one that was read. But why is it different for each CPU? It
> looks like the QEMU/KVM, VMware, and Xen hypervisors OSv runs on do not have
> this requirement. Unfortunately I am not very familiar with the APIC, so if
> anybody can enlighten me I would appreciate it.
>
> I was also wondering if anybody knows the reason behind this logic:
>
> void apic_driver::read_base()
> {
>     static constexpr u64 base_addr_mask = 0xFF000;
>     _apic_base = rdmsr(msr::IA32_APIC_BASE) & base_addr_mask;
> }
>
>
> Why are we masking with 0xFF000? Based on the logs from OSv when
> running on hyperkit this logic effectively overwrites the original APIC base
> address as fee00000:
>
> ### apic_driver:read_base() - read base as  : fee00900
>
> ### apic_driver:read_base() - saved base as : fee00000
>
> ### xapic:enable() - enabling with base as  : fee00900
>
> So in the case of hyperkit, when we pass fee00000 instead of fee00900
> (which is what hyperkit returned in read_base()), it rejects it. However the
> same logic works just fine with other hypervisors.
>

First, to explain the various addresses you saw and what these "800" and
"900" mean:

The last 12 bits of this MSR are *not* part of the address (which is
supposed to be page aligned, i.e., the last 12 bits are zero) but rather
various other flags.
Of particular interest are the bit 0x800, which means "enabled", and the bit
0x100, which means "bootstrap". The latter should be set only on the first
CPU - which explains why you saw 0x900 on the first CPU and 0x800 on all
others.
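
For reference, here is a minimal standalone sketch of how such an MSR value
decomposes (illustrative C++, not OSv code; apart from APIC_BASE_GLOBAL_ENABLE,
which the OSv code above already uses, the constant names are mine):

#include <cstdint>
#include <cstdio>
#include <initializer_list>

// Bit layout of the IA32_APIC_BASE MSR, per the Intel SDM.
constexpr uint64_t APIC_BASE_GLOBAL_ENABLE = 1ull << 11; // 0x800: APIC globally enabled
constexpr uint64_t APIC_BASE_BSP           = 1ull << 8;  // 0x100: this is the bootstrap CPU
constexpr uint64_t APIC_BASE_ADDR_MASK     = ~0xfffull;  // bits 12 and up: page-aligned base

int main()
{
    // The two values hyperkit reports: first CPU, then all the others.
    for (uint64_t msr : {0xfee00900ull, 0xfee00800ull}) {
        std::printf("base=%#llx enabled=%d bsp=%d\n",
                    static_cast<unsigned long long>(msr & APIC_BASE_ADDR_MASK),
                    (msr & APIC_BASE_GLOBAL_ENABLE) != 0,
                    (msr & APIC_BASE_BSP) != 0);
    }
    // Prints: base=0xfee00000 enabled=1 bsp=1
    //         base=0xfee00000 enabled=1 bsp=0
    return 0;
}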

The bug is, as you suspected, probably in

void xapic::enable()
{
    wrmsr(msr::IA32_APIC_BASE, _apic_base | APIC_BASE_GLOBAL_ENABLE);
    software_enable();
}

Since _apic_base was previously stripped of all the bit flags, this code
adds back just one. But apparently hyperkit doesn't like losing the BSP
("bootstrap") flag on the first APIC - and it shouldn't have to: it's a bug
that we removed it. I think this code should be changed to:

    wrmsr(msr::IA32_APIC_BASE, rdmsr(msr::IA32_APIC_BASE) | APIC_BASE_GLOBAL_ENABLE);

And not use "_apic_base" at all.

I think that x2apic::enable() should be changed similarly, to use rdmsr()
instead of _apic_base.
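
To make the suggestion concrete, here is a rough, untested sketch of the
direction I mean (APIC_BASE_X2APIC_ENABLE is only a placeholder name for bit
10 (0x400) of the MSR - I have not checked what the real constant is called;
everything else the real enable() functions do would stay unchanged):

void xapic::enable()
{
    // Preserve whatever flag bits the hypervisor reported (e.g. the BSP bit)
    // and only OR in the enable bit, instead of rebuilding the value from the
    // stripped _apic_base.
    wrmsr(msr::IA32_APIC_BASE, rdmsr(msr::IA32_APIC_BASE) | APIC_BASE_GLOBAL_ENABLE);
    software_enable();
}

// In x2apic::enable(), the analogous write would be:
//
//     wrmsr(msr::IA32_APIC_BASE,
//           rdmsr(msr::IA32_APIC_BASE) | APIC_BASE_GLOBAL_ENABLE | APIC_BASE_X2APIC_ENABLE);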

If you could check these fixes and, if they work, send a patch, that would
be great.
Thanks.



>
> Waldek
>
> PS. Adding an article link I found about APIC -
> https://wiki.osdev.org/APIC
>
> On Tuesday, April 17, 2018 at 12:08:42 PM UTC-4, Waldek Kozaczuk wrote:
>>
>> I forgot to add that to achieve this kind of timing I built image with
>> ROFS:
>>
>> ./scripts/build image=native-example fs=rofs
>>
>> On Monday, April 16, 2018 at 5:25:37 PM UTC-4, Waldek Kozaczuk wrote:
>>>
>>> I have never tried brew to install it but possibly it can work.
>>>
>>> I cloned it directly from https://github.com/moby/hyperkit and then
>>> built locally. That way I could put all kinds of debug statements to figure
>>> out why OSv was not working. Feel free to use my fork of hyperkit -
>>> https://github.com/wkozaczuk/hyperkit/tree/osv - which has a ton of debug
>>> statements.
>>>
>>> To build hyperkit locally you need to install the developer tools from Apple,
>>> which include gcc, make, git, etc. I believe if you open a terminal and type
>>> 'gcc' it will ask you if you want to install the developer tools. Then git
>>> clone, make, and you have your own hyperkit under the build subdirectory.
>>>
>>> I did not have to modify hyperkit to make it work with OSv. All my
>>> modifications are on OSv multiboot branch - https://github.com/wkozaczuk
>>> /osv/tree/multiboot. Besides tons of debug statements added all over the
>>> place I added multiboot_header.asm and multiboot.S with some hard-coded
>>> values to pass correct memory info to OSv. I also modified Makefile,
>>> lzloader.ld and disabled assert in hpet.cc (
>>> https://github.com/wkozaczuk/osv/blob/multiboot/drivers/hpet.cc#L55) -
>>> hyperkit does not seem to support 64-bit counters. Finally I hacked
>>> arch/x64/apic.cc to properly read and then pass APIC memory base offset
>>> when enabling APIC - otherwise interrupts would not work. I do not
>>> understand why the original APIC logic in OSv did not work.
>>>
>>> To run it I have a script like this:
>>>
>>> IMAGE=$1
>>> DISK=$2
>>>
>>> build/hyperkit -A -m 512M -s 0:0,hostbridge \
>>>   -s 31,lpc \
>>>   -l com1,stdio \
>>>   -s 4,virtio-blk,$DISK \
>>>   -f multiboot,$IMAGE
>>>
>>> where IMAGE is lzloader.elf and DISK is build/release/usr.img converted
>>> to raw.
>>>
>>> Enjoy!
>>>
>>> Waldek
>>>
>>> PS. I also had to hard code cmdline in loader.cc. I think it should 

Re: OSv runs on Docker's Hyperkit under 100ms

2018-04-19 Thread Waldek Kozaczuk
I have run more tests; here are some more elaborate ones I have run
successfully:

- native example with ZFS

  OSv v0.24-516-gc872202
  console_multiplexer::console_multiplexer()
  acpi::early_init()
  interrupt_descriptor_table initialized
  ### apic_driver: Read base fee00900
  ### apic_driver: _apic_base fee00000
  Hello from C code

  real 0m0.165s
  user 0m0.027s
  sys  0m0.141s
- java with ROFS

  OSv v0.24-519-g94a7640
  console_multiplexer::console_multiplexer()
  acpi::early_init()
  interrupt_descriptor_table initialized
  ### apic_driver:read_base() - read base as  : fee00900
  ### apic_driver:read_base() - saved base as : fee00000
  ### apic_driver:enable() - enabling with base as : fee00900
  1 CPUs detected
  Firmware vendor: BHYVE
  bsd: initializing - done
  VFS: mounting ramfs at /
  VFS: mounting devfs at /dev
  net: initializing - done
  ---> blk::blk - enabled MSI 1
  device_register(): registering device vblk0
  device_register(): registering device vblk0.1
  virtio-blk: Add blk device instances 0 as vblk0, devsize=40128000
  device_register(): registering device console
  device_register(): registering device null
  random: intel drng, rdrand registered as a source.
  device_register(): registering device random
  device_register(): registering device urandom
  random:  initialized
  VFS: unmounting /dev
  VFS: mounting rofs at /rofs
  [rofs] device vblk0.1 opened!
  [rofs] read superblock!
  [rofs] read structure blocks!
  VFS: mounting devfs at /dev
  VFS: mounting procfs at /proc
  VFS: mounting ramfs at /tmp
  java.so: Starting JVM app using: io/osv/nonisolated/RunNonIsolatedJvmApp
  java.so: Setting Java system classloader to NonIsolatingOsvSystemClassLoader
  random: device unblocked.
  Hello, World!
  VFS: unmounting /dev
  VFS: unmounting /proc
  VFS: unmounting /
  ROFS: spent 42.90 ms reading from disk
  ROFS: read 35323 512-byte blocks from disk
  ROFS: allocated 35568 512-byte blocks of cache memory
  ROFS: hit ratio is 83.34%
  Powering off.

  real 0m0.338s
  user 0m0.035s
  sys  0m0.298s
- httpserver to validate networking

  OSv v0.24-520-gf577249
  console_multiplexer::console_multiplexer()
  acpi::early_init()
  smp_init()
  interrupt_descriptor_table initialized
  ### Before create_apic_driver
  ### apic_driver:read_base() - read base as  : fee00900
  ### apic_driver:read_base() - saved base as : fee00000
  ### xapic:enable() - enabling with base as  : fee00900
  1 CPUs detected
  Firmware vendor: BHYVE
  smp_launch() -> DONE
  bsd: initializing - done
  VFS: mounting ramfs at /
  VFS: mounting devfs at /dev
  net: initializing - done
  eth0: ethernet address: d2:e2:e3:b0:2e:38
  ---> blk::blk - enabled MSI 1
  device_register(): registering device vblk0
  device_register(): registering device vblk0.1
  virtio-blk: Add blk device instances 0 as vblk0, devsize=17775616
  device_register(): registering device console
  device_register(): registering device null
  random: intel drng, rdrand registered as a source.
  device_register(): registering device random
  device_register(): registering device urandom
  random:  initialized
  VFS: unmounting /dev
  VFS: mounting rofs at /rofs
  [rofs] device vblk0.1 opened!
  [rofs] read superblock!
  [rofs] read structure blocks!
  VFS: mounting devfs at /dev
  VFS: mounting procfs at /proc
  VFS: mounting ramfs at /tmp
  [I/27 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1012600569]
  [I/27 dhcp]: Waiting for IP...
  random: device unblocked.
  [I/27 dhcp]: Broadcasting DHCPDISCOVER message with xid: [2082401196]
  [I/27 dhcp]: Waiting for IP...
  [I/33 dhcp]: Received DHCPOFFER message from DHCP server: 192.168.64.1 regarding offerred IP address: 192.168.64.4
  [I/33 dhcp]: Broadcasting DHCPREQUEST message 

Re: OSv runs on Docker's Hyperkit under 100ms

2018-04-17 Thread Waldek Kozaczuk
I forgot to add that to achieve this kind of timing I built the image with ROFS:

./scripts/build image=native-example fs=rofs

On Monday, April 16, 2018 at 5:25:37 PM UTC-4, Waldek Kozaczuk wrote:
>
> I have never tried brew to install it but possibly it can work.
>
> I cloned it directly from https://github.com/moby/hyperkit and then built 
> locally. That way I could put all kinds of debug statements to figure out 
> why OSv was not working. Feel free to use my fork of hyperkit - 
> https://github.com/wkozaczuk/hyperkit/tree/osv - which has a ton of debug 
> statements.
>
> To build hyperkit locally you need to install the developer tools from Apple, 
> which include gcc, make, git, etc. I believe if you open a terminal and type 
> 'gcc' it will ask you if you want to install the developer tools. Then git 
> clone, make, and you have your own hyperkit under the build subdirectory.
>
> I did not have to modify hyperkit to make it work with OSv. All my 
> modifications are on OSv multiboot branch - 
> https://github.com/wkozaczuk/osv/tree/multiboot. Besides tons of debug 
> statements added all over the place I added multiboot_header.asm 
> and multiboot.S with some hard-coded values to pass correct memory info to 
> OSv. I also modified Makefile, lzloader.ld and disabled assert in hpet.cc (
> https://github.com/wkozaczuk/osv/blob/multiboot/drivers/hpet.cc#L55) - 
> hyperkit does not seem to support 64-bit counters. Finally I hacked 
> arch/x64/apic.cc to properly read and then pass APIC memory base offset 
> when enabling APIC - otherwise interrupts would not work. I do not 
> understand why the original APIC logic in OSv did not work. 
>
> To run it I have a script like this:
>
> IMAGE=$1
> DISK=$2
>
> build/hyperkit -A -m 512M -s 0:0,hostbridge \
>   -s 31,lpc \
>   -l com1,stdio \
>   -s 4,virtio-blk,$DISK \
>   -f multiboot,$IMAGE
>
> where IMAGE is lzloader.elf and DISK is build/release/usr.img converted 
> to raw.
>
> Enjoy!
>
> Waldek
>
> PS. I also had to hard code cmdline in loader.cc. I think it should come 
> from multiboot.
>
> On Sunday, April 15, 2018 at 8:36:54 PM UTC-4, Asias He wrote:
>>
>>
>>
>> On Wed, Apr 11, 2018 at 3:29 AM, Waldek Kozaczuk  
>> wrote:
>>
>>> Last week I have been trying to hack OSv to run on hyperkit and finally 
>>> I managed to execute native hello world example with ROFS. 
>>>
>>> Here is a timing on hyperkit/OSX (the bootchart does not work on 
>>> hyperkit due to not granular enough timer):
>>>
>>> OSv v0.24-516-gc872202
>>> Hello from C code
>>>
>>> *real 0m0.075s *
>>> *user 0m0.012s *
>>> *sys 0m0.058s*
>>>
>>> command to boot it (please note that I hacked the lzloader ELF to 
>>> support multiboot):
>>>
>>> hyperkit -A -m 512M \
>>>   -s 0:0,hostbridge \
>>>   -s 31,lpc \
>>>   -l com1,stdio \
>>>   -s 4,virtio-blk,test.img \
>>>   -f multiboot,lzloader.elf
>>>
>>
>> Impressive! How hard is it to setup hyperkit on osx, just brew install?
>>
>>  
>>
>>>
>>> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is 
>>> setup to triple-boot Ubuntu 16/Mac OSX and Windows):
>>>
>>> OSv v0.24-510-g451dc6d
>>> 4 CPUs detected
>>> Firmware vendor: SeaBIOS
>>> bsd: initializing - done
>>> VFS: mounting ramfs at /
>>> VFS: mounting devfs at /dev
>>> net: initializing - done
>>> vga: Add VGA device instance
>>> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
>>> random: intel drng, rdrand registered as a source.
>>> random:  initialized
>>> VFS: unmounting /dev
>>> VFS: mounting rofs at /rofs
>>> VFS: mounting devfs at /dev
>>> VFS: mounting procfs at /proc
>>> VFS: mounting ramfs at /tmp
>>> disk read (real mode): 28.31ms, (+28.31ms)
>>> uncompress lzloader.elf: 49.63ms, (+21.32ms)
>>> TLS initialization: 50.23ms, (+0.59ms)
>>> .init functions: 52.22ms, (+1.99ms)
>>> SMP launched: 53.01ms, (+0.79ms)
>>> VFS initialized: 55.25ms, (+2.24ms)
>>> Network initialized: 55.54ms, (+0.29ms)
>>> pvpanic done: 55.66ms, (+0.12ms)
>>> pci enumerated: 60.40ms, (+4.74ms)
>>> drivers probe: 60.40ms, (+0.00ms)
>>> drivers loaded: 126.37ms, (+65.97ms)
>>> ROFS mounted: 128.65ms, (+2.28ms)
>>> Total time: 128.65ms, (+0.00ms)
>>> Hello from C code
>>> VFS: unmounting /dev
>>> VFS: unmounting /proc
>>> VFS: unmounting /
>>> ROFS: spent 1.00 ms reading from disk
>>> ROFS: read 21 512-byte blocks from disk
>>> ROFS: allocated 18 512-byte blocks of cache memory
>>> ROFS: hit ratio is 89.47%
>>> Powering off.
>>>
>>> *real 0m1.049s*
>>> *user 0m0.173s*
>>> *sys 0m0.253s*
>>>
>>> booted like so:
>>>
>>> qemu-system-x86_64 -m 2G -smp 4 \
>>>
>>>  -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
>>>
>>>  -drive 
>>> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native
>>>  \
>>>
>>>  -enable-kvm -cpu host,+x2apic \
>>>
>>>  -chardev stdio,mux=on,id=stdio,signal=off \
>>>
>>>  -mon chardev=stdio,mode=readline 
>>>
>>>  -device isa-serial,chardev=stdio
>>>
>>>
>>> In both cases I am not using networking - only block device. BTW I have 
>>

Re: OSv runs on Docker's Hyperkit under 100ms

2018-04-16 Thread Waldek Kozaczuk
I have never tried brew to install it but possibly it can work.

I cloned it directly from https://github.com/moby/hyperkit and then built 
locally. That way I could put all kinds of debug statements to figure out 
why OSv was not working. Feel free to use my fork of hyperkit 
- https://github.com/wkozaczuk/hyperkit/tree/osv - which has a ton of debug 
statements.

To build hyperkit locally you need to install the developer tools from Apple, 
which include gcc, make, git, etc. I believe if you open a terminal and type 
'gcc' it will ask you if you want to install the developer tools. Then git 
clone, make, and you have your own hyperkit under the build subdirectory.

I did not have to modify hyperkit to make it work with OSv. All my 
modifications are on OSv multiboot branch 
- https://github.com/wkozaczuk/osv/tree/multiboot. Besides tons of debug 
statements added all over the place I added multiboot_header.asm 
and multiboot.S with some hard-coded values to pass correct memory info to 
OSv. I also modified Makefile, lzloader.ld and disabled assert in hpet.cc 
(https://github.com/wkozaczuk/osv/blob/multiboot/drivers/hpet.cc#L55) - 
hyperkit does not seem to support 64-bit counters. Finally I hacked 
arch/x64/apic.cc to properly read and then pass APIC memory base offset 
when enabling APIC - otherwise interrupts would not work. I do not 
understand why the original APIC logic in OSv did not work. 

To run it I have a script like this:

IMAGE=$1
DISK=$2

build/hyperkit -A -m 512M -s 0:0,hostbridge \
  -s 31,lpc \
  -l com1,stdio \
  -s 4,virtio-blk,$DISK \
  -f multiboot,$IMAGE

where IMAGE is lzloader.elf and DISK is build/release/usr.img converted to 
raw.

Enjoy!

Waldek

PS. I also had to hard code cmdline in loader.cc. I think it should come 
from multiboot.

On Sunday, April 15, 2018 at 8:36:54 PM UTC-4, Asias He wrote:
>
>
>
> On Wed, Apr 11, 2018 at 3:29 AM, Waldek Kozaczuk  > wrote:
>
>> Last week I have been trying to hack OSv to run on hyperkit and finally I 
>> managed to execute native hello world example with ROFS. 
>>
>> Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit 
>> due to not granular enough timer):
>>
>> OSv v0.24-516-gc872202
>> Hello from C code
>>
>> *real 0m0.075s *
>> *user 0m0.012s *
>> *sys 0m0.058s*
>>
>> command to boot it (please note that I hacked the lzloader ELF to support 
>> multiboot):
>>
>> hyperkit -A -m 512M \
>>   -s 0:0,hostbridge \
>>   -s 31,lpc \
>>   -l com1,stdio \
>>   -s 4,virtio-blk,test.img \
>>   -f multiboot,lzloader.elf
>>
>
> Impressive! How hard is it to setup hyperkit on osx, just brew install?
>
>  
>
>>
>> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup 
>> to triple-boot Ubuntu 16/Mac OSX and Windows):
>>
>> OSv v0.24-510-g451dc6d
>> 4 CPUs detected
>> Firmware vendor: SeaBIOS
>> bsd: initializing - done
>> VFS: mounting ramfs at /
>> VFS: mounting devfs at /dev
>> net: initializing - done
>> vga: Add VGA device instance
>> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
>> random: intel drng, rdrand registered as a source.
>> random:  initialized
>> VFS: unmounting /dev
>> VFS: mounting rofs at /rofs
>> VFS: mounting devfs at /dev
>> VFS: mounting procfs at /proc
>> VFS: mounting ramfs at /tmp
>> disk read (real mode): 28.31ms, (+28.31ms)
>> uncompress lzloader.elf: 49.63ms, (+21.32ms)
>> TLS initialization: 50.23ms, (+0.59ms)
>> .init functions: 52.22ms, (+1.99ms)
>> SMP launched: 53.01ms, (+0.79ms)
>> VFS initialized: 55.25ms, (+2.24ms)
>> Network initialized: 55.54ms, (+0.29ms)
>> pvpanic done: 55.66ms, (+0.12ms)
>> pci enumerated: 60.40ms, (+4.74ms)
>> drivers probe: 60.40ms, (+0.00ms)
>> drivers loaded: 126.37ms, (+65.97ms)
>> ROFS mounted: 128.65ms, (+2.28ms)
>> Total time: 128.65ms, (+0.00ms)
>> Hello from C code
>> VFS: unmounting /dev
>> VFS: unmounting /proc
>> VFS: unmounting /
>> ROFS: spent 1.00 ms reading from disk
>> ROFS: read 21 512-byte blocks from disk
>> ROFS: allocated 18 512-byte blocks of cache memory
>> ROFS: hit ratio is 89.47%
>> Powering off.
>>
>> *real 0m1.049s*
>> *user 0m0.173s*
>> *sys 0m0.253s*
>>
>> booted like so:
>>
>> qemu-system-x86_64 -m 2G -smp 4 \
>>
>>  -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
>>
>>  -drive 
>> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native
>>  \
>>
>>  -enable-kvm -cpu host,+x2apic \
>>
>>  -chardev stdio,mux=on,id=stdio,signal=off \
>>
>>  -mon chardev=stdio,mode=readline 
>>
>>  -device isa-serial,chardev=stdio
>>
>>
>> In both cases I am not using networking - only block device. BTW I have 
>> not tested how networking nor SMP on hyperkit with OSv. 
>>
>> So as you can see* OSv is 10 (ten) times faster* on the same hardware. I 
>> am not sure if my results are representative. But if they are it would mean 
>> that QEMU is probably the culprit. Please see my questions/consideration 
>> toward the end of the email.
>>
>> Anyway let me give you some background. What is 

Re: OSv runs on Docker's Hyperkit under 100ms

2018-04-15 Thread Asias He
On Wed, Apr 11, 2018 at 3:29 AM, Waldek Kozaczuk wrote:

> Last week I have been trying to hack OSv to run on hyperkit and finally I
> managed to execute native hello world example with ROFS.
>
> Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit
> due to not granular enough timer):
>
> OSv v0.24-516-gc872202
> Hello from C code
>
> *real 0m0.075s *
> *user 0m0.012s *
> *sys 0m0.058s*
>
> command to boot it (please note that I hacked the lzloader ELF to support
> multiboot):
>
> hyperkit -A -m 512M \
>   -s 0:0,hostbridge \
>   -s 31,lpc \
>   -l com1,stdio \
>   -s 4,virtio-blk,test.img \
>   -f multiboot,lzloader.elf
>

Impressive! How hard is it to set up hyperkit on OSX, just brew install?



>
> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup
> to triple-boot Ubuntu 16/Mac OSX and Windows):
>
> OSv v0.24-510-g451dc6d
> 4 CPUs detected
> Firmware vendor: SeaBIOS
> bsd: initializing - done
> VFS: mounting ramfs at /
> VFS: mounting devfs at /dev
> net: initializing - done
> vga: Add VGA device instance
> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
> random: intel drng, rdrand registered as a source.
> random:  initialized
> VFS: unmounting /dev
> VFS: mounting rofs at /rofs
> VFS: mounting devfs at /dev
> VFS: mounting procfs at /proc
> VFS: mounting ramfs at /tmp
> disk read (real mode): 28.31ms, (+28.31ms)
> uncompress lzloader.elf: 49.63ms, (+21.32ms)
> TLS initialization: 50.23ms, (+0.59ms)
> .init functions: 52.22ms, (+1.99ms)
> SMP launched: 53.01ms, (+0.79ms)
> VFS initialized: 55.25ms, (+2.24ms)
> Network initialized: 55.54ms, (+0.29ms)
> pvpanic done: 55.66ms, (+0.12ms)
> pci enumerated: 60.40ms, (+4.74ms)
> drivers probe: 60.40ms, (+0.00ms)
> drivers loaded: 126.37ms, (+65.97ms)
> ROFS mounted: 128.65ms, (+2.28ms)
> Total time: 128.65ms, (+0.00ms)
> Hello from C code
> VFS: unmounting /dev
> VFS: unmounting /proc
> VFS: unmounting /
> ROFS: spent 1.00 ms reading from disk
> ROFS: read 21 512-byte blocks from disk
> ROFS: allocated 18 512-byte blocks of cache memory
> ROFS: hit ratio is 89.47%
> Powering off.
>
> *real 0m1.049s*
> *user 0m0.173s*
> *sys 0m0.253s*
>
> booted like so:
>
> qemu-system-x86_64 -m 2G -smp 4 \
>
>  -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
>
>  -drive 
> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native
>  \
>
>  -enable-kvm -cpu host,+x2apic \
>
>  -chardev stdio,mux=on,id=stdio,signal=off \
>
>  -mon chardev=stdio,mode=readline
>
>  -device isa-serial,chardev=stdio
>
>
> In both cases I am not using networking - only block device. BTW I have
> not tested how networking nor SMP on hyperkit with OSv.
>
> So as you can see* OSv is 10 (ten) times faster* on the same hardware. I
> am not sure if my results are representative. But if they are it would mean
> that QEMU is probably the culprit. Please see my questions/consideration
> toward the end of the email.
>
> Anyway let me give you some background. What is hyperkit? Hyperkit (
> https://github.com/moby/hyperkit) is a fork by Docker of xhyve (
> https://github.com/mist64/xhyve) which itself is a port of bhyve (
> https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html) -
> hypervisor on FreeBSD. Bhyve architecture is similar to that of KVM/QEMU
> but QEMU-equivalent of bhyve is much lighter and simpler:
>
> "The bhyve BSD-licensed hypervisor became part of the base system with
> FreeBSD 10.0-RELEASE. This hypervisor supports a number of guests,
> including FreeBSD, OpenBSD, and many Linux® distributions. By default,
> bhyve provides access to serial console and does not emulate a graphical
> console. Virtualization offload features of newer CPUs are used to avoid
> the legacy methods of translating instructions and manually managing memory
> mappings.
>
> The bhyve design requires a processor that supports Intel® Extended Page
> Tables (EPT) or AMD® Rapid Virtualization Indexing (RVI) or Nested Page
> Tables (NPT). Hosting Linux® guests or FreeBSD guests with more than one
> vCPU requires VMX unrestricted mode support (UG). Most newer processors,
> specifically the Intel® Core™ i3/i5/i7 and Intel® Xeon™ E3/E5/E7, support
> these features. UG support was introduced with Intel's Westmere
> micro-architecture. For a complete list of Intel® processors that support
> EPT, refer to http://ark.intel.com/search/advanced?s=t&;
> ExtendedPageTables=true. RVI is found on the third generation and later
> of the AMD Opteron™ (Barcelona) processors"
>
> Hyperkit/Xhyve is a port of bhyve but targets Apple OSX as a host system
> and instead of FreeBSD vmm kernel module uses Apple hypervisor framework (
> https://developer.apple.com/documentation/hypervisor). Docker, I think,
> forked xhyve to create hyperkit in order to provide lighter alternative of
> running Docker containers on Linux on Mac. So in essence hyperkit is a
> component of Docker for Mac vs Docker Machine/Toolbox (based on
> Vir

Re: OSv runs on Docker's Hyperkit under 100ms

2018-04-15 Thread Waldek Kozaczuk
Please see my responses inlined below.

On Sunday, April 15, 2018 at 11:30:54 AM UTC-4, Nadav Har'El wrote:
>
>
> On Tue, Apr 10, 2018 at 10:29 PM, Waldek Kozaczuk  > wrote:
>
>> Last week I have been trying to hack OSv to run on hyperkit and finally I 
>> managed to execute native hello world example with ROFS. 
>>
>
> Excellent :-)
>  
>
>>
>> Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit 
>> due to not granular enough timer):
>>
>> OSv v0.24-516-gc872202
>> Hello from C code
>>
>> *real 0m0.075s*
>>
>
> Impressive :-)
>  
>
>> *user 0m0.012s *
>> *sys 0m0.058s*
>>
>> command to boot it (please note that I hacked the lzloader ELF to support 
>> multiboot):
>>
>
> What kind of hack is this?
>

The hacks are somewhat described in the last post of
https://github.com/cloudius-systems/osv/issues/948. The biggest hack,
which I have not posted details of yet, was in the logic enabling the xAPIC.
In essence there is some peculiarity in how the APIC is set up (please see
this code with comments on my multiboot branch -
https://github.com/wkozaczuk/osv/blob/multiboot/arch/x64/apic.cc#L90-L124).
OSv reads the memory base address in apic_driver::read_base() and then ought
to pass it back in xapic::enable(). Whatever hyperkit gets is NOT what it
advertised in read_base(), and it crashes. If I hardcode the base address in
xapic::enable() without masking it and pass it as-is to enable(), it all
works - meaning hyperkit does not abort and OSv receives interrupts.
The same change breaks OSv on QEMU though.
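
Concretely, the workaround boils down to something like this (a reconstruction
of the idea, not the exact diff on my multiboot branch):

void apic_driver::read_base()
{
    // Keep the raw MSR value, flag bits included - on hyperkit this is 0xfee00900.
    _apic_base = rdmsr(msr::IA32_APIC_BASE);
}

void xapic::enable()
{
    // Write back exactly the value that was read, so hyperkit sees the same
    // base (plus flags) that it advertised, with only the enable bit OR-ed in.
    wrmsr(msr::IA32_APIC_BASE, _apic_base | APIC_BASE_GLOBAL_ENABLE);
    software_enable();
}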
 

> I notice in arch/x64/boot32.S we do try to support the multiboot format. 
> Maybe the compression destroyed that?
> If so, maybe this should be considered a bug? I never tried to do anything 
> with multiboot myself. Avi added this
> code to boot32.s in the very first days of OSv, in December 2012! (commit 
> bf2c6bae2)
>
>
>> hyperkit -A -m 512M \
>>   -s 0:0,hostbridge \
>>   -s 31,lpc \
>>   -l com1,stdio \
>>   -s 4,virtio-blk,test.img \
>>   -f multiboot,lzloader.elf
>>
>> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup 
>> to triple-boot Ubuntu 16/Mac OSX and Windows):
>>
>> OSv v0.24-510-g451dc6d
>> 4 CPUs detected
>> Firmware vendor: SeaBIOS
>> bsd: initializing - done
>> VFS: mounting ramfs at /
>> VFS: mounting devfs at /dev
>> net: initializing - done
>> vga: Add VGA device instance
>> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
>> random: intel drng, rdrand registered as a source.
>> random:  initialized
>> VFS: unmounting /dev
>> VFS: mounting rofs at /rofs
>> VFS: mounting devfs at /dev
>> VFS: mounting procfs at /proc
>> VFS: mounting ramfs at /tmp
>> disk read (real mode): 28.31ms, (+28.31ms)
>> uncompress lzloader.elf: 49.63ms, (+21.32ms)
>> TLS initialization: 50.23ms, (+0.59ms)
>> .init functions: 52.22ms, (+1.99ms)
>> SMP launched: 53.01ms, (+0.79ms)
>> VFS initialized: 55.25ms, (+2.24ms)
>> Network initialized: 55.54ms, (+0.29ms)
>> pvpanic done: 55.66ms, (+0.12ms)
>> pci enumerated: 60.40ms, (+4.74ms)
>> drivers probe: 60.40ms, (+0.00ms)
>> drivers loaded: 126.37ms, (+65.97ms)
>>
>
> This one is a whopper. I wonder if it's some sort of qemu limitation 
> making driver initialization so slow, or we're doing something slow in OSv.
>  
>
>> ROFS mounted: 128.65ms, (+2.28ms)
>> Total time: 128.65ms, (+0.00ms)
>> Hello from C code
>> VFS: unmounting /dev
>> VFS: unmounting /proc
>> VFS: unmounting /
>> ROFS: spent 1.00 ms reading from disk
>> ROFS: read 21 512-byte blocks from disk
>> ROFS: allocated 18 512-byte blocks of cache memory
>> ROFS: hit ratio is 89.47%
>> Powering off.
>>
>> *real 0m1.049s*
>>
>
> So according to this, OSv took 128ms to boot, and there is about 900ms 
> more overhead of some sort coming from qemu?
>

In my other recent email about kvmtool I mentioned that I tried qemu-lite 
(which I think is a subset of QEMU created by deactivating a lot of stuff) and 
could see QEMU+KVM+OSv take only 250ms. That is still slower than hyperkit but 
much better than regular QEMU. It is also not surprising, given that 
hyperkit is only 25K lines of code.

Another advantage is multiboot, where the host mmaps lzloader.elf and OSv 
does not need to read its kernel in real mode; in fact the whole real-mode 
logic is bypassed.
 

>  
>
>> *user 0m0.173s*
>> *sys 0m0.253s*
>>
>> booted like so:
>>
>> qemu-system-x86_64 -m 2G -smp 4 \
>>
>>  -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
>>
>>  -drive 
>> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native
>>  \
>>
>>  -enable-kvm -cpu host,+x2apic \
>>
>>  -chardev stdio,mux=on,id=stdio,signal=off \
>>
>>  -mon chardev=stdio,mode=readline 
>>
>>  -device isa-serial,chardev=stdio
>>
>>
>> In both cases I am not using networking - only block device. BTW I have 
>> not tested how networking nor SMP on hyperkit with OSv. 
>>
>> So as you can see* OSv is 10 (ten) times faster* on the same hardware. I 
>> am not sure if my re

Re: OSv runs on Docker's Hyperkit under 100ms

2018-04-15 Thread Nadav Har'El
On Tue, Apr 10, 2018 at 10:29 PM, Waldek Kozaczuk wrote:

> Last week I have been trying to hack OSv to run on hyperkit and finally I
> managed to execute native hello world example with ROFS.
>

Excellent :-)


>
> Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit
> due to not granular enough timer):
>
> OSv v0.24-516-gc872202
> Hello from C code
>
> *real 0m0.075s*
>

Impressive :-)


> *user 0m0.012s *
> *sys 0m0.058s*
>
> command to boot it (please note that I hacked the lzloader ELF to support
> multiboot):
>

What kind of hack is this?
I notice in arch/x64/boot32.S we do try to support the multiboot format.
Maybe the compression destroyed that?
If so, maybe this should be considered a bug? I never tried to do anything
with multiboot myself. Avi added this
code to boot32.s in the very first days of OSv, in December 2012! (commit
bf2c6bae2)


> hyperkit -A -m 512M \
>   -s 0:0,hostbridge \
>   -s 31,lpc \
>   -l com1,stdio \
>   -s 4,virtio-blk,test.img \
>   -f multiboot,lzloader.elf
>
> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup
> to triple-boot Ubuntu 16/Mac OSX and Windows):
>
> OSv v0.24-510-g451dc6d
> 4 CPUs detected
> Firmware vendor: SeaBIOS
> bsd: initializing - done
> VFS: mounting ramfs at /
> VFS: mounting devfs at /dev
> net: initializing - done
> vga: Add VGA device instance
> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
> random: intel drng, rdrand registered as a source.
> random:  initialized
> VFS: unmounting /dev
> VFS: mounting rofs at /rofs
> VFS: mounting devfs at /dev
> VFS: mounting procfs at /proc
> VFS: mounting ramfs at /tmp
> disk read (real mode): 28.31ms, (+28.31ms)
> uncompress lzloader.elf: 49.63ms, (+21.32ms)
> TLS initialization: 50.23ms, (+0.59ms)
> .init functions: 52.22ms, (+1.99ms)
> SMP launched: 53.01ms, (+0.79ms)
> VFS initialized: 55.25ms, (+2.24ms)
> Network initialized: 55.54ms, (+0.29ms)
> pvpanic done: 55.66ms, (+0.12ms)
> pci enumerated: 60.40ms, (+4.74ms)
> drivers probe: 60.40ms, (+0.00ms)
> drivers loaded: 126.37ms, (+65.97ms)
>

This one is a whopper. I wonder if it's some sort of qemu limitation making
driver initialization so slow, or we're doing something slow in OSv.


> ROFS mounted: 128.65ms, (+2.28ms)
> Total time: 128.65ms, (+0.00ms)
> Hello from C code
> VFS: unmounting /dev
> VFS: unmounting /proc
> VFS: unmounting /
> ROFS: spent 1.00 ms reading from disk
> ROFS: read 21 512-byte blocks from disk
> ROFS: allocated 18 512-byte blocks of cache memory
> ROFS: hit ratio is 89.47%
> Powering off.
>
> *real 0m1.049s*
>

So according to this, OSv took 128ms to boot, and there is about 900ms more
overhead of some sort coming from qemu?


> *user 0m0.173s*
> *sys 0m0.253s*
>
> booted like so:
>
> qemu-system-x86_64 -m 2G -smp 4 \
>
>  -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
>
>  -drive 
> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native
>  \
>
>  -enable-kvm -cpu host,+x2apic \
>
>  -chardev stdio,mux=on,id=stdio,signal=off \
>
>  -mon chardev=stdio,mode=readline
>
>  -device isa-serial,chardev=stdio
>
>
> In both cases I am not using networking - only block device. BTW I have
> not tested how networking nor SMP on hyperkit with OSv.
>
> So as you can see* OSv is 10 (ten) times faster* on the same hardware. I
> am not sure if my results are representative. But if they are it would mean
> that QEMU is probably the culprit. Please see my questions/consideration
> toward the end of the email.
>
> Anyway let me give you some background. What is hyperkit? Hyperkit (
> https://github.com/moby/hyperkit) is a fork by Docker of xhyve (
> https://github.com/mist64/xhyve) which itself is a port of bhyve (
> https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html) -
> hypervisor on FreeBSD. Bhyve architecture is similar to that of KVM/QEMU
> but QEMU-equivalent of bhyve is much lighter and simpler:
>
> "The bhyve BSD-licensed hypervisor became part of the base system with
> FreeBSD 10.0-RELEASE. This hypervisor supports a number of guests,
> including FreeBSD, OpenBSD, and many Linux® distributions. By default,
> bhyve provides access to serial console and does not emulate a graphical
> console. Virtualization offload features of newer CPUs are used to avoid
> the legacy methods of translating instructions and manually managing memory
> mappings.
>
> The bhyve design requires a processor that supports Intel® Extended Page
> Tables (EPT) or AMD® Rapid Virtualization Indexing (RVI) or Nested Page
> Tables (NPT). Hosting Linux® guests or FreeBSD guests with more than one
> vCPU requires VMX unrestricted mode support (UG). Most newer processors,
> specifically the Intel® Core™ i3/i5/i7 and Intel® Xeon™ E3/E5/E7, support
> these features. UG support was introduced with Intel's Westmere
> micro-architecture. For a complete list of Intel® processors that support
> EPT, refer to http://ark.intel.com/search/advanced?s=t&

OSv runs on Docker's Hyperkit under 100ms

2018-04-10 Thread Waldek Kozaczuk
Last week I have been trying to hack OSv to run on hyperkit, and finally I 
managed to execute the native hello world example with ROFS. 

Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit 
because the timer is not granular enough):

OSv v0.24-516-gc872202
Hello from C code

*real 0m0.075s *
*user 0m0.012s *
*sys 0m0.058s*

command to boot it (please note that I hacked the lzloader ELF to support 
multiboot):

hyperkit -A -m 512M \
  -s 0:0,hostbridge \
  -s 31,lpc \
  -l com1,stdio \
  -s 4,virtio-blk,test.img \
  -f multiboot,lzloader.elf

Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is set up 
to triple-boot Ubuntu 16/Mac OSX and Windows):

OSv v0.24-510-g451dc6d
4 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
random: intel drng, rdrand registered as a source.
random:  initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting ramfs at /tmp
disk read (real mode): 28.31ms, (+28.31ms)
uncompress lzloader.elf: 49.63ms, (+21.32ms)
TLS initialization: 50.23ms, (+0.59ms)
.init functions: 52.22ms, (+1.99ms)
SMP launched: 53.01ms, (+0.79ms)
VFS initialized: 55.25ms, (+2.24ms)
Network initialized: 55.54ms, (+0.29ms)
pvpanic done: 55.66ms, (+0.12ms)
pci enumerated: 60.40ms, (+4.74ms)
drivers probe: 60.40ms, (+0.00ms)
drivers loaded: 126.37ms, (+65.97ms)
ROFS mounted: 128.65ms, (+2.28ms)
Total time: 128.65ms, (+0.00ms)
Hello from C code
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 1.00 ms reading from disk
ROFS: read 21 512-byte blocks from disk
ROFS: allocated 18 512-byte blocks of cache memory
ROFS: hit ratio is 89.47%
Powering off.

*real 0m1.049s*
*user 0m0.173s*
*sys 0m0.253s*

booted like so:

qemu-system-x86_64 -m 2G -smp 4 \
 -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
 -drive file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native \
 -enable-kvm -cpu host,+x2apic \
 -chardev stdio,mux=on,id=stdio,signal=off \
 -mon chardev=stdio,mode=readline \
 -device isa-serial,chardev=stdio


In both cases I am not using networking - only the block device. BTW I have 
not tested networking or SMP on hyperkit with OSv. 

So as you can see *OSv is 10 (ten) times faster* on the same hardware. I am 
not sure if my results are representative. But if they are, it would mean 
that QEMU is probably the culprit. Please see my questions/considerations 
toward the end of the email.

Anyway let me give you some background. What is hyperkit? Hyperkit 
(https://github.com/moby/hyperkit) is a fork by Docker of xhyve 
(https://github.com/mist64/xhyve), which itself is a port of bhyve 
(https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html) - the 
hypervisor on FreeBSD. Bhyve's architecture is similar to that of KVM/QEMU, 
but the QEMU-equivalent part of bhyve is much lighter and simpler:

"The bhyve BSD-licensed hypervisor became part of the base system with 
FreeBSD 10.0-RELEASE. This hypervisor supports a number of guests, 
including FreeBSD, OpenBSD, and many Linux® distributions. By default, bhyve
 provides access to serial console and does not emulate a graphical 
console. Virtualization offload features of newer CPUs are used to avoid 
the legacy methods of translating instructions and manually managing memory 
mappings.

The bhyve design requires a processor that supports Intel® Extended Page 
Tables (EPT) or AMD® Rapid Virtualization Indexing (RVI) or Nested Page 
Tables (NPT). Hosting Linux® guests or FreeBSD guests with more than one 
vCPU requires VMX unrestricted mode support (UG). Most newer processors, 
specifically the Intel® Core™ i3/i5/i7 and Intel® Xeon™ E3/E5/E7, support 
these features. UG support was introduced with Intel's Westmere 
micro-architecture. For a complete list of Intel® processors that support 
EPT, refer to 
http://ark.intel.com/search/advanced?s=t&ExtendedPageTables=true. RVI is 
found on the third generation and later of the AMD Opteron™ (Barcelona) 
processors"

Hyperkit/xhyve is a port of bhyve, but it targets Apple OSX as the host 
system and, instead of the FreeBSD vmm kernel module, uses the Apple 
Hypervisor framework 
(https://developer.apple.com/documentation/hypervisor). Docker, I think, 
forked xhyve to create hyperkit in order to provide a lighter alternative for 
running Docker containers (on Linux) on a Mac. So in essence hyperkit is a 
component of Docker for Mac, as opposed to Docker Machine/Toolbox (based on 
VirtualBox). Please see this page for details 
- https://docs.docker.com/docker-for-mac/docker-toolbox/.

How does it apply to OSv? It only applies if you want to run OSv on a Mac. 
Right now the only choices are QEMU (dog slow because there is no KVM) or 
VirtualBox (pretty fast once OSv is up, but it takes a long time to boot and 
has other configuration quirks).