Re: OSv runs on Docker's Hyperkit under 100ms
On Fri, Apr 20, 2018 at 4:07 AM, Waldek Kozaczuk wrote:
> To make SMP work I had to hack OSv to pass fee00900 when enabling APIC for
> the first CPU and fee00800 for all other CPUs. It looks like (based on the
> source code of hyperkit) it requires that the APIC registers' memory area
> base address passed in when enabling it be the same as when it was read.
> But why is it different for each CPU? It looks like the QEMU/KVM, VMware
> and XEN hypervisors OSv runs on do not have this requirement. Unfortunately
> I am not very familiar with APIC, so if anybody can enlighten me I would
> appreciate it.
>
> I was also wondering if anybody knows the reason behind this logic:
>
> void apic_driver::read_base()
> {
>     static constexpr u64 base_addr_mask = 0xFF000;
>     _apic_base = rdmsr(msr::IA32_APIC_BASE) & base_addr_mask;
> }
>
> Why are we masking with 0xFF000? Based on the logs from OSv when running on
> hyperkit, this logic effectively replaces the original APIC base address
> with fee0:
>
> ### apic_driver:read_base() - read base as : fee00900
> ### apic_driver:read_base() - saved base as : fee0
> ### xapic:enable() - enabling with base as : fee00900
>
> So in the case of hyperkit, when we pass fee0 instead of fee00900 (which is
> what hyperkit returned in read_base()) it rejects it. However, the same
> logic works just fine with other hypervisors.

First, to explain the various addresses you saw and what these "800" and
"900" mean: the last 12 bits of this MSR are *not* part of the address (which
is supposed to be page aligned, i.e., its last 12 bits are zero) but rather
various other flags. Of particular interest are the bit 0x800, which means
"enabled", and the bit 0x100, which means "bootstrap". The latter should be
set only on the first CPU - which explains why you saw 0x900 on the first CPU
and 0x800 on all the others.

The bug is, as you suspected, probably in

void xapic::enable()
{
    wrmsr(msr::IA32_APIC_BASE, _apic_base | APIC_BASE_GLOBAL_ENABLE);
    software_enable();
}

After _apic_base was previously stripped of all the bit flags, this code adds
back just one. But apparently hyperkit doesn't like losing the BSP
("bootstrap") flag on the first APIC, for some reason. And it shouldn't have
lost it - it's a bug that we removed it. I think this code should be changed
to do:

wrmsr(msr::IA32_APIC_BASE, rdmsr(msr::IA32_APIC_BASE) | APIC_BASE_GLOBAL_ENABLE);

and not use "_apic_base" at all. I think that x2apic::enable() should be
changed similarly, to use rdmsr instead of _apic_base (a sketch of both
changes follows below, after the quoted text).

If you could check these fixes and, if they work, send a patch, that would be
great. Thanks.

> Waldek
>
> PS. Adding an article link I found about APIC - https://wiki.osdev.org/APIC
>
> On Tuesday, April 17, 2018 at 12:08:42 PM UTC-4, Waldek Kozaczuk wrote:
>>
>> I forgot to add that to achieve this kind of timing I built the image with
>> ROFS:
>>
>> ./scripts/build image=native-example fs=rofs
>>
>> On Monday, April 16, 2018 at 5:25:37 PM UTC-4, Waldek Kozaczuk wrote:
>>>
>>> I have never tried brew to install it but possibly it can work.
>>>
>>> I cloned it directly from https://github.com/moby/hyperkit and then built
>>> it locally. That way I could put all kinds of debug statements in to
>>> figure out why OSv was not working. Feel free to use my fork of hyperkit -
>>> https://github.com/wkozaczuk/hyperkit/tree/osv - which has a ton of debug
>>> statements.
>>>
>>> To build hyperkit locally you need to install the developer tools from
>>> Apple, which include gcc, make, git, etc. I believe if you open a terminal
>>> and type 'gcc' it will ask you if you want to install the developer tools.
>>> Then git clone, make, and you have your own hyperkit under the build
>>> subdirectory.
>>>
>>> I did not have to modify hyperkit to make it work with OSv. All my
>>> modifications are on the OSv multiboot branch -
>>> https://github.com/wkozaczuk/osv/tree/multiboot. Besides tons of debug
>>> statements added all over the place, I added multiboot_header.asm and
>>> multiboot.S with some hard-coded values to pass correct memory info to
>>> OSv. I also modified Makefile and lzloader.ld, and disabled an assert in
>>> hpet.cc (https://github.com/wkozaczuk/osv/blob/multiboot/drivers/hpet.cc#L55)
>>> - hyperkit does not seem to support 64-bit counters. Finally I hacked
>>> arch/x64/apic.cc to properly read and then pass the APIC memory base
>>> offset when enabling the APIC - otherwise interrupts would not work. I do
>>> not understand why the original apic logic in OSv did not work.
>>>
>>> To run it, use a script like this:
>>>
>>> IMAGE=$1
>>> DISK=$2
>>>
>>> build/hyperkit -A -m 512M -s 0:0,hostbridge \
>>>     -s 31,lpc \
>>>     -l com1,stdio \
>>>     -s 4,virtio-blk,$DISK \
>>>     -f multiboot,$IMAGE
>>>
>>> where IMAGE is lzloader.elf and DISK is build/release/usr.img converted
>>> to raw.
>>>
>>> Enjoy!
>>>
>>> Waldek
>>>
>>> PS. I also had to hard code cmdline in loader.cc. I think it should
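Putting the two suggested changes together, the enable() functions might look
roughly like the sketch below. It reuses the names quoted above
(msr::IA32_APIC_BASE, APIC_BASE_GLOBAL_ENABLE, rdmsr/wrmsr, software_enable());
APIC_BASE_X2APIC_ENABLE is only a placeholder name for the x2APIC-mode bit
(bit 10 of the MSR), the exact body of x2apic::enable() in OSv may differ, and
none of this has been tested on hyperkit:

    void xapic::enable()
    {
        // Re-read IA32_APIC_BASE and only add the global-enable bit, so the
        // BSP flag (0x100) and any other bits the hypervisor reported are
        // written back unchanged instead of being reconstructed from the
        // stripped-down _apic_base.
        wrmsr(msr::IA32_APIC_BASE,
              rdmsr(msr::IA32_APIC_BASE) | APIC_BASE_GLOBAL_ENABLE);
        software_enable();
    }

    void x2apic::enable()
    {
        // Same idea for x2APIC mode: preserve the existing flags and
        // additionally set the x2APIC-mode bit alongside the global enable.
        wrmsr(msr::IA32_APIC_BASE,
              rdmsr(msr::IA32_APIC_BASE)
                  | APIC_BASE_GLOBAL_ENABLE | APIC_BASE_X2APIC_ENABLE);
        software_enable();
    }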
Re: OSv runs on Docker's Hyperkit under 100ms
I have run more tests; here are some more elaborate ones I have run
successfully:

- native example with ZFS:

OSv v0.24-516-gc872202
console_multiplexer::console_multiplexer()
acpi::early_init()
interrupt_descriptor_table initialized
### apic_driver: Read base fee00900
### apic_driver: _apic_base fee0
Hello from C code

real    0m0.165s
user    0m0.027s
sys     0m0.141s

- java with ROFS:

OSv v0.24-519-g94a7640
console_multiplexer::console_multiplexer()
acpi::early_init()
interrupt_descriptor_table initialized
### apic_driver:read_base() - read base as : fee00900
### apic_driver:read_base() - saved base as : fee0
### apic_driver:enable() - enabling with base as : fee00900
1 CPUs detected
Firmware vendor: BHYVE
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
---> blk::blk - enabled MSI 1
device_register(): registering device vblk0
device_register(): registering device vblk0.1
virtio-blk: Add blk device instances 0 as vblk0, devsize=40128000
device_register(): registering device console
device_register(): registering device null
random: intel drng, rdrand registered as a source.
device_register(): registering device random
device_register(): registering device urandom
random: initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
[rofs] device vblk0.1 opened!
[rofs] read superblock!
[rofs] read structure blocks!
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting ramfs at /tmp
java.so: Starting JVM app using: io/osv/nonisolated/RunNonIsolatedJvmApp
java.so: Setting Java system classloader to NonIsolatingOsvSystemClassLoader
random: device unblocked.
Hello, World!
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 42.90 ms reading from disk
ROFS: read 35323 512-byte blocks from disk
ROFS: allocated 35568 512-byte blocks of cache memory
ROFS: hit ratio is 83.34%
Powering off.

real    0m0.338s
user    0m0.035s
sys     0m0.298s

- httpserver to validate networking:

OSv v0.24-520-gf577249
console_multiplexer::console_multiplexer()
acpi::early_init()
smp_init()
interrupt_descriptor_table initialized
### Before create_apic_driver
### apic_driver:read_base() - read base as : fee00900
### apic_driver:read_base() - saved base as : fee0
### xapic:enable() - enabling with base as : fee00900
1 CPUs detected
Firmware vendor: BHYVE
smp_launch() -> DONE
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
eth0: ethernet address: d2:e2:e3:b0:2e:38
---> blk::blk - enabled MSI 1
device_register(): registering device vblk0
device_register(): registering device vblk0.1
virtio-blk: Add blk device instances 0 as vblk0, devsize=17775616
device_register(): registering device console
device_register(): registering device null
random: intel drng, rdrand registered as a source.
device_register(): registering device random
device_register(): registering device urandom
random: initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
[rofs] device vblk0.1 opened!
[rofs] read superblock!
[rofs] read structure blocks!
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting ramfs at /tmp
[I/27 dhcp]: Broadcasting DHCPDISCOVER message with xid: [1012600569]
[I/27 dhcp]: Waiting for IP...
random: device unblocked.
[I/27 dhcp]: Broadcasting DHCPDISCOVER message with xid: [2082401196]
[I/27 dhcp]: Waiting for IP...
[I/33 dhcp]: Received DHCPOFFER message from DHCP server: 192.168.64.1 regarding offerred IP address: 192.168.64.4
[I/33 dhcp]: Broadcasting DHCPREQUEST message
Re: OSv runs on Docker's Hyperkit under 100ms
I forgot to add that to achieve this kind of timing I built image with ROFS: ./scripts/build image=native-example fs=rofs On Monday, April 16, 2018 at 5:25:37 PM UTC-4, Waldek Kozaczuk wrote: > > I have never tried brew to install it but possibly it can work. > > I cloned it directly from https://github.com/moby/hyperkit and then built > locally. That way I could put all kinds of debug statements to figure out > why OSv was not working. Feel free to use my fork of hyperkit - > https://github.com/wkozaczuk/hyperkit/tree/osv - which has ton of debug > statements. > > To build hyperkit locally you need to install developer tools from Apple > that includes gcc, make, git, etc. I believe if you open terminal and type > 'gcc' it will ask you if you want to install developer tools. And then git > clone, make and you have you own hyperkit under build subdirectory. > > I did not have to modify hyperkit to make it work with OSv. All my > modifications are on OSv multiboot branch - > https://github.com/wkozaczuk/osv/tree/multiboot. Beside tons of debug > statements added all over the place I added multiboot_header.asm > and multiboot.S with some hard-coded values to pass correct memory info to > OSv. I also modified Makefile, lzloader.ld and disabled assert in hpet.cc ( > https://github.com/wkozaczuk/osv/blob/multiboot/drivers/hpet.cc#L55) - > hyperkit does not seem to support 64-bit counters. Finally I hacked > arch/x64/apic.cc to properly read and then pass APIC memory base offset > when enabling APIC - otherwise interrupts would not work. I do not not > understand why original apic logic in OSv did not work. > > To run it have a script like this: > > IMAGE=$1 > DISK=$2 > > build/hyperkit -A -m 512M -s 0:0,hostbridge \ > -s 31,lpc \ > -l com1,stdio \ > -s 4,virtio-blk,$DISK \ > -f multiboot,$IMAGE > > where IMAGE is lzloader.elf and IMAGE is build/release/usr.img converted > to raw. > > Enjoy! > > Waldek > > PS. I also had to hard code cmdline in loader.cc. I think it should come > from multiboot. > > On Sunday, April 15, 2018 at 8:36:54 PM UTC-4, Asias He wrote: >> >> >> >> On Wed, Apr 11, 2018 at 3:29 AM, Waldek Kozaczuk >> wrote: >> >>> Last week I have been trying to hack OSv to run on hyperkit and finally >>> I managed to execute native hello world example with ROFS. >>> >>> Here is a timing on hyperkit/OSX (the bootchart does not work on >>> hyperkit due to not granular enough timer): >>> >>> OSv v0.24-516-gc872202 >>> Hello from C code >>> >>> *real 0m0.075s * >>> *user 0m0.012s * >>> *sys 0m0.058s* >>> >>> command to boot it (please note that I hacked the lzloader ELF to >>> support multiboot): >>> >>> hyperkit -A -m 512M \ >>> -s 0:0,hostbridge \ >>> -s 31,lpc \ >>> -l com1,stdio \ >>> -s 4,virtio-blk,test.img \ >>> -f multiboot,lzloader.elf >>> >> >> Impressive! How hard is it to setup hyperkit on osx, just brew install? >> >> >> >>> >>> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is >>> setup to triple-boot Ubuntu 16/Mac OSX and Windows): >>> >>> OSv v0.24-510-g451dc6d >>> 4 CPUs detected >>> Firmware vendor: SeaBIOS >>> bsd: initializing - done >>> VFS: mounting ramfs at / >>> VFS: mounting devfs at /dev >>> net: initializing - done >>> vga: Add VGA device instance >>> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192 >>> random: intel drng, rdrand registered as a source. 
>>> random: initialized >>> VFS: unmounting /dev >>> VFS: mounting rofs at /rofs >>> VFS: mounting devfs at /dev >>> VFS: mounting procfs at /proc >>> VFS: mounting ramfs at /tmp >>> disk read (real mode): 28.31ms, (+28.31ms) >>> uncompress lzloader.elf: 49.63ms, (+21.32ms) >>> TLS initialization: 50.23ms, (+0.59ms) >>> .init functions: 52.22ms, (+1.99ms) >>> SMP launched: 53.01ms, (+0.79ms) >>> VFS initialized: 55.25ms, (+2.24ms) >>> Network initialized: 55.54ms, (+0.29ms) >>> pvpanic done: 55.66ms, (+0.12ms) >>> pci enumerated: 60.40ms, (+4.74ms) >>> drivers probe: 60.40ms, (+0.00ms) >>> drivers loaded: 126.37ms, (+65.97ms) >>> ROFS mounted: 128.65ms, (+2.28ms) >>> Total time: 128.65ms, (+0.00ms) >>> Hello from C code >>> VFS: unmounting /dev >>> VFS: unmounting /proc >>> VFS: unmounting / >>> ROFS: spent 1.00 ms reading from disk >>> ROFS: read 21 512-byte blocks from disk >>> ROFS: allocated 18 512-byte blocks of cache memory >>> ROFS: hit ratio is 89.47% >>> Powering off. >>> >>> *real 0m1.049s* >>> *user 0m0.173s* >>> *sys 0m0.253s* >>> >>> booted like so: >>> >>> qemu-system-x86_64 -m 2G -smp 4 \ >>> >>> -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \ >>> >>> -drive >>> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native >>> \ >>> >>> -enable-kvm -cpu host,+x2apic \ >>> >>> -chardev stdio,mux=on,id=stdio,signal=off \ >>> >>> -mon chardev=stdio,mode=readline >>> >>> -device isa-serial,chardev=stdio >>> >>> >>> In both cases I am not using networking - only block device. BTW I have >>
Re: OSv runs on Docker's Hyperkit under 100ms
I have never tried brew to install it but possibly it can work.

I cloned it directly from https://github.com/moby/hyperkit and then built it
locally. That way I could put all kinds of debug statements in to figure out
why OSv was not working. Feel free to use my fork of hyperkit -
https://github.com/wkozaczuk/hyperkit/tree/osv - which has a ton of debug
statements.

To build hyperkit locally you need to install the developer tools from Apple,
which include gcc, make, git, etc. I believe if you open a terminal and type
'gcc' it will ask you if you want to install the developer tools. Then git
clone, make, and you have your own hyperkit under the build subdirectory.

I did not have to modify hyperkit to make it work with OSv. All my
modifications are on the OSv multiboot branch -
https://github.com/wkozaczuk/osv/tree/multiboot. Besides tons of debug
statements added all over the place, I added multiboot_header.asm and
multiboot.S with some hard-coded values to pass correct memory info to OSv. I
also modified Makefile and lzloader.ld, and disabled an assert in hpet.cc
(https://github.com/wkozaczuk/osv/blob/multiboot/drivers/hpet.cc#L55) -
hyperkit does not seem to support 64-bit counters. Finally I hacked
arch/x64/apic.cc to properly read and then pass the APIC memory base offset
when enabling the APIC - otherwise interrupts would not work. I do not
understand why the original apic logic in OSv did not work.

To run it, use a script like this:

IMAGE=$1
DISK=$2

build/hyperkit -A -m 512M -s 0:0,hostbridge \
    -s 31,lpc \
    -l com1,stdio \
    -s 4,virtio-blk,$DISK \
    -f multiboot,$IMAGE

where IMAGE is lzloader.elf and DISK is build/release/usr.img converted to
raw.

Enjoy!

Waldek

PS. I also had to hard code cmdline in loader.cc. I think it should come from
multiboot.

On Sunday, April 15, 2018 at 8:36:54 PM UTC-4, Asias He wrote: > > > > On Wed, Apr 11, 2018 at 3:29 AM, Waldek Kozaczuk > wrote: > >> Last week I have been trying to hack OSv to run on hyperkit and finally I >> managed to execute native hello world example with ROFS. >> >> Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit >> due to not granular enough timer): >> >> OSv v0.24-516-gc872202 >> Hello from C code >> >> *real 0m0.075s * >> *user 0m0.012s * >> *sys 0m0.058s* >> >> command to boot it (please note that I hacked the lzloader ELF to support >> multiboot): >> >> hyperkit -A -m 512M \ >> -s 0:0,hostbridge \ >> -s 31,lpc \ >> -l com1,stdio \ >> -s 4,virtio-blk,test.img \ >> -f multiboot,lzloader.elf >> > > Impressive! How hard is it to setup hyperkit on osx, just brew install? > > > >> >> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup >> to triple-boot Ubuntu 16/Mac OSX and Windows): >> >> OSv v0.24-510-g451dc6d >> 4 CPUs detected >> Firmware vendor: SeaBIOS >> bsd: initializing - done >> VFS: mounting ramfs at / >> VFS: mounting devfs at /dev >> net: initializing - done >> vga: Add VGA device instance >> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192 >> random: intel drng, rdrand registered as a source.
>> random: initialized >> VFS: unmounting /dev >> VFS: mounting rofs at /rofs >> VFS: mounting devfs at /dev >> VFS: mounting procfs at /proc >> VFS: mounting ramfs at /tmp >> disk read (real mode): 28.31ms, (+28.31ms) >> uncompress lzloader.elf: 49.63ms, (+21.32ms) >> TLS initialization: 50.23ms, (+0.59ms) >> .init functions: 52.22ms, (+1.99ms) >> SMP launched: 53.01ms, (+0.79ms) >> VFS initialized: 55.25ms, (+2.24ms) >> Network initialized: 55.54ms, (+0.29ms) >> pvpanic done: 55.66ms, (+0.12ms) >> pci enumerated: 60.40ms, (+4.74ms) >> drivers probe: 60.40ms, (+0.00ms) >> drivers loaded: 126.37ms, (+65.97ms) >> ROFS mounted: 128.65ms, (+2.28ms) >> Total time: 128.65ms, (+0.00ms) >> Hello from C code >> VFS: unmounting /dev >> VFS: unmounting /proc >> VFS: unmounting / >> ROFS: spent 1.00 ms reading from disk >> ROFS: read 21 512-byte blocks from disk >> ROFS: allocated 18 512-byte blocks of cache memory >> ROFS: hit ratio is 89.47% >> Powering off. >> >> *real 0m1.049s* >> *user 0m0.173s* >> *sys 0m0.253s* >> >> booted like so: >> >> qemu-system-x86_64 -m 2G -smp 4 \ >> >> -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \ >> >> -drive >> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native >> \ >> >> -enable-kvm -cpu host,+x2apic \ >> >> -chardev stdio,mux=on,id=stdio,signal=off \ >> >> -mon chardev=stdio,mode=readline >> >> -device isa-serial,chardev=stdio >> >> >> In both cases I am not using networking - only block device. BTW I have >> not tested how networking nor SMP on hyperkit with OSv. >> >> So as you can see* OSv is 10 (ten) times faster* on the same hardware. I >> am not sure if my results are representative. But if they are it would mean >> that QEMU is probably the culprit. Please see my questions/consideration >> toward the end of the email. >> >> Anyway let me give you some background. What is
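A note on the hpet.cc assert mentioned above ("hyperkit does not seem to
support 64-bit counters"): the HPET specification reports whether the main
counter is 64 bits wide via the COUNT_SIZE_CAP bit (bit 13) of the General
Capabilities and ID register, so instead of asserting, a driver could in
principle check that bit and fall back to 32-bit handling. The sketch below
only illustrates that capability bit; it is not how drivers/hpet.cc is
actually structured, and hpet_mmio is a stand-in for however the register
block is mapped:

    #include <cstdint>

    // HPET General Capabilities and ID register sits at offset 0x0 of the
    // HPET MMIO block. Bit 13 (COUNT_SIZE_CAP) is set when the main counter
    // is 64 bits wide.
    static constexpr uint64_t HPET_COUNT_SIZE_CAP = 1ull << 13;

    inline bool hpet_main_counter_is_64bit(volatile uint64_t* hpet_mmio)
    {
        uint64_t caps = hpet_mmio[0];
        return (caps & HPET_COUNT_SIZE_CAP) != 0;  // apparently false on hyperkit
    }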
Re: OSv runs on Docker's Hyperkit under 100ms
On Wed, Apr 11, 2018 at 3:29 AM, Waldek Kozaczuk wrote: > Last week I have been trying to hack OSv to run on hyperkit and finally I > managed to execute native hello world example with ROFS. > > Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit > due to not granular enough timer): > > OSv v0.24-516-gc872202 > Hello from C code > > *real 0m0.075s * > *user 0m0.012s * > *sys 0m0.058s* > > command to boot it (please note that I hacked the lzloader ELF to support > multiboot): > > hyperkit -A -m 512M \ > -s 0:0,hostbridge \ > -s 31,lpc \ > -l com1,stdio \ > -s 4,virtio-blk,test.img \ > -f multiboot,lzloader.elf > Impressive! How hard is it to setup hyperkit on osx, just brew install? > > Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup > to triple-boot Ubuntu 16/Mac OSX and Windows): > > OSv v0.24-510-g451dc6d > 4 CPUs detected > Firmware vendor: SeaBIOS > bsd: initializing - done > VFS: mounting ramfs at / > VFS: mounting devfs at /dev > net: initializing - done > vga: Add VGA device instance > virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192 > random: intel drng, rdrand registered as a source. > random: initialized > VFS: unmounting /dev > VFS: mounting rofs at /rofs > VFS: mounting devfs at /dev > VFS: mounting procfs at /proc > VFS: mounting ramfs at /tmp > disk read (real mode): 28.31ms, (+28.31ms) > uncompress lzloader.elf: 49.63ms, (+21.32ms) > TLS initialization: 50.23ms, (+0.59ms) > .init functions: 52.22ms, (+1.99ms) > SMP launched: 53.01ms, (+0.79ms) > VFS initialized: 55.25ms, (+2.24ms) > Network initialized: 55.54ms, (+0.29ms) > pvpanic done: 55.66ms, (+0.12ms) > pci enumerated: 60.40ms, (+4.74ms) > drivers probe: 60.40ms, (+0.00ms) > drivers loaded: 126.37ms, (+65.97ms) > ROFS mounted: 128.65ms, (+2.28ms) > Total time: 128.65ms, (+0.00ms) > Hello from C code > VFS: unmounting /dev > VFS: unmounting /proc > VFS: unmounting / > ROFS: spent 1.00 ms reading from disk > ROFS: read 21 512-byte blocks from disk > ROFS: allocated 18 512-byte blocks of cache memory > ROFS: hit ratio is 89.47% > Powering off. > > *real 0m1.049s* > *user 0m0.173s* > *sys 0m0.253s* > > booted like so: > > qemu-system-x86_64 -m 2G -smp 4 \ > > -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \ > > -drive > file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native > \ > > -enable-kvm -cpu host,+x2apic \ > > -chardev stdio,mux=on,id=stdio,signal=off \ > > -mon chardev=stdio,mode=readline > > -device isa-serial,chardev=stdio > > > In both cases I am not using networking - only block device. BTW I have > not tested how networking nor SMP on hyperkit with OSv. > > So as you can see* OSv is 10 (ten) times faster* on the same hardware. I > am not sure if my results are representative. But if they are it would mean > that QEMU is probably the culprit. Please see my questions/consideration > toward the end of the email. > > Anyway let me give you some background. What is hyperkit? Hyperkit ( > https://github.com/moby/hyperkit) is a fork by Docker of xhyve ( > https://github.com/mist64/xhyve) which itself is a port of bhyve ( > https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html) - > hypervisor on FreeBSD. Bhyve architecture is similar to that of KVM/QEMU > but QEMU-equivalent of bhyve is much lighter and simpler: > > "The bhyve BSD-licensed hypervisor became part of the base system with > FreeBSD 10.0-RELEASE. 
This hypervisor supports a number of guests, > including FreeBSD, OpenBSD, and many Linux® distributions. By default, > bhyve provides access to serial console and does not emulate a graphical > console. Virtualization offload features of newer CPUs are used to avoid > the legacy methods of translating instructions and manually managing memory > mappings. > > The bhyve design requires a processor that supports Intel® Extended Page > Tables (EPT) or AMD® Rapid Virtualization Indexing (RVI) or Nested Page > Tables (NPT). Hosting Linux® guests or FreeBSD guests with more than one > vCPU requires VMX unrestricted mode support (UG). Most newer processors, > specifically the Intel® Core™ i3/i5/i7 and Intel® Xeon™ E3/E5/E7, support > these features. UG support was introduced with Intel's Westmere > micro-architecture. For a complete list of Intel® processors that support > EPT, refer to http://ark.intel.com/search/advanced?s=t&; > ExtendedPageTables=true. RVI is found on the third generation and later > of the AMD Opteron™ (Barcelona) processors" > > Hyperkit/Xhyve is a port of bhyve but targets Apple OSX as a host system > and instead of FreeBSD vmm kernel module uses Apple hypervisor framework ( > https://developer.apple.com/documentation/hypervisor). Docker, I think, > forked xhyve to create hyperkit in order to provide lighter alternative of > running Docker containers on Linux on Mac. So in essence hyperkit is a > component of Docker for Mac vs Docker Machine/Toolbox (based on > Vir
Re: OSv runs on Docker's Hyperkit under 100ms
Please see my responses inlined below.

On Sunday, April 15, 2018 at 11:30:54 AM UTC-4, Nadav Har'El wrote: > > > On Tue, Apr 10, 2018 at 10:29 PM, Waldek Kozaczuk > wrote: > >> Last week I have been trying to hack OSv to run on hyperkit and finally I >> managed to execute native hello world example with ROFS. >> > > Excellent :-) > > >> >> Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit >> due to not granular enough timer): >> >> OSv v0.24-516-gc872202 >> Hello from C code >> >> *real 0m0.075s* >> > > Impressive :-) > > >> *user 0m0.012s * >> *sys 0m0.058s* >> >> command to boot it (please note that I hacked the lzloader ELF to support >> multiboot): >> > > What kind of hack is this? >

The hacks are somewhat described in the last post of
https://github.com/cloudius-systems/osv/issues/948. The biggest hack, which I
have not posted details of yet, was in the logic enabling the xapic. In
essence there is some peculiarity in how the APIC is set up (please see this
code with comments on my multiboot branch -
https://github.com/wkozaczuk/osv/blob/multiboot/arch/x64/apic.cc#L90-L124).
OSv reads the memory base address in apic_driver::read_base() and then it
ought to pass it back in xapic::enable(). Whatever hyperkit gets is NOT what
it advertised in read_base(), and it crashes. So if I hardcode the base
address in xapic::enable() - passing it as-is, without masking - it all works,
meaning hyperkit does not abort and OSv receives interrupts. The same change
breaks OSv on QEMU though.

> I notice in arch/x64/boot32.S we do try to support the multiboot format. > Maybe the compression destroyed that? > If so, maybe this should be considered a bug? I never tried to do anything > with multiboot myself. Avi added this > code to boot32.s in the very first days of OSv, in December 2012! (commit > bf2c6bae2) > > >> hyperkit -A -m 512M \ >> -s 0:0,hostbridge \ >> -s 31,lpc \ >> -l com1,stdio \ >> -s 4,virtio-blk,test.img \ >> -f multiboot,lzloader.elf >> >> Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup >> to triple-boot Ubuntu 16/Mac OSX and Windows): >> >> OSv v0.24-510-g451dc6d >> 4 CPUs detected >> Firmware vendor: SeaBIOS >> bsd: initializing - done >> VFS: mounting ramfs at / >> VFS: mounting devfs at /dev >> net: initializing - done >> vga: Add VGA device instance >> virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192 >> random: intel drng, rdrand registered as a source. >> random: initialized >> VFS: unmounting /dev >> VFS: mounting rofs at /rofs >> VFS: mounting devfs at /dev >> VFS: mounting procfs at /proc >> VFS: mounting ramfs at /tmp >> disk read (real mode): 28.31ms, (+28.31ms) >> uncompress lzloader.elf: 49.63ms, (+21.32ms) >> TLS initialization: 50.23ms, (+0.59ms) >> .init functions: 52.22ms, (+1.99ms) >> SMP launched: 53.01ms, (+0.79ms) >> VFS initialized: 55.25ms, (+2.24ms) >> Network initialized: 55.54ms, (+0.29ms) >> pvpanic done: 55.66ms, (+0.12ms) >> pci enumerated: 60.40ms, (+4.74ms) >> drivers probe: 60.40ms, (+0.00ms) >> drivers loaded: 126.37ms, (+65.97ms) >> > > This one is a whopper. I wonder if it's some sort of qemu limitation > making driver initialization so slow, or we're doing something slow in OSv.
> > >> ROFS mounted: 128.65ms, (+2.28ms) >> Total time: 128.65ms, (+0.00ms) >> Hello from C code >> VFS: unmounting /dev >> VFS: unmounting /proc >> VFS: unmounting / >> ROFS: spent 1.00 ms reading from disk >> ROFS: read 21 512-byte blocks from disk >> ROFS: allocated 18 512-byte blocks of cache memory >> ROFS: hit ratio is 89.47% >> Powering off. >> >> *real 0m1.049s* >> > > So according to this, OSv took 128ms to boot, and there is about 900ms > more overhead of some sort coming from qemu? > In my other recent email about kvmtool I mentioned that I tried qemu-lite (which I think is a subset of QEMU done by deactivating a lot of stuff) and could see QEMU+KVM+OSv take only 250ms. So still slower than hyperkit but much better than regular QEMU. This is also not surprising given that hyperkit is only 25K lines of code. Also another advantage is multiboot where host mmaps lzloader.elf and OSv does not need to read its kernel in real mode. As a matter of fact whole real mode logic is bypassed. > > >> *user 0m0.173s* >> *sys 0m0.253s* >> >> booted like so: >> >> qemu-system-x86_64 -m 2G -smp 4 \ >> >> -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \ >> >> -drive >> file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native >> \ >> >> -enable-kvm -cpu host,+x2apic \ >> >> -chardev stdio,mux=on,id=stdio,signal=off \ >> >> -mon chardev=stdio,mode=readline >> >> -device isa-serial,chardev=stdio >> >> >> In both cases I am not using networking - only block device. BTW I have >> not tested how networking nor SMP on hyperkit with OSv. >> >> So as you can see* OSv is 10 (ten) times faster* on the same hardware. I >> am not sure if my re
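To see concretely what read_base() throws away, here is a small standalone
helper (not OSv code - the bit positions are architectural, everything else is
just illustration) that decodes an IA32_APIC_BASE value like the fee00900 and
fee00800 seen in the logs earlier in this thread: bits 12 and up hold the
page-aligned register base (0xfee00000), 0x800 is the global-enable flag, and
0x100 is the BSP flag that hyperkit apparently insists on getting back:

    #include <cstdint>
    #include <cstdio>

    void decode_apic_base(uint64_t v)
    {
        uint64_t base = v & ~0xfffull;   // page-aligned APIC register base
        bool enabled  = v & 0x800;       // APIC global enable
        bool bsp      = v & 0x100;       // bootstrap-processor flag
        printf("base=%#llx enabled=%d bsp=%d\n",
               (unsigned long long)base, enabled, bsp);
    }

    // decode_apic_base(0xfee00900) -> base=0xfee00000 enabled=1 bsp=1
    // decode_apic_base(0xfee00800) -> base=0xfee00000 enabled=1 bsp=0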
Re: OSv runs on Docker's Hyperkit under 100ms
On Tue, Apr 10, 2018 at 10:29 PM, Waldek Kozaczuk wrote: > Last week I have been trying to hack OSv to run on hyperkit and finally I > managed to execute native hello world example with ROFS. > Excellent :-) > > Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit > due to not granular enough timer): > > OSv v0.24-516-gc872202 > Hello from C code > > *real 0m0.075s* > Impressive :-) > *user 0m0.012s * > *sys 0m0.058s* > > command to boot it (please note that I hacked the lzloader ELF to support > multiboot): > What kind of hack is this? I notice in arch/x64/boot32.S we do try to support the multiboot format. Maybe the compression destroyed that? If so, maybe this should be considered a bug? I never tried to do anything with multiboot myself. Avi added this code to boot32.s in the very first days of OSv, in December 2012! (commit bf2c6bae2) > hyperkit -A -m 512M \ > -s 0:0,hostbridge \ > -s 31,lpc \ > -l com1,stdio \ > -s 4,virtio-blk,test.img \ > -f multiboot,lzloader.elf > > Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is setup > to triple-boot Ubuntu 16/Mac OSX and Windows): > > OSv v0.24-510-g451dc6d > 4 CPUs detected > Firmware vendor: SeaBIOS > bsd: initializing - done > VFS: mounting ramfs at / > VFS: mounting devfs at /dev > net: initializing - done > vga: Add VGA device instance > virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192 > random: intel drng, rdrand registered as a source. > random: initialized > VFS: unmounting /dev > VFS: mounting rofs at /rofs > VFS: mounting devfs at /dev > VFS: mounting procfs at /proc > VFS: mounting ramfs at /tmp > disk read (real mode): 28.31ms, (+28.31ms) > uncompress lzloader.elf: 49.63ms, (+21.32ms) > TLS initialization: 50.23ms, (+0.59ms) > .init functions: 52.22ms, (+1.99ms) > SMP launched: 53.01ms, (+0.79ms) > VFS initialized: 55.25ms, (+2.24ms) > Network initialized: 55.54ms, (+0.29ms) > pvpanic done: 55.66ms, (+0.12ms) > pci enumerated: 60.40ms, (+4.74ms) > drivers probe: 60.40ms, (+0.00ms) > drivers loaded: 126.37ms, (+65.97ms) > This one is a whopper. I wonder if it's some sort of qemu limitation making driver initialization so slow, or we're doing something slow in OSv. > ROFS mounted: 128.65ms, (+2.28ms) > Total time: 128.65ms, (+0.00ms) > Hello from C code > VFS: unmounting /dev > VFS: unmounting /proc > VFS: unmounting / > ROFS: spent 1.00 ms reading from disk > ROFS: read 21 512-byte blocks from disk > ROFS: allocated 18 512-byte blocks of cache memory > ROFS: hit ratio is 89.47% > Powering off. > > *real 0m1.049s* > So according to this, OSv took 128ms to boot, and there is about 900ms more overhead of some sort coming from qemu? > *user 0m0.173s* > *sys 0m0.253s* > > booted like so: > > qemu-system-x86_64 -m 2G -smp 4 \ > > -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \ > > -drive > file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native > \ > > -enable-kvm -cpu host,+x2apic \ > > -chardev stdio,mux=on,id=stdio,signal=off \ > > -mon chardev=stdio,mode=readline > > -device isa-serial,chardev=stdio > > > In both cases I am not using networking - only block device. BTW I have > not tested how networking nor SMP on hyperkit with OSv. > > So as you can see* OSv is 10 (ten) times faster* on the same hardware. I > am not sure if my results are representative. But if they are it would mean > that QEMU is probably the culprit. Please see my questions/consideration > toward the end of the email. 
> > Anyway let me give you some background. What is hyperkit? Hyperkit ( > https://github.com/moby/hyperkit) is a fork by Docker of xhyve ( > https://github.com/mist64/xhyve) which itself is a port of bhyve ( > https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html) - > hypervisor on FreeBSD. Bhyve architecture is similar to that of KVM/QEMU > but QEMU-equivalent of bhyve is much lighter and simpler: > > "The bhyve BSD-licensed hypervisor became part of the base system with > FreeBSD 10.0-RELEASE. This hypervisor supports a number of guests, > including FreeBSD, OpenBSD, and many Linux® distributions. By default, > bhyve provides access to serial console and does not emulate a graphical > console. Virtualization offload features of newer CPUs are used to avoid > the legacy methods of translating instructions and manually managing memory > mappings. > > The bhyve design requires a processor that supports Intel® Extended Page > Tables (EPT) or AMD® Rapid Virtualization Indexing (RVI) or Nested Page > Tables (NPT). Hosting Linux® guests or FreeBSD guests with more than one > vCPU requires VMX unrestricted mode support (UG). Most newer processors, > specifically the Intel® Core™ i3/i5/i7 and Intel® Xeon™ E3/E5/E7, support > these features. UG support was introduced with Intel's Westmere > micro-architecture. For a complete list of Intel® processors that support > EPT, refer to http://ark.intel.com/search/advanced?s=t&
OSv runs on Docker's Hyperkit under 100ms
Last week I have been trying to hack OSv to run on hyperkit and I finally
managed to execute the native hello world example with ROFS.

Here is a timing on hyperkit/OSX (the bootchart does not work on hyperkit
because the timer is not granular enough):

OSv v0.24-516-gc872202
Hello from C code

real    0m0.075s
user    0m0.012s
sys     0m0.058s

command to boot it (please note that I hacked the lzloader ELF to support
multiboot):

hyperkit -A -m 512M \
    -s 0:0,hostbridge \
    -s 31,lpc \
    -l com1,stdio \
    -s 4,virtio-blk,test.img \
    -f multiboot,lzloader.elf

Here is a timing on QEMU/KVM on Linux (same hardware - my laptop is set up to
triple-boot Ubuntu 16/Mac OSX and Windows):

OSv v0.24-510-g451dc6d
4 CPUs detected
Firmware vendor: SeaBIOS
bsd: initializing - done
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
net: initializing - done
vga: Add VGA device instance
virtio-blk: Add blk device instances 0 as vblk0, devsize=8520192
random: intel drng, rdrand registered as a source.
random: initialized
VFS: unmounting /dev
VFS: mounting rofs at /rofs
VFS: mounting devfs at /dev
VFS: mounting procfs at /proc
VFS: mounting ramfs at /tmp
disk read (real mode): 28.31ms, (+28.31ms)
uncompress lzloader.elf: 49.63ms, (+21.32ms)
TLS initialization: 50.23ms, (+0.59ms)
.init functions: 52.22ms, (+1.99ms)
SMP launched: 53.01ms, (+0.79ms)
VFS initialized: 55.25ms, (+2.24ms)
Network initialized: 55.54ms, (+0.29ms)
pvpanic done: 55.66ms, (+0.12ms)
pci enumerated: 60.40ms, (+4.74ms)
drivers probe: 60.40ms, (+0.00ms)
drivers loaded: 126.37ms, (+65.97ms)
ROFS mounted: 128.65ms, (+2.28ms)
Total time: 128.65ms, (+0.00ms)
Hello from C code
VFS: unmounting /dev
VFS: unmounting /proc
VFS: unmounting /
ROFS: spent 1.00 ms reading from disk
ROFS: read 21 512-byte blocks from disk
ROFS: allocated 18 512-byte blocks of cache memory
ROFS: hit ratio is 89.47%
Powering off.

real    0m1.049s
user    0m0.173s
sys     0m0.253s

booted like so:

qemu-system-x86_64 -m 2G -smp 4 \
    -device virtio-blk-pci,id=blk0,bootindex=0,drive=hd0,scsi=off \
    -drive file=/home/wkozaczuk/projects/osv/build/last/usr.img,if=none,id=hd0,cache=none,aio=native \
    -enable-kvm -cpu host,+x2apic \
    -chardev stdio,mux=on,id=stdio,signal=off \
    -mon chardev=stdio,mode=readline \
    -device isa-serial,chardev=stdio

In both cases I am not using networking - only the block device. BTW I have
not tested networking or SMP on hyperkit with OSv yet.

So as you can see OSv is 10 (ten) times faster on the same hardware. I am not
sure if my results are representative, but if they are, it would mean that
QEMU is probably the culprit. Please see my questions/considerations toward
the end of the email.

Anyway, let me give you some background. What is hyperkit? Hyperkit
(https://github.com/moby/hyperkit) is a fork by Docker of xhyve
(https://github.com/mist64/xhyve), which itself is a port of bhyve
(https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html) - the
hypervisor on FreeBSD. The bhyve architecture is similar to that of KVM/QEMU,
but the QEMU-equivalent part of bhyve is much lighter and simpler:

"The bhyve BSD-licensed hypervisor became part of the base system with
FreeBSD 10.0-RELEASE. This hypervisor supports a number of guests, including
FreeBSD, OpenBSD, and many Linux® distributions. By default, bhyve provides
access to serial console and does not emulate a graphical console.
Virtualization offload features of newer CPUs are used to avoid the legacy
methods of translating instructions and manually managing memory mappings.
The bhyve design requires a processor that supports Intel® Extended Page
Tables (EPT) or AMD® Rapid Virtualization Indexing (RVI) or Nested Page
Tables (NPT). Hosting Linux® guests or FreeBSD guests with more than one vCPU
requires VMX unrestricted mode support (UG). Most newer processors,
specifically the Intel® Core™ i3/i5/i7 and Intel® Xeon™ E3/E5/E7, support
these features. UG support was introduced with Intel's Westmere
micro-architecture. For a complete list of Intel® processors that support
EPT, refer to http://ark.intel.com/search/advanced?s=t&ExtendedPageTables=true.
RVI is found on the third generation and later of the AMD Opteron™
(Barcelona) processors"

Hyperkit/xhyve is a port of bhyve, but it targets Apple OSX as the host
system and, instead of the FreeBSD vmm kernel module, uses the Apple
Hypervisor framework (https://developer.apple.com/documentation/hypervisor).
Docker, I think, forked xhyve to create hyperkit in order to provide a
lighter alternative for running Docker containers (in a Linux VM) on Mac. So
in essence hyperkit is a component of Docker for Mac, as opposed to Docker
Machine/Toolbox (based on VirtualBox). Please see
https://docs.docker.com/docker-for-mac/docker-toolbox/ for details.

How does it apply to OSv? It only applies if you want to run OSv on a Mac.
Right now the only choices are QEMU (dog slow because there is no KVM) or
VirtualBox (pretty fast once OSv is up, but it takes a long time to boot and
has other configuration quirks).
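For readers who have not seen the Multiboot (v1) format that the "hacked the
lzloader ELF to support multiboot" step above relies on: a multiboot-capable
loader scans the first 8 KiB of the image for a small, 4-byte-aligned header
and, if the memory-info flag is set, hands the kernel a memory map instead of
the kernel probing for it in real mode. Below is a minimal sketch of such a
header; the actual multiboot_header.asm on the multiboot branch is written in
assembly, and its section name, flags and extra fields may well differ:

    #include <cstdint>

    struct multiboot_header {
        uint32_t magic;
        uint32_t flags;
        uint32_t checksum;
    };

    constexpr uint32_t MB_MAGIC       = 0x1BADB002;  // Multiboot v1 header magic
    constexpr uint32_t MB_MEMORY_INFO = 0x00000002;  // ask the loader for memory info

    // The spec requires magic + flags + checksum to sum to zero (mod 2^32),
    // and the header to sit within the first 8192 bytes of the image.
    constexpr multiboot_header mb_header
        __attribute__((section(".multiboot"), used, aligned(4))) = {
        MB_MAGIC,
        MB_MEMORY_INFO,
        0u - (MB_MAGIC + MB_MEMORY_INFO),
    };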