On Monday, February 11, 2019 at 3:54:39 AM UTC-5, Nadav Har'El wrote: > > On Sun, Feb 10, 2019 at 10:50 PM Waldek Kozaczuk <[email protected] > <javascript:>> wrote: > >> This patch is not meant to be applied but it serves to illustrate what is >> left to make OSv boot on firecracker. As you can see it is not a daunting >> amount of work anymore given modern virtio is supported now. >> >> In am not sure in what order we should implement the remaining work. >> Ideally every patch should be testable >> > > Indeed, although I don't mind that the firecracker-specific stuff you'll > be the only one testing it (since I'm not planning to run it any time soon). > > Maybe you can even add to run.py a firecracker hypersor options, so if > someone just installs this hypervisor he can use run.py? > It is definitely doable.
> > > >> which means that virtio-mmio support should be added last given it is not >> possible to test it on QEMU (this >> https://github.com/qemu/qemu/blob/595123df1d54ed8fbab9e1a73d5a58c5bb71058f/hw/virtio/virtio-mmio. >> >> <https://github.com/qemu/qemu/blob/595123df1d54ed8fbab9e1a73d5a58c5bb71058f/hw/virtio/virtio-mmio.c> >> > seems to suggest that QEMU is able to emulate virtio-mmio devices but I >> have not found any examples of using it on x86-64 platform, anyone has >> ideas?). Please see my extra comments/questions below. >> >> On Sunday, February 10, 2019 at 3:22:44 PM UTC-5, Waldek Kozaczuk wrote: >>> >>> - clean up modern/legacy virtio support (configurability) >>> >> This mostly is about fragments of code in virtio layer similar to this - >> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-blk.cc#L177-L187 >> >> - where code needs to be tweaked to properly identify from feature flags >> what configuration data is available and only read this. >> >>> - add virtio-mmio support including parsing virtio-mmio devices from >>> command line >>> - enhance boot.S and OSv boot in general to support vmlinux format >>> (decompressed ELF 64-bit Linux kernel) >>> - pass e820 and command line parameters through zero-page >>> >> This is most non-trivial amount of work that requires changing OSv boot >> protocol so it better conforms to a Linux (for reference >> https://www.kernel.org/doc/Documentation/x86/boot.txt). It would be nice >> to support both 32-bit and 64-bit ELF direct boot LInux protocol. It is not >> clear to me what is the best way to support it especially if we still want >> to be able to boot OSv in real mode, protected and long mode at the same >> time and ideally minimize number of versions of boot artifacts (usr.img, >> lzloader.elf, loader.elf, etc). Maybe this deserves its own email thread. >> > Do you have any ideas around supporting 32-bit and 64-bit ELF direct boot LInux protocol? Ideally I would would want to avoid extra loader ELF version but single ELF cannot have 2 entry points - one for 32-bit and another for 64-bit, can it? > - make APCI optional and/or disable-able which includes: >>> - make pvpanic probe only if ACPI on >>> - make poweroff work when ACPI off >>> - parse SMP information from MP table instead of MADT if ACPI off >>> >> >> I think that all the 3 ACPI related items above can be part of a single >> patch. >> >>> - add option to turn PCI off (make OSv not even try to enumerate PCI) >>> - mark boot time end >> >> >>> Signed-off-by: Waldemar Kozaczuk <[email protected] <javascript:>> >>> --- >>> Makefile | 1 + >>> arch/x64/arch-setup.cc | 180 ++++++++++++++++++++++++++++++++++++----- >>> arch/x64/boot.S | 37 ++++++++- >>> arch/x64/loader.ld | 2 +- >>> arch/x64/power.cc | 14 +--- >>> arch/x64/smp.cc | 16 +++- >>> drivers/acpi.cc | 10 ++- >>> drivers/virtio-blk.cc | 8 +- >>> drivers/virtio-mmio.cc | 131 ++++++++++++++++++++++++++++++ >>> drivers/virtio-mmio.hh | 148 +++++++++++++++++++++++++++++++++ >>> drivers/virtio-net.cc | 7 ++ >>> drivers/virtio.cc | 7 +- >>> drivers/virtio.hh | 1 + >>> loader.cc | 5 +- >>> 14 files changed, 527 insertions(+), 40 deletions(-) >>> create mode 100644 drivers/virtio-mmio.cc >>> create mode 100644 drivers/virtio-mmio.hh >>> >>> diff --git a/Makefile b/Makefile >>> index ff9f2ed3..b905063c 100644 >>> --- a/Makefile >>> +++ b/Makefile >>> @@ -821,6 +821,7 @@ drivers += arch/$(arch)/pvclock-abi.o >>> drivers += drivers/virtio.o >>> drivers += drivers/virtio-pci-device.o >>> drivers += drivers/virtio-vring.o >>> +drivers += drivers/virtio-mmio.o >>> drivers += drivers/virtio-net.o >>> drivers += drivers/vmxnet3.o >>> drivers += drivers/vmxnet3-queues.o >>> diff --git a/arch/x64/arch-setup.cc b/arch/x64/arch-setup.cc >>> index 50500662..cf5adb15 100644 >>> --- a/arch/x64/arch-setup.cc >>> +++ b/arch/x64/arch-setup.cc >>> @@ -22,6 +22,19 @@ >>> #include <osv/commands.hh> >>> #include "dmi.hh" >>> >>> +// Not sure if Linux zero page is always located at this place >>> +// in memory or its address is passed in one of the registers >>> +// -> double check >>> +#define ZERO_PAGE_START 0x7000 >>> +#define SETUP_HEADER_OFFSET 0x1f1 // look at bootparam.h in linux >>> +#define BOOT_FLAG_OFFSET sizeof(u8) + 4 * sizeof(u16) + sizeof(u32) >>> + >>> +#define E820_ENTRIES_OFFSET 0x1e8 // look at bootparam.h in linux >>> +#define E820_TABLE_OFFSET 0x2d0 // look at bootparam.h in linux >>> + >>> +#define CMD_LINE_PTR_OFFSET sizeof(u8) * 5 + sizeof(u16) * 11 + >>> sizeof(u32) * 7 >>> +#define CMD_LINE_SIZE_OFFSET CMD_LINE_PTR_OFFSET + sizeof(u8) * 2 + >>> sizeof(u16) + sizeof(u32) * 3 >>> + >>> struct multiboot_info_type { >>> u32 flags; >>> u32 mem_lower; >>> @@ -61,12 +74,81 @@ struct e820ent { >>> u32 type; >>> } __attribute__((packed)); >>> >>> +struct _e820ent { >>> + u64 addr; >>> + u64 size; >>> + u32 type; >>> +} __attribute__((packed)); >>> + >>> osv_multiboot_info_type* osv_multiboot_info; >>> >>> -void parse_cmdline(multiboot_info_type& mb) >>> +struct mmio_device_info { >>> + u64 address; >>> + u64 size; >>> + unsigned int irq; >>> +}; >>> + >>> +//TODO: For now we are limiting number of mmio devices to two >>> +// Ideally we should be using somewhat more dynamic structure >>> +struct mmio_device_info mmio_device_info_entries[2]; >>> +int mmio_device_info_count = 0; >>> + >>> +#define VIRTIO_MMIO_DEVICE_CMDLINE_PREFIX "virtio_mmio.device=" >>> +char* parse_mmio_device_info(char *cmdline, mmio_device_info *info) { >>> + // [virtio_mmio.]device=<size>@<baseaddr>:<irq>[:<id>] >>> + char *prefix_pos = >>> strstr(cmdline,VIRTIO_MMIO_DEVICE_CMDLINE_PREFIX); >>> + if (!prefix_pos) >>> + return nullptr; >>> + >>> + char *size_pos = prefix_pos + >>> strlen(VIRTIO_MMIO_DEVICE_CMDLINE_PREFIX); >>> + if (sscanf(size_pos,"%ld", &info->size) != 1) >>> + return nullptr; >>> + >>> + char *at_pos = strstr(size_pos,"@"); >>> + if (!at_pos) >>> + return nullptr; >>> + >>> + switch(*(at_pos - 1)) { >>> + case 'k': >>> + case 'K': >>> + info->size = info->size * 1024; >>> + break; >>> + case 'm': >>> + case 'M': >>> + info->size = info->size * 1024 * 1024; >>> + break; >>> + default: >>> + break; >>> + } >>> + >>> + if (sscanf(at_pos, "@%lli:%u", &info->address, &info->irq) == 2) >>> + return prefix_pos; >>> + else >>> + return nullptr; >>> +} >>> + >>> +//void parse_cmdline(multiboot_info_type& mb) >>> +void parse_cmdline(char *cmdline) >>> { >>> - auto p = reinterpret_cast<char*>(mb.cmdline); >>> - osv::parse_cmdline(p); >>> + //auto p = reinterpret_cast<char*>(mb.cmdline); >>> + // We are assuming the mmio devices information is appended to the >>> + // command line (at least it is the case with the firecracker) so >>> + // once we parse those we strip it away so only plain OSv command >>> line >>> + // is left >>> + //TODO: There may be a smarter, better way to parse this >>> information >>> + char *virtio_device_info_pos = >>> parse_mmio_device_info(cmdline,mmio_device_info_entries); >>> + if (virtio_device_info_pos) { >>> + mmio_device_info_count++; >>> + *virtio_device_info_pos = 0; >>> + >>> + virtio_device_info_pos = >>> + parse_mmio_device_info(virtio_device_info_pos + >>> 1,mmio_device_info_entries + 1); >>> + if (virtio_device_info_pos) { >>> + mmio_device_info_count++; >>> + } >>> + } >>> + >>> + osv::parse_cmdline(cmdline); >>> } >>> >>> void setup_temporary_phys_map() >>> @@ -121,28 +203,64 @@ void arch_setup_free_memory() >>> { >>> static ulong edata; >>> asm ("movl $.edata, %0" : "=rm"(edata)); >>> - // copy to stack so we don't free it now >>> - auto omb = *osv_multiboot_info; >>> - auto mb = omb.mb; >>> - auto e820_buffer = alloca(mb.mmap_length); >>> - auto e820_size = mb.mmap_length; >>> - memcpy(e820_buffer, reinterpret_cast<void*>(mb.mmap_addr), >>> e820_size); >>> + >>> + void *zero_page = reinterpret_cast<void*>(ZERO_PAGE_START); >>> + void *setup_header = zero_page + SETUP_HEADER_OFFSET; >>> + >>> + // Grab command line from zero page >>> + u32 cmdline_ptr = *static_cast<u32*>(setup_header + >>> CMD_LINE_PTR_OFFSET); >>> + u32 cmdline_size = *static_cast<u32*>(setup_header + >>> CMD_LINE_SIZE_OFFSET); >>> + >>> + // Copy cmdline from zero page >>> + void* cmdline = reinterpret_cast<void*>((u64)cmdline_ptr); >>> + void *cmdline_copy = alloca(cmdline_size + 1); >>> + memcpy(cmdline_copy,cmdline,cmdline_size); >>> + ((char*)cmdline_copy)[cmdline_size] = 0; >>> + >>> + debug_early("Cmdline: "); >>> + debug_early((char*)cmdline_copy); >>> + debug_early("\n"); >>> + >>> + // Copy e820 information from zero page >>> + struct _e820ent *e820_table = static_cast<struct _e820ent >>> *>(zero_page + E820_TABLE_OFFSET); >>> + >>> + //TODO: We are assuming two entries but in reality >>> + //there could be more so this logic below needs to be a little >>> smarter >>> + auto e820_size = 48; >>> + auto e820_buffer = alloca(e820_size); >>> + { >>> + struct e820ent *lower = reinterpret_cast<struct >>> e820ent*>(e820_buffer); >>> + lower->ent_size = 20; >>> + lower->type = 1; >>> + lower->addr = e820_table[0].addr; >>> + lower->size = e820_table[0].size; >>> + >>> + struct e820ent *upper = lower + 1; >>> + upper->ent_size = 20; >>> + upper->type = 1; >>> + upper->addr = e820_table[1].addr; >>> + upper->size = e820_table[1].size; >>> + } >>> + >>> for_each_e820_entry(e820_buffer, e820_size, [] (e820ent ent) { >>> memory::phys_mem_size += ent.size; >>> }); >>> constexpr u64 initial_map = 1 << 30; // 1GB mapped by startup code >>> >>> - u64 time; >>> - time = omb.tsc_init_hi; >>> - time = (time << 32) | omb.tsc_init; >>> + //TODO: We are assuming that bootchart-wise we start here but >>> + // in reality it all starts at boot.S:start64_. However what >>> happens >>> + // before this points should take negligible amount of time, no? >>> + u64 time = 0; >>> + //time = omb.tsc_init_hi; >>> + //time = (time << 32) | omb.tsc_init; >>> boot_time.event(0, "", time ); >>> >>> - time = omb.tsc_disk_done_hi; >>> - time = (time << 32) | omb.tsc_disk_done; >>> + //time = omb.tsc_disk_done_hi; >>> + //time = (time << 32) | omb.tsc_disk_done; >>> boot_time.event(1, "disk read (real mode)", time ); >>> >>> - time = omb.tsc_uncompress_done_hi; >>> - time = (time << 32) | omb.tsc_uncompress_done; >>> + //time = omb.tsc_uncompress_done_hi; >>> + //time = (time << 32) | omb.tsc_uncompress_done; >>> boot_time.event(2, "uncompress lzloader.elf", time ); >>> >>> auto c = processor::cpuid(0x80000000); >>> @@ -185,7 +303,10 @@ void arch_setup_free_memory() >>> elf_size = edata - elf_phys; >>> mmu::linear_map(elf_start, elf_phys, elf_size, OSV_KERNEL_BASE); >>> // get rid of the command line, before low memory is unmapped >>> - parse_cmdline(mb); >>> + //parse_cmdline(mb); >>> + >>> + parse_cmdline((char*)cmdline_copy); >>> + >>> // now that we have some free memory, we can start mapping the rest >>> mmu::switch_to_runtime_page_tables(); >>> for_each_e820_entry(e820_buffer, e820_size, [] (e820ent ent) { >>> @@ -259,6 +380,7 @@ void arch_init_premain() >>> #include "drivers/virtio-scsi.hh" >>> #include "drivers/virtio-net.hh" >>> #include "drivers/virtio-rng.hh" >>> +#include "drivers/virtio-mmio.hh" >>> #include "drivers/xenplatform-pci.hh" >>> #include "drivers/ahci.hh" >>> #include "drivers/vmw-pvscsi.hh" >>> @@ -268,13 +390,33 @@ void arch_init_premain() >>> void arch_init_drivers() >>> { >>> // initialize panic drivers >>> - panic::pvpanic::probe_and_setup(); >>> + // pvpanic depends on ACPI which firecracker >>> + // does not suppot so we disable probing it altogether >>> + //TODO: Is there a way to detect if ACPI is available and >>> + //only then probe pvpanic? >>> + //panic::pvpanic::probe_and_setup(); >>> boot_time.event("pvpanic done"); >>> >>> // Enumerate PCI devices >>> - pci::pci_device_enumeration(); >>> + // PCI is not supported by firecracker >>> + //TODO: Is there a way to detect if PCI is present and only >>> enumerate >>> + //PCI devices then? Somehow even firecracker presents a bus with >>> + //some dummy devices. >>> + //pci::pci_device_enumeration(); >>> boot_time.event("pci enumerated"); >>> >>> + // Register any parsed virtio-mmio devices >>> + for (int d = 0; d < mmio_device_info_count; d++) { >>> + auto info = mmio_device_info_entries[d]; >>> + auto mmio_device = new virtio::mmio_device(info.address, >>> info.size, info.irq); >>> + if (mmio_device->parse_config()) { >>> + device_manager::instance()->register_device(mmio_device); >>> + } >>> + else { >>> + delete mmio_device; >>> + } >>> + } >>> + >>> // Initialize all drivers >>> hw::driver_manager* drvman = hw::driver_manager::instance(); >>> drvman->register_driver(virtio::blk::probe); >>> diff --git a/arch/x64/boot.S b/arch/x64/boot.S >>> index c4b97e2d..3bdc57ea 100644 >>> --- a/arch/x64/boot.S >>> +++ b/arch/x64/boot.S >>> @@ -97,9 +97,10 @@ start64: >>> sub %rdi, %rcx >>> xor %eax, %eax >>> rep stosb >>> + mov $0x200000, %rbp >>> mov %rbp, elf_header >>> # %ebx is set by boot16.S before running the loader >>> - mov %rbx, osv_multiboot_info >>> + //mov %rbx, osv_multiboot_info >>> lea init_stack_top, %rsp >>> call premain >>> mov __loader_argc, %edi >>> @@ -107,8 +108,42 @@ start64: >>> call main >>> .cfi_endproc >>> >>> +.code64 >>> +.global _start64 >>> +_start64: >>> +//TODO: Is there a way to switch to protected mode and then jump to >>> start32 >>> +//which would be even better than what we do below? >>> + //For whatever reason at this point (long mode?) we cannot >>> + //set gdt the way it is done in start32 but linux expects >>> + //similar one so whatever firecracker sets seems to be OK. >>> + //Maybe because OSv gdt has 32-bit entry >>> + //lgdt gdt_desc >>> + //mov $0x10, %eax >>> + //mov %eax, %ds >>> + //mov %eax, %es >>> + //mov %eax, %fs >>> + //mov %eax, %gs >>> + //mov %eax, %ss >>> + // Disable paging and enable PAE >>> + mov $BOOT_CR4, %eax >>> + mov %eax, %cr4 >>> + // Setup page tables >>> + lea ident_pt_l4, %eax >>> + mov %eax, %cr3 >>> + // Write contents of EDX:EAX to Model Specific Register specified >>> by ECX register >>> + mov $0xc0000080, %ecx >>> + mov $0x00000900, %eax >>> + xor %edx, %edx >>> + wrmsr >>> + // Enable paging >>> + mov $BOOT_CR0, %eax >>> + mov %eax, %cr0 >>> + jmp start64 >>> + >>> # The smp trampoline must be in the lower 1MB, so we manually relocate >>> # it to address 0 by subtracting smpboot from any offset >>> +//TODO: I am pretty sure we have some logic missing for SMP >>> +//given firecracker jumps us into long mode. Not sure how to handle it. >>> .data >>> .global smpboot >>> smpboot: >>> diff --git a/arch/x64/loader.ld b/arch/x64/loader.ld >>> index efe78d52..e08b310c 100644 >>> --- a/arch/x64/loader.ld >>> +++ b/arch/x64/loader.ld >>> @@ -108,4 +108,4 @@ PHDRS { >>> eh_frame PT_GNU_EH_FRAME; >>> note PT_NOTE; >>> } >>> -ENTRY(start32); >>> +ENTRY(_start64); >>> diff --git a/arch/x64/power.cc b/arch/x64/power.cc >>> index 81849335..534ffdf0 100644 >>> --- a/arch/x64/power.cc >>> +++ b/arch/x64/power.cc >>> @@ -27,17 +27,9 @@ void halt(void) >>> >>> void poweroff(void) >>> { >>> - ACPI_STATUS status = AcpiEnterSleepStatePrep(ACPI_STATE_S5); >>> - if (ACPI_FAILURE(status)) { >>> - debug("AcpiEnterSleepStatePrep failed: %s\n", >>> AcpiFormatException(status)); >>> - halt(); >>> - } >>> - status = AcpiEnterSleepState(ACPI_STATE_S5); >>> - if (ACPI_FAILURE(status)) { >>> - debug("AcpiEnterSleepState failed: %s\n", >>> AcpiFormatException(status)); >>> - halt(); >>> - } >>> - >>> + // Firecracker only supports this as away to shutdown the VM >>> + // Reset using the 8042 PS/2 Controller ("keyboard controller") >>> + processor::outb(0xfe, 0x64); >>> // We shouldn't get here on x86. >>> halt(); >>> } >>> diff --git a/arch/x64/smp.cc b/arch/x64/smp.cc >>> index 073ef206..4bdae0c6 100644 >>> --- a/arch/x64/smp.cc >>> +++ b/arch/x64/smp.cc >>> @@ -74,7 +74,21 @@ void parse_madt() >>> >>> void smp_init() >>> { >>> - parse_madt(); >>> + //Firecracker does not support ACPI so >>> + //there is no MADT table >>> + //parse_madt(); >>> + >>> + //TODO: This a nasty hack as we support >>> + // single vCPU. Eventually we should parse out equivalent >>> information >>> + // about all vCPUs from MP table (seems like more ancient way). >>> + // See >>> https://github.com/firecracker-microvm/firecracker/blob/7f29bca9ca197283275eab62fddc1c10ab580794/x86_64/src/mptable.rs >>> >>> + auto c = new sched::cpu(0); >>> + c->arch.apic_id = 0;//lapic->Id; >>> + c->arch.acpi_id = 0;//lapic->ProcessorId; >>> + c->arch.initstack.next = smp_stack_free; >>> + smp_stack_free = &c->arch.initstack; >>> + sched::cpus.push_back(c); >>> + >>> sched::current_cpu = sched::cpus[0]; >>> for (auto c : sched::cpus) { >>> c->incoming_wakeups = >>> aligned_array_new<sched::cpu::incoming_wakeup_queue>(sched::cpus.size()); >>> diff --git a/drivers/acpi.cc b/drivers/acpi.cc >>> index 506ca68f..006231f1 100644 >>> --- a/drivers/acpi.cc >>> +++ b/drivers/acpi.cc >>> @@ -605,7 +605,9 @@ void init() >>> >>> } >>> >>> -void __attribute__((constructor(init_prio::acpi))) acpi_init_early() >>> -{ >>> - XENPV_ALTERNATIVE({ acpi::early_init(); }, {}); >>> -} >>> +//TODO: Is there a way to detect if ACPI is available and do not even >>> +//try to load relevant information? Right now OSv depends on ACPI >>> available >>> +//void __attribute__((constructor(init_prio::acpi))) acpi_init_early() >>> +//{ >>> +// XENPV_ALTERNATIVE({ acpi::early_init(); }, {}); >>> +//} >>> diff --git a/drivers/virtio-blk.cc b/drivers/virtio-blk.cc >>> index 0bc6fb0f..fa4cba25 100644 >>> --- a/drivers/virtio-blk.cc >>> +++ b/drivers/virtio-blk.cc >>> @@ -115,7 +115,6 @@ bool blk::ack_irq() >>> blk::blk(virtio_device& virtio_dev) >>> : virtio_driver(virtio_dev), _ro(false) >>> { >>> - >>> _driver_name = "virtio-blk"; >>> _id = _instance++; >>> virtio_i("VIRTIO BLK INSTANCE %d", _id); >>> @@ -144,6 +143,13 @@ blk::blk(virtio_device& virtio_dev) >>> [=] { return this->ack_irq(); }, >>> [=] { t->wake(); }); >>> }; >>> + >>> + int_factory.create_gsi_edge_interrupt = [this,t]() { >>> + return new gsi_edge_interrupt( >>> + _dev.get_irq(), >>> + [=] { if (this->ack_irq()) t->wake(); }); >>> + }; >>> + >>> _dev.register_interrupt(int_factory); >>> >>> // Enable indirect descriptor >>> diff --git a/drivers/virtio-mmio.cc b/drivers/virtio-mmio.cc >>> new file mode 100644 >>> index 00000000..27d71848 >>> --- /dev/null >>> +++ b/drivers/virtio-mmio.cc >>> @@ -0,0 +1,131 @@ >>> +// >>> +// Created by wkozaczuk on 12/25/18. >>> +// >>> + >>> +#include <osv/debug.hh> >>> +#include "virtio-mmio.hh" >>> + >>> +namespace virtio { >>> + >>> +// This implements virtio-io mmio device (transport layer, modeled >>> after PSI). >>> +// Read here - >>> https://www.kraxel.org/virtio/virtio-v1.0-cs03-virtio-gpu.html#x1-1080002 >>> +hw_device_id mmio_device::get_id() >>> +{ >>> + return hw_device_id(0x1af4, _device_id); >>> +} >>> + >>> +u8 mmio_device::get_status() { >>> + return mmio_getl(_addr_mmio + VIRTIO_MMIO_STATUS) & 0xff; >>> +} >>> + >>> +void mmio_device::set_status(u8 status) { >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_STATUS, status); >>> +} >>> + >>> +u64 mmio_device::get_available_features() { >>> + u64 features; >>> + >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_DEVICE_FEATURES_SEL, 1); >>> + features = mmio_getl(_addr_mmio + VIRTIO_MMIO_DEVICE_FEATURES); >>> + features <<= 32; >>> + >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_DEVICE_FEATURES_SEL, 0); >>> + features |= mmio_getl(_addr_mmio + VIRTIO_MMIO_DEVICE_FEATURES); >>> + >>> + return features; >>> +} >>> + >>> +void mmio_device::set_enabled_features(u64 features) { >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_DRIVER_FEATURES_SEL, 1); >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_DRIVER_FEATURES, (u32)(features >>> >> 32)); >>> + >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_DRIVER_FEATURES_SEL, 0); >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_DRIVER_FEATURES, (u32)features); >>> +} >>> + >>> +void mmio_device::kick_queue(int queue_num) { >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_NOTIFY, queue_num); >>> +} >>> + >>> +void mmio_device::select_queue(int queue_num) { >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_SEL, queue_num); >>> + assert(!mmio_getl(_addr_mmio + VIRTIO_MMIO_QUEUE_READY)); >>> +} >>> + >>> +u16 mmio_device::get_queue_size() { >>> + return mmio_getl(_addr_mmio + VIRTIO_MMIO_QUEUE_NUM_MAX) & 0xffff; >>> +} >>> + >>> +void mmio_device::setup_queue(vring* queue) { >>> + // Set size >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_NUM, queue->size()); >>> + // >>> + // Pass addresses >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_DESC_LOW, >>> (u32)queue->get_desc_addr()); >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_DESC_HIGH, >>> (u32)(queue->get_desc_addr() >> 32)); >>> + >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_AVAIL_LOW, >>> (u32)queue->get_avail_addr()); >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_AVAIL_HIGH, >>> (u32)(queue->get_avail_addr() >> 32)); >>> + >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_USED_LOW, >>> (u32)queue->get_used_addr()); >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_USED_HIGH, >>> (u32)(queue->get_used_addr() >> 32)); >>> +} >>> + >>> +void mmio_device::activate_queue(int queue) >>> +{ >>> + // >>> + // Make it ready >>> + select_queue(queue); >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_QUEUE_READY, 1 ); >>> +} >>> + >>> +u8 mmio_device::read_and_ack_isr() { >>> + //TODO: we might need to guard this read and write with a mutex >>> + //to prevent two concurrent interrupts raised against same device >>> step >>> + //on each other. Is it possible? Spec does not seem to say anything >>> about it. >>> + unsigned long status = mmio_getl(_addr_mmio + >>> VIRTIO_MMIO_INTERRUPT_STATUS); >>> + //Sometimes this assert would be false (maybe of what I am talking >>> above). >>> + //assert(status & VIRTIO_MMIO_INT_VRING); >>> + mmio_setl(_addr_mmio + VIRTIO_MMIO_INTERRUPT_ACK, status); >>> + //return true; >>> + return (status & VIRTIO_MMIO_INT_VRING); >>> +} >>> + >>> +u8 mmio_device::read_config(u32 offset) { >>> + return mmio_getb(_addr_mmio + VIRTIO_MMIO_CONFIG + offset); >>> +} >>> + >>> +void mmio_device::register_interrupt(interrupt_factory irq_factory) >>> +{ >>> + _irq.reset(irq_factory.create_gsi_edge_interrupt()); >>> +} >>> + >>> +bool mmio_device::parse_config() { >>> + _addr_mmio = mmio_map(_address, _size); >>> + >>> + u32 magic = mmio_getl(_addr_mmio + VIRTIO_MMIO_MAGIC_VALUE); >>> + if (magic != ('v' | 'i' << 8 | 'r' << 16 | 't' << 24)) { >>> + return false; >>> + } >>> + >>> + // Check device version >>> + u32 version = mmio_getl(_addr_mmio + VIRTIO_MMIO_VERSION); >>> + if (version < 1 || version > 2) { >>> + debugf( "Version %ld not supported!\n", version); >>> + return false; >>> + } >>> + >>> + _device_id = mmio_getl(_addr_mmio + VIRTIO_MMIO_DEVICE_ID); >>> + if (_device_id == 0) { >>> + // >>> + // virtio-mmio device with an ID 0 is a (dummy) placeholder >>> + // with no function. End probing now with no error reported. >>> + debug( "Dummy virtio-mmio device detected!\n"); >>> + return false; >>> + } >>> + _vendor_id = mmio_getl(_addr_mmio + VIRTIO_MMIO_VENDOR_ID); >>> + >>> + debugf("Detected virtio-mmio device: (%ld,%ld)\n", _device_id, >>> _vendor_id); >>> + return true; >>> +} >>> +} >>> diff --git a/drivers/virtio-mmio.hh b/drivers/virtio-mmio.hh >>> new file mode 100644 >>> index 00000000..25c88f6a >>> --- /dev/null >>> +++ b/drivers/virtio-mmio.hh >>> @@ -0,0 +1,148 @@ >>> +// >>> +// Created by wkozaczuk on 12/25/18. >>> +// >>> + >>> +#ifndef VIRTIO_MMIO_DEVICE_HH >>> +#define VIRTIO_MMIO_DEVICE_HH >>> + >>> +#include <osv/types.h> >>> +#include <osv/mmio.hh> >>> +#include "virtio-device.hh" >>> +#include "virtio-vring.hh" >>> + >>> +using namespace hw; >>> + >>> +/* Magic value ("virt" string) - Read Only */ >>> +#define VIRTIO_MMIO_MAGIC_VALUE 0x000 >>> + >>> +/* Virtio device version - Read Only */ >>> +#define VIRTIO_MMIO_VERSION 0x004 >>> + >>> +/* Virtio device ID - Read Only */ >>> +#define VIRTIO_MMIO_DEVICE_ID 0x008 >>> + >>> +/* Virtio vendor ID - Read Only */ >>> +#define VIRTIO_MMIO_VENDOR_ID 0x00c >>> + >>> +/* Bitmask of the features supported by the device (host) >>> + * (32 bits per set) - Read Only */ >>> +#define VIRTIO_MMIO_DEVICE_FEATURES 0x010 >>> + >>> +/* Device (host) features set selector - Write Only */ >>> +#define VIRTIO_MMIO_DEVICE_FEATURES_SEL 0x014 >>> + >>> +/* Bitmask of features activated by the driver (guest) >>> + * (32 bits per set) - Write Only */ >>> +#define VIRTIO_MMIO_DRIVER_FEATURES 0x020 >>> + >>> +/* Activated features set selector - Write Only */ >>> +#define VIRTIO_MMIO_DRIVER_FEATURES_SEL 0x024 >>> + >>> +/* Queue selector - Write Only */ >>> +#define VIRTIO_MMIO_QUEUE_SEL 0x030 >>> + >>> +/* Maximum size of the currently selected queue - Read Only */ >>> +#define VIRTIO_MMIO_QUEUE_NUM_MAX 0x034 >>> + >>> +/* Queue size for the currently selected queue - Write Only */ >>> +#define VIRTIO_MMIO_QUEUE_NUM 0x038 >>> + >>> +/* Ready bit for the currently selected queue - Read Write */ >>> +#define VIRTIO_MMIO_QUEUE_READY 0x044 >>> + >>> +/* Queue notifier - Write Only */ >>> +#define VIRTIO_MMIO_QUEUE_NOTIFY 0x050 >>> + >>> +/* Interrupt status - Read Only */ >>> +#define VIRTIO_MMIO_INTERRUPT_STATUS 0x060 >>> + >>> +/* Interrupt acknowledge - Write Only */ >>> +#define VIRTIO_MMIO_INTERRUPT_ACK 0x064 >>> + >>> +/* Device status register - Read Write */ >>> +#define VIRTIO_MMIO_STATUS 0x070 >>> + >>> +/* Selected queue's Descriptor Table address, 64 bits in two halves */ >>> +#define VIRTIO_MMIO_QUEUE_DESC_LOW 0x080 >>> +#define VIRTIO_MMIO_QUEUE_DESC_HIGH 0x084 >>> + >>> +/* Selected queue's Available Ring address, 64 bits in two halves */ >>> +#define VIRTIO_MMIO_QUEUE_AVAIL_LOW 0x090 >>> +#define VIRTIO_MMIO_QUEUE_AVAIL_HIGH 0x094 >>> + >>> +/* Selected queue's Used Ring address, 64 bits in two halves */ >>> +#define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0 >>> +#define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4 >>> + >>> +/* Configuration atomicity value */ >>> +#define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc >>> + >>> +/* The config space is defined by each driver as >>> + * the per-driver configuration space - Read Write */ >>> +#define VIRTIO_MMIO_CONFIG 0x100 >>> + >>> +#define VIRTIO_MMIO_INT_VRING (1 << 0) >>> +#define VIRTIO_MMIO_INT_CONFIG (1 << 1) >>> + >>> +namespace virtio { >>> + >>> +class mmio_device : public virtio_device { >>> +public: >>> + mmio_device(u64 address, u64 size, unsigned int irq) : >>> + _address(address), _size(size), _irq_no(irq), >>> + _vendor_id(0), _device_id(0), _addr_mmio(0) {} >>> + >>> + virtual ~mmio_device() {} >>> + >>> + virtual hw_device_id get_id(); >>> + >>> + virtual void init() {} >>> + virtual void print() {} >>> + virtual void reset() {} >>> + >>> + virtual unsigned get_irq() { return _irq_no; } >>> + virtual u8 read_and_ack_isr(); >>> + virtual void register_interrupt(interrupt_factory irq_factory); >>> + >>> + virtual void select_queue(int queue); >>> + virtual u16 get_queue_size(); >>> + virtual void setup_queue(vring *queue); >>> + virtual void activate_queue(int queue); >>> + virtual void kick_queue(int queue); >>> + >>> + virtual u64 get_available_features(); >>> + virtual bool get_available_feature_bit(int bit) { return false; } >>> //TODO: Should be implemented ? >>> + >>> + virtual void set_enabled_features(u64 features); >>> + virtual u64 get_enabled_features() { return 0; } //TODO: Should be >>> implemented - function calling it in virtio.cc is not used? >>> + virtual bool get_enabled_feature_bit(int bit) { return false; } >>> //TODO: Should be implemented - function calling it in virtio.cc is not >>> used? >>> + >>> + virtual u8 get_status(); >>> + virtual void set_status(u8 status); >>> + >>> + virtual u8 read_config(u32 offset); >>> + virtual void write_config(u32 offset, u8 byte) {} //TODO: Implement >>> - function >>> + // calling it (virtio_driver::virtio_conf_write) is not used by >>> anything? >>> + >>> + virtual void dump_config() {} //TODO: Implement it >>> + >>> + virtual bool is_modern() { return true; }; >>> + virtual size_t get_vring_alignment() { return 4096; >>> }//VIRTIO_PCI_VRING_ALIGN; }; >>> + >>> + bool parse_config(); >>> + >>> +private: >>> + u64 _address; >>> + u64 _size; >>> + unsigned int _irq_no; >>> + //u64 _id; >>> + u16 _vendor_id; >>> + u16 _device_id; >>> + >>> + mmioaddr_t _addr_mmio; >>> + std::unique_ptr<gsi_edge_interrupt> _irq; >>> +}; >>> + >>> +} >>> + >>> +#endif //VIRTIO_MMIO_DEVICE_HH >>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc >>> index ef2dff66..3522a80d 100644 >>> --- a/drivers/virtio-net.cc >>> +++ b/drivers/virtio-net.cc >>> @@ -316,6 +316,13 @@ net::net(virtio_device& dev) >>> [=] { return this->ack_irq(); }, >>> [=] { poll_task->wake(); }); >>> }; >>> + >>> + int_factory.create_gsi_edge_interrupt = [this,poll_task]() { >>> + return new gsi_edge_interrupt( >>> + _dev.get_irq(), >>> + [=] { if (this->ack_irq()) poll_task->wake(); }); >>> + }; >>> + >>> _dev.register_interrupt(int_factory); >>> >>> fill_rx_ring(); >>> diff --git a/drivers/virtio.cc b/drivers/virtio.cc >>> index 5ef0f94d..9b778522 100644 >>> --- a/drivers/virtio.cc >>> +++ b/drivers/virtio.cc >>> @@ -180,11 +180,16 @@ bool virtio_driver::get_device_feature_bit(int >>> bit) >>> void virtio_driver::set_guest_features(u64 features) >>> { >>> _dev.set_enabled_features(features); >>> + _enabled_features = features; >>> } >>> >>> bool virtio_driver::get_guest_feature_bit(int bit) >>> { >>> - return _dev.get_enabled_feature_bit(bit); >>> + if (_dev.is_modern()) //TODO: SO far only mmio - will work with >>> modern pci? >>> + return (_enabled_features & (1 << bit)) != 0; >>> + else >>> + return _dev.get_enabled_feature_bit(bit); >>> + //return _dev.get_enabled_feature_bit(bit); >>> } >>> >>> u8 virtio_driver::get_dev_status() >>> diff --git a/drivers/virtio.hh b/drivers/virtio.hh >>> index ef915e1c..4b59a91a 100644 >>> --- a/drivers/virtio.hh >>> +++ b/drivers/virtio.hh >>> @@ -114,6 +114,7 @@ protected: >>> bool _cap_indirect_buf; >>> bool _cap_event_idx = false; >>> static int _disk_idx; >>> + u64 _enabled_features; >>> }; >>> >>> template <typename T, u16 ID> >>> diff --git a/loader.cc b/loader.cc >>> index c29bf4c4..530e3816 100644 >>> --- a/loader.cc >>> +++ b/loader.cc >>> @@ -55,6 +55,8 @@ >>> #include "drivers/null.hh" >>> >>> #include "libc/network/__dns.hh" >>> +#include "early-console.hh" >>> +#include <processor.hh> >>> >>> using namespace osv; >>> using namespace osv::clock::literals; >>> @@ -426,6 +428,7 @@ void* do_main_thread(void *_main_args) >>> } >>> >>> boot_time.event("Total time"); >>> + processor::outb(123, 0x3f0); >>> >>> if (opt_bootchart) { >>> boot_time.print_chart(); >>> @@ -545,7 +548,7 @@ void main_cont(int loader_argc, char** loader_argv) >>> memory::enable_debug_allocator(); >>> >>> #ifndef AARCH64_PORT_STUB >>> - acpi::init(); >>> + //acpi::init(); >>> #endif /* !AARCH64_PORT_STUB */ >>> >>> if (sched::cpus.size() > sched::max_cpus) { >>> -- >>> 2.19.1 >>> >>> -- >> You received this message because you are subscribed to the Google Groups >> "OSv Development" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> For more options, visit https://groups.google.com/d/optout. >> > -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
