Yes, quite substantial. On Firecracker, ZFS needs at least 50-60 ms to initialize on my machine, whereas mounting a RoFS image takes about 1 millisecond. The smallest native example takes 5-6 ms to boot, including the RoFS mount, and ~10 ms in total to execute (that 10 ms includes the 5-6 ms of boot time).
Sent from my iPhone

> On Sep 20, 2019, at 15:53, zhiting zhu <[email protected]> wrote:
>
> Is there any difference in boot time between ZFS and RoFS?
>
>> On Fri, Sep 20, 2019 at 2:45 PM Henrique Fingler <[email protected]> wrote:
>> I'll check that out.
>>
>>> Instead of detecting what hypervisor we are dealing with, we should simply
>>> act accordingly based on what features have been negotiated and agreed
>>
>> Yep, you're right. Five minutes after I hit Post I remembered what
>> "negotiate" means. Whoops.
>>
>>> Also, I have noticed with my simple patch OSv ends up allocating 256
>>> buffers on Firecracker
>>
>> That's why I was trying to force the size of the recv queue to one. But
>> this can be done in a smarter way in net::fill_rx_ring(), as you said. I'll
>> hack around and see what comes up.
>> It also seems that Firecracker has the machinery to implement
>> VIRTIO_NET_F_MRG_RXBUF, but I don't know how complicated it would be to
>> finish it. I might check that out in a few weeks when I have some free time.
>>
>> Thanks for all the pointers!
>>
>>> On Friday, September 20, 2019 at 1:58:42 PM UTC-5, Waldek Kozaczuk wrote:
>>>
>>>> On Friday, September 20, 2019 at 8:56:35 AM UTC-4, Waldek Kozaczuk wrote:
>>>> See my answers below.
>>>>
>>>>> On Thursday, September 19, 2019 at 11:34:56 PM UTC-4, Henrique Fingler wrote:
>>>>> I agree that this is mostly a thing that should be done on Firecracker.
>>>>> For now, if there's a way to detect the hypervisor, we can switch on that.
>>>>> Personally I'm only using Firecracker, so I'll leave this in.
>>>>
>>>> Instead of detecting what hypervisor we are dealing with, we should simply
>>>> act accordingly based on what features have been negotiated and agreed
>>>> between OSv (driver) and the hypervisor (device).
>>>> We should simply follow the
>>>> VirtIO spec, which says:
>>>>
>>>> "5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers
>>>> If VIRTIO_NET_F_MRG_RXBUF is not negotiated:
>>>> If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
>>>> VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD populate the
>>>> receive queue(s) with buffers of at least 65562 bytes.
>>>> Otherwise, the driver SHOULD populate the receive queue(s) with buffers of
>>>> at least 1526 bytes.
>>>> If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST be at least
>>>> the size of the struct virtio_net_hdr."
>>>>
>>>> Something similar to what Linux does here:
>>>> https://github.com/torvalds/linux/blob/0445971000375859008414f87e7c72fa0d809cf8/drivers/net/virtio_net.c#L3075-L3080
>>>> So only use 17-page buffers when we have to. One outstanding question
>>>> remains: shall we allocate a single contiguous block of 17 pages as a
>>>> single slot in the vring, or a chain of 17 single-page buffers (the
>>>> latter is what Linux seems to be doing)? The slight advantage of the
>>>> chained approach is that it is easier to find 17 individual pages than
>>>> one contiguous ~68K block under memory pressure, but handling a chained
>>>> buffer is going to be more complicated. I think the memory waste is the
>>>> same either way.
>>>>
>>>> Pekka, Nadav,
>>>> What do you think we should do?
>>>>
>>>>> I wrote pretty much the same code, but instead of malloc I used
>>>>> memory::alloc_hugepage(), and it got stuck at compilation when qemu was
>>>>> started. Do you happen to know the reason? I thought we also had to force
>>>>> the length of the receive queue to one; maybe that part was the one
>>>>> breaking OSv under QEMU.
>>>>
>>>> Most likely you built the image with the ZFS filesystem, which at least
>>>> for now requires OSv to boot so that files can be uploaded to it. You can
>>>> avoid that by using the Read-Only FS (fs=rofs).
>>>> Either way, we should use
>>>> VIRTIO_NET_F_MRG_RXBUF if QEMU offers it (which happens right now), and your
>>>> patch should not affect this.
>>>
>>> Here is a capstan doc that should somewhat explain all 3 filesystems OSv
>>> offers:
>>> https://github.com/cloudius-systems/capstan/blob/master/Documentation/OsvFilesystem.md
>>>
>>>>> And the size I was allocating was 17 pages because the spec says 65562,
>>>>> which is 16 pages plus 26 bytes.
>>>>
>>>> You are right about 17 pages.
>>>>
>>>>> Did you also disable VIRTIO_NET_F_MRG_RXBUF in the feature mask or not,
>>>>> since Firecracker just ignores it?
>>>>
>>>> Firecracker "ignores" it in the sense that it is part of how features are
>>>> negotiated.
>>>>
>>>>> I'll patch that in and test it out.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> On Thursday, September 19, 2019 at 9:58:49 PM UTC-5, Waldek Kozaczuk wrote:
>>>>>> This patch seems to do the job:
>>>>>>
>>>>>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
>>>>>> index e78fb3af..fe5f1ae0 100644
>>>>>> --- a/drivers/virtio-net.cc
>>>>>> +++ b/drivers/virtio-net.cc
>>>>>> @@ -375,6 +375,8 @@ void net::read_config()
>>>>>>      net_i("Features: %s=%d,%s=%d", "Host TSO ECN", _host_tso_ecn, "CSUM", _csum);
>>>>>>      net_i("Features: %s=%d,%s=%d", "Guest_csum", _guest_csum, "guest tso4", _guest_tso4);
>>>>>>      net_i("Features: %s=%d", "host tso4", _host_tso4);
>>>>>> +
>>>>>> +    printf("VIRTIO_NET_F_MRG_RXBUF: %d\n", _mergeable_bufs);
>>>>>>  }
>>>>>>
>>>>>>  /**
>>>>>> @@ -591,16 +593,19 @@ void net::fill_rx_ring()
>>>>>>      vring* vq = _rxq.vqueue;
>>>>>>
>>>>>>      while (vq->avail_ring_not_empty()) {
>>>>>> -        auto page = memory::alloc_page();
>>>>>> +        //auto page = memory::alloc_page();
>>>>>> +        auto page = malloc(16 * memory::page_size);
>>>>>>
>>>>>>          vq->init_sg();
>>>>>> -        vq->add_in_sg(page, memory::page_size);
>>>>>> +        vq->add_in_sg(page, memory::page_size * 16);
>>>>>>          if (!vq->add_buf(page)) {
>>>>>> -            memory::free_page(page);
>>>>>> +            //memory::free_page(page);
>>>>>> +            free(page);
>>>>>>              break;
>>>>>>          }
>>>>>>          added++;
>>>>>>      }
>>>>>> +    printf("net: Allocated %d pages\n", added * 16);
>>>>>>
>>>>>>      trace_virtio_net_fill_rx_ring_added(_ifn->if_index, added);
>>>>>>
>>>>>> But for sure it is just a hack. I am not sure whether we should actually
>>>>>> allocate 16 pages in one shot (which I am doing here) or create a single
>>>>>> chained buffer made of 16 pages. I am also not sure how we should extract
>>>>>> the data if chained.
>>>>>>
>>>>>> I have also found this (based on a comment in the Firecracker code):
>>>>>> https://bugs.chromium.org/p/chromium/issues/detail?id=753630. As you can
>>>>>> see, VIRTIO_NET_F_MRG_RXBUF is much more memory efficient and flexible,
>>>>>> which is what QEMU implements.
>>>>>>
>>>>>> I am interested in what others think about how we should handle this
>>>>>> properly.
>>>>>>
>>>>>> Either way, I think it would not hurt to create an issue against
>>>>>> Firecracker asking for VIRTIO_NET_F_MRG_RXBUF support.
>>>>>>
>>>>>> Waldek
>>>>>>
>>>>>>> On Thursday, September 19, 2019 at 6:59:22 PM UTC-4, Henrique Fingler wrote:
>>>>>>> I'm trying to check if it works on QEMU, but scripts/run and capstan run
>>>>>>> set up the network differently than Firecracker's script.
>>>>>>> With the regular user networking (no "-n") it works. When I try
>>>>>>> running it with "-n -b br0" or just "-n", the execution hangs after
>>>>>>> printing the OSv version.
>>>>>>>
>>>>>>> I'm trying to manually hack in the allocation of a single but larger
>>>>>>> buffer for the receive queue and disable VIRTIO_NET_F_MRG_RXBUF in the
>>>>>>> driver just to check what Firecracker does, but it seems that during
>>>>>>> compilation a QEMU instance of the unikernel is launched. Is this a
>>>>>>> test? Can it be disabled?
>>>>>>>
>>>>>>> Also, is there a way to find out which hypervisor OSv is running on top
>>>>>>> of? That would help with switching between feature sets in virtio-net.
>>>>>>>
>>>>>>> Thanks!
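[Editor's note: the thread above debates posting one contiguous 17-page buffer versus a chain of 17 single-page buffers per vring slot (the Linux approach). The accounting behind the chained variant can be sketched outside OSv; `make_rx_chain`, `free_rx_chain`, and the plain `malloc` pages below are illustrative stand-ins, not the actual OSv vring API.]

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr size_t page_size = 4096;
constexpr size_t max_gso_frame = 65562; // 16 pages + 26 bytes, per virtio spec 5.1.6.3.1

// Illustrative stand-in for one receive slot built as a chain of
// single-page buffers rather than one contiguous ~68K allocation.
// In OSv this would become repeated add_in_sg() calls on the same
// vring slot, one per page.
std::vector<void*> make_rx_chain() {
    std::vector<void*> chain;
    // Keep adding single pages until the chain can hold a maximal
    // GSO frame. Under memory pressure, 17 independent page
    // allocations are easier to satisfy than one contiguous block.
    while (chain.size() * page_size < max_gso_frame) {
        chain.push_back(malloc(page_size));
    }
    return chain;
}

// Release every page in the chain and empty it.
void free_rx_chain(std::vector<void*>& chain) {
    for (void* p : chain) free(p);
    chain.clear();
}
```

A chain built this way ends up with exactly 17 pages (65536 = 16 x 4096 is just short of 65562), matching the page count discussed above.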
>>>>>>>> On Thursday, September 19, 2019 at 11:02:53 AM UTC-5, Waldek Kozaczuk wrote:
>>>>>>>> Most likely it is a bug on the OSv side. It could be in the virtio-net
>>>>>>>> feature negotiation logic:
>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L351-L378
>>>>>>>> or
>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L283-L297.
>>>>>>>>
>>>>>>>> I also saw this comment in the Firecracker code -
>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs#L153-L154
>>>>>>>> - which seems to indicate that VIRTIO_NET_F_MRG_RXBUF is NOT
>>>>>>>> supported by Firecracker:
>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/f123988affa8f25683a7c26f7a48dd76e839a796/devices/src/virtio/net.rs#L705-L711
>>>>>>>>
>>>>>>>> Section "5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers" of
>>>>>>>> the VirtIO spec (quoted in full earlier in this thread) would apply then.
>>>>>>>>
>>>>>>>> This makes me think that our receive buffers are only 1 page (4096
>>>>>>>> bytes) large, so whenever Firecracker tries to send a buffer bigger
>>>>>>>> than that, OSv bounces it. I think this OSv code applies:
>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L587-L609.
>>>>>>>> It seems the virtio ring buffers are always 1 page big - see the
>>>>>>>> alloc_page call.
>>>>>>>> So maybe on the OSv side we need to allow for bigger buffers (64K) when
>>>>>>>> VIRTIO_NET_F_MRG_RXBUF is off, which would require changes to
>>>>>>>> drivers/virtio-vring.cc. I wonder if on QEMU this feature is on and
>>>>>>>> that is why we never see this issue on QEMU. It would be nice to run
>>>>>>>> the same Python program in QEMU and see whether VIRTIO_NET_F_MRG_RXBUF
>>>>>>>> is on or off.
>>>>>>>>
>>>>>>>> This is all my speculation and I might be off, so maybe others can shed
>>>>>>>> more light on it.
>>>>>>>>
>>>>>>>> Waldek
>>>>>>>>
>>>>>>>>> On Thursday, September 19, 2019 at 12:09:19 AM UTC-4, Henrique Fingler wrote:
>>>>>>>>> How do I go about disabling GSO?
>>>>>>>>> I think I found how to disable TSO (diff below), but I can't find
>>>>>>>>> where to disable GSO. Disabling just TSO didn't fix it.
>>>>>>>>>
>>>>>>>>> The loop where Firecracker gets stuck (fn rx_single_frame) tries to
>>>>>>>>> write an entire frame (7318 bytes) and notices that it doesn't fit
>>>>>>>>> into all the descriptors of the guest.
>>>>>>>>> It seems that if it fails to write the entire frame, it marks the
>>>>>>>>> descriptors as used, but then retries delivering the whole frame.
>>>>>>>>> Maybe the OSv buffer isn't big enough and FC just loops forever?
>>>>>>>>>
>>>>>>>>> virtio-net.cc:
>>>>>>>>>
>>>>>>>>>      | (1 << VIRTIO_NET_F_STATUS) \
>>>>>>>>>      | (1 << VIRTIO_NET_F_CSUM) \
>>>>>>>>>      | (1 << VIRTIO_NET_F_GUEST_CSUM) \
>>>>>>>>> -    | (1 << VIRTIO_NET_F_GUEST_TSO4) \
>>>>>>>>> +    | (0 << VIRTIO_NET_F_GUEST_TSO4) \
>>>>>>>>>      | (1 << VIRTIO_NET_F_HOST_ECN) \
>>>>>>>>> -    | (1 << VIRTIO_NET_F_HOST_TSO4) \
>>>>>>>>> +    | (0 << VIRTIO_NET_F_HOST_TSO4) \
>>>>>>>>>      | (1 << VIRTIO_NET_F_GUEST_ECN) \
>>>>>>>>>      | (1 << VIRTIO_NET_F_GUEST_UFO)
>>>>>>>>>
>>>>>>>>> Thanks!
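[Editor's note: the diff above flips feature bits by editing `1` to `0` inside the OR chain. An equivalent, arguably clearer way is to mask the bits out of the assembled constant. A sketch below, with feature bit numbers taken from the virtio specification; the `no_offload_features` name is hypothetical, not OSv's. Note also that receive-side "GSO" covers GUEST_TSO4/TSO6 and GUEST_UFO together, so clearing only TSO4 as the diff does still leaves UFO negotiable.]

```cpp
#include <cassert>
#include <cstdint>

// Virtio-net feature bit numbers from the virtio specification.
constexpr int VIRTIO_NET_F_CSUM       = 0;
constexpr int VIRTIO_NET_F_GUEST_CSUM = 1;
constexpr int VIRTIO_NET_F_GUEST_TSO4 = 7;
constexpr int VIRTIO_NET_F_GUEST_ECN  = 9;
constexpr int VIRTIO_NET_F_GUEST_UFO  = 10;
constexpr int VIRTIO_NET_F_HOST_TSO4  = 11;
constexpr int VIRTIO_NET_F_HOST_ECN   = 13;
constexpr int VIRTIO_NET_F_STATUS     = 16;

// The feature set from the snippet above, left untouched.
constexpr uint32_t base_features =
      (1u << VIRTIO_NET_F_STATUS)
    | (1u << VIRTIO_NET_F_CSUM)
    | (1u << VIRTIO_NET_F_GUEST_CSUM)
    | (1u << VIRTIO_NET_F_GUEST_TSO4)
    | (1u << VIRTIO_NET_F_HOST_ECN)
    | (1u << VIRTIO_NET_F_HOST_TSO4)
    | (1u << VIRTIO_NET_F_GUEST_ECN)
    | (1u << VIRTIO_NET_F_GUEST_UFO);

// Clear the offload bits in one place instead of editing the list:
// equivalent to the (0 << ...) edits, plus GUEST_UFO, which the
// original diff left enabled.
constexpr uint32_t no_offload_features =
    base_features & ~((1u << VIRTIO_NET_F_GUEST_TSO4)
                    | (1u << VIRTIO_NET_F_HOST_TSO4)
                    | (1u << VIRTIO_NET_F_GUEST_UFO));
```

With large-receive offloads off, the host should never hand the guest a frame bigger than an ordinary MTU-sized one, which is why disabling them is a useful diagnostic here.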
>>>>>>>>>> On Wednesday, September 18, 2019 at 8:23:21 PM UTC-5, Asias He wrote:
>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 19, 2019 at 7:06 AM Henrique Fingler <[email protected]> wrote:
>>>>>>>>>>> First of all, thank you for being active and helping out users!
>>>>>>>>>>>
>>>>>>>>>>> Here's my setup: I'm building a python3 image with a script that does
>>>>>>>>>>>
>>>>>>>>>>>     response = urllib.request.urlopen("http://<a 1mb file>")
>>>>>>>>>>>
>>>>>>>>>>> The execution just hangs for a few seconds, then a storm of the same
>>>>>>>>>>> warning from Firecracker shows up:
>>>>>>>>>>>
>>>>>>>>>>> 2019-09-18T17:50:36.841517975
>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257] Receiving
>>>>>>>>>>> buffer is too small to hold frame of current size
>>>>>>>>>>> (the same warning repeated many times)
>>>>>>>>>>>
>>>>>>>>>>> This is coming from here:
>>>>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs
>>>>>>>>>>>
>>>>>>>>>>> If the file is smaller, let's say 256 B, it works fine.
>>>>>>>>>>>
>>>>>>>>>>> Could this be a bug in the virtio implementation of OSv, or is it a
>>>>>>>>>>> Firecracker thing?
>>>>>>>>>>> I'll start to investigate the issue.
>>>>>>>>>>> I'm asking because you might have seen this problem before.
>>>>>>>>>>
>>>>>>>>>> Try disabling GSO/TSO in the OSv virtio-net driver.
>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Asias

--
You received this message because you are subscribed to the Google Groups
"OSv Development" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/osv-dev/56900AC8-548A-4B18-98C0-48A3D75DCFC1%40gmail.com.
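[Editor's note: the receive-buffer sizing rule from spec section 5.1.6.3.1 that the thread keeps circling back to can be condensed into one function. A hedged sketch, not OSv code; the feature-bit values follow the virtio specification and `min_rx_buf_size` is an illustrative name.]

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard virtio-net feature bits, as numbered in the specification.
constexpr uint64_t F_GUEST_TSO4 = 1ull << 7;
constexpr uint64_t F_GUEST_TSO6 = 1ull << 8;
constexpr uint64_t F_GUEST_UFO  = 1ull << 10;
constexpr uint64_t F_MRG_RXBUF  = 1ull << 15;

// Minimum receive-buffer size mandated by virtio spec 5.1.6.3.1 for a
// given set of negotiated features.
size_t min_rx_buf_size(uint64_t negotiated) {
    if (negotiated & F_MRG_RXBUF) {
        // The device merges buffers, so page-sized buffers (as OSv
        // allocates today) suffice; each need only exceed the size of
        // struct virtio_net_hdr. QEMU negotiates this; Firecracker
        // (as of this thread) does not.
        return 4096;
    }
    if (negotiated & (F_GUEST_TSO4 | F_GUEST_TSO6 | F_GUEST_UFO)) {
        return 65562;   // one full GSO frame: 16 pages + 26 bytes, i.e. 17 pages
    }
    return 1526;        // one Ethernet frame plus virtio_net_hdr
}
```

This is exactly why the thread's symptom appears only on Firecracker: without MRG_RXBUF but with GUEST_TSO4 negotiated, the 4096-byte buffers OSv posts fall far short of the 65562-byte requirement.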
