Hi, I have just applied this patch to master. It seems to be working based on a number of workloads I have tested, as well as with the new automated scripts I have just added.
On Mon, Oct 7, 2019 at 22:26 Waldek Kozaczuk <[email protected]> wrote: > I have just sent a proper patch addressing this issue. Feel free to apply > it and test it. > > > On Sunday, October 6, 2019 at 2:30:11 PM UTC-4, Henrique Fingler wrote: >> >> Not yet, I'll probably do it today and post the results on this thread. >> >> On Sunday, October 6, 2019 at 1:16:01 PM UTC-5, Waldek Kozaczuk wrote: >>> >>> Did you try my latest patch from Friday that also frees memory? >>> >>> On Sun, Oct 6, 2019 at 14:14 Henrique Fingler <[email protected]> wrote: >>> >>>> That makes sense. I did try using the hugepage alloc that I had before >>>> but it still crashed. Thank you for the response! >>>> I'll try it out with *alloc_phys_contiguous_aligned.* >>>> >>>> >>>> On Friday, October 4, 2019 at 2:28:01 PM UTC-5, Waldek Kozaczuk wrote: >>>>> >>>>> Hi, >>>>> >>>>> So there were a couple of issues with my and your patch: >>>>> 1) We should NOT be using straight *malloc*() to allocate 17 pages of >>>>> memory. It needs to be page-aligned, so *aligned_alloc()* is the >>>>> correct choice. >>>>> 2) I missed recognizing that we also need to *free()* instead of >>>>> *free_page()* in the right place. >>>>> >>>>> Please see improved patch - better hack ;-). I still think we might be >>>>> allocating 17-too many 17-pages large buffers. But at least we recognize >>>>> VIRTIO_NET_F_MRG_RXBUF is off/on and choose correct page size accordingly >>>>> (more/less). >>>>> >>>>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc >>>>> index e78fb3af..0df45dce 100644 >>>>> --- a/drivers/virtio-net.cc >>>>> +++ b/drivers/virtio-net.cc >>>>> @@ -63,6 +63,7 @@ extern int maxnic; >>>>> namespace virtio { >>>>> >>>>> int net::_instance = 0; >>>>> +bool net::use_large_buffer = false; >>>>> >>>>> #define net_tag "virtio-net" >>>>> #define net_d(...) 
tprintf_d(net_tag, __VA_ARGS__) >>>>> @@ -375,6 +376,9 @@ void net::read_config() >>>>> net_i("Features: %s=%d,%s=%d", "Host TSO ECN", _host_tso_ecn, >>>>> "CSUM", _csum); >>>>> net_i("Features: %s=%d,%s=%d", "Guest_csum", _guest_csum, "guest >>>>> tso4", _guest_tso4); >>>>> net_i("Features: %s=%d", "host tso4", _host_tso4); >>>>> + >>>>> + printf("VIRTIO_NET_F_MRG_RXBUF: %d\n", _mergeable_bufs); >>>>> + use_large_buffer = !_mergeable_bufs; >>>>> } >>>>> >>>>> /** >>>>> @@ -473,7 +477,10 @@ void net::receiver() >>>>> // Bad packet/buffer - discard and continue to the next >>>>> one >>>>> if (len < _hdr_size + ETHER_HDR_LEN) { >>>>> rx_drops++; >>>>> - memory::free_page(page); >>>>> + if (use_large_buffer) >>>>> + free(page); >>>>> + else >>>>> + memory::free_page(page); >>>>> >>>>> continue; >>>>> } >>>>> @@ -581,7 +588,13 @@ void net::free_buffer_and_refcnt(void* buffer, >>>>> void* refcnt) >>>>> void net::do_free_buffer(void* buffer) >>>>> { >>>>> buffer = align_down(buffer, page_size); >>>>> - memory::free_page(buffer); >>>>> + if (use_large_buffer) { >>>>> + printf("--> Freeing 17 pages: %p\n", buffer); >>>>> + free(buffer); >>>>> + } else { >>>>> + printf("--> Freeing single page: %p\n", buffer); >>>>> + memory::free_page(buffer); >>>>> + } >>>>> } >>>>> >>>>> void net::fill_rx_ring() >>>>> @@ -591,12 +604,23 @@ void net::fill_rx_ring() >>>>> vring* vq = _rxq.vqueue; >>>>> >>>>> while (vq->avail_ring_not_empty()) { >>>>> - auto page = memory::alloc_page(); >>>>> + void *page; >>>>> + int pages_num = use_large_buffer ? 
17 : 1; >>>>> + if (use_large_buffer) { >>>>> + page = aligned_alloc(memory::page_size, pages_num * >>>>> memory::page_size); >>>>> + printf("--> Allocated 17 pages: %p\n", page); >>>>> + } else { >>>>> + page = memory::alloc_page(); >>>>> + printf("--> Allocated single page: %p\n", page); >>>>> + } >>>>> >>>>> vq->init_sg(); >>>>> - vq->add_in_sg(page, memory::page_size); >>>>> + vq->add_in_sg(page, pages_num * memory::page_size); >>>>> if (!vq->add_buf(page)) { >>>>> - memory::free_page(page); >>>>> + if (use_large_buffer) >>>>> + free(page); >>>>> + else >>>>> + memory::free_page(page); >>>>> break; >>>>> } >>>>> added++; >>>>> diff --git a/drivers/virtio-net.hh b/drivers/virtio-net.hh >>>>> index adc93b39..e6725231 100644 >>>>> --- a/drivers/virtio-net.hh >>>>> +++ b/drivers/virtio-net.hh >>>>> @@ -220,6 +220,7 @@ public: >>>>> static void free_buffer_and_refcnt(void* buffer, void* refcnt); >>>>> static void free_buffer(iovec iov) { >>>>> do_free_buffer(iov.iov_base); } >>>>> static void do_free_buffer(void* buffer); >>>>> + static bool use_large_buffer; >>>>> >>>>> bool ack_irq(); >>>>> >>>>> >>>>> I have tested it with your python example downloading 100 times 5MB >>>>> large file and all seems to be working fine. Feel free to remove debug >>>>> statements :-) >>>>> >>>>> Waldek >>>>> >>>>> On Thursday, September 26, 2019 at 6:12:27 PM UTC-4, Henrique Fingler >>>>> wrote: >>>>>> >>>>>> Waldek, I'm getting a general protection fault when doing some HTTP >>>>>> requests from OSv, do you think it might be related to the hack to make >>>>>> it >>>>>> work on Firecracker? >>>>>> >>>>>> Here's the MWE, a few requests go through, then it faults. 
>>>>>> >>>>>> import urllib.request >>>>>> for i in range (10): >>>>>> response = urllib.request.urlopen("http://192.168.0.20:9999/1.bin") >>>>>> response.read() >>>>>> >>>>>> Here's the trace: >>>>>> >>>>>> [registers] >>>>>> RIP: 0x00000000403e9fd6 <memory::page_range_allocator::remove_huge( >>>>>> memory::page_range&)+38> >>>>>> RFL: 0x0000000000010202 CS: 0x0000000000000008 SS: >>>>>> 0x0000000000000010 >>>>>> RAX: 0x000000000000000d RBX: 0xffff800003f2f000 RCX: >>>>>> 0x6d314e7578313731 RDX: 0x6270415369447065 >>>>>> RSI: 0xffff800003f2f000 RDI: 0xffff800003f2f008 RBP: >>>>>> 0xffff800000074e20 R8: 0x00000000000000fc >>>>>> R9: 0xffff80000094d7e8 R10: 0x0000000000003f40 R11: >>>>>> 0x1144029210842110 R12: 0x0000000040911300 >>>>>> R13: 0xffff80000094d7e8 R14: 0x0000000000003f40 R15: >>>>>> 0x1144029210842110 RSP: 0xffff800000074e00 >>>>>> general protection fault >>>>>> >>>>>> >>>>>> [backtrace] >>>>>> 0x000000004039cabc <general_protection+140> >>>>>> 0x0000000040399fa2 <???+1077518242> >>>>>> 0x00000000403e4d66 <memory::page_range_allocator::free(memory:: >>>>>> page_range*)+166> >>>>>> 0x00000000403e4ecb <memory::page_pool::l2::free_batch(memory:: >>>>>> page_pool::page_batch&)+91> >>>>>> 0x00000000403e5118 <memory::page_pool::l2::unfill()+504> >>>>>> 0x00000000403e6776 <memory::page_pool::l2::fill_thread()+358> >>>>>> 0x00000000403ea7db <std::_Function_handler<void (), memory::page_pool >>>>>> ::l2::l2()::{lambda()#1}>::_M_invoke(std::_Any_data const&)+11> >>>>>> 0x00000000403f9746 <thread_main_c+38> >>>>>> 0x000000004039af62 <???+1077522274> >>>>>> >>>>>> >>>>>> The hack in virtio-net.cc is similar to yours: >>>>>> >>>>>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc >>>>>> index e78fb3af..0065e8d7 100644 >>>>>> --- a/drivers/virtio-net.cc >>>>>> +++ b/drivers/virtio-net.cc >>>>>> @@ -590,13 +590,31 @@ void net::fill_rx_ring() >>>>>> int added = 0; >>>>>> vring* vq = _rxq.vqueue; >>>>>> >>>>>> +#define HACKQ 1 >>>>>> + >>>>>> while 
(vq->avail_ring_not_empty()) { >>>>>> - auto page = memory::alloc_page(); >>>>>> + >>>>>> + #if HACKQ >>>>>> + auto page = malloc(17 * memory::page_size); >>>>>> + #else >>>>>> + auto page = memory::alloc_page(); >>>>>> + #endif >>>>>> >>>>>> vq->init_sg(); >>>>>> - vq->add_in_sg(page, memory::page_size); >>>>>> + >>>>>> + #if HACKQ >>>>>> + vq->add_in_sg(page, 17 * memory::page_size); >>>>>> + #else >>>>>> + vq->add_in_sg(page, memory::page_size); >>>>>> + #endif >>>>>> + >>>>>> if (!vq->add_buf(page)) { >>>>>> - memory::free_page(page); >>>>>> + #if HACKQ >>>>>> + free(page); >>>>>> + #else >>>>>> + memory::free_page(page); >>>>>> + #endif >>>>>> + >>>>>> break; >>>>>> } >>>>>> added++; >>>>>> >>>>>> >>>>>> Maybe it's related to the size of the buffer allocated for virtio? >>>>>> I'll try to force it to size one and see what happens. >>>>>> >>>>>> Best. >>>>>> >>>>>> >>>>>> On Friday, September 20, 2019 at 4:01:45 PM UTC-5, Waldek Kozaczuk >>>>>> wrote: >>>>>>> >>>>>>> Yes quite substantial. On firecracker ZFS needs at least 50-60 ms to >>>>>>> initialize on my machine. Whereas RoFS images takes 1 millisecond - the >>>>>>> smallest native example takes 5-6 ms to boot including RoFS mount and >>>>>>> ~10ms >>>>>>> in total to execute (10 ms includes that 5-6 ms of boot time). >>>>>>> >>>>>>> Sent from my iPhone >>>>>>> >>>>>>> On Sep 20, 2019, at 15:53, zhiting zhu <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>> Is there any difference on boot time between zfs and rofs? >>>>>>> >>>>>>> On Fri, Sep 20, 2019 at 2:45 PM Henrique Fingler <[email protected]> >>>>>>> wrote: >>>>>>> >>>>>>>> I'll check that out. >>>>>>>> >>>>>>>> Instead of detecting what hypervisor we are dealing with, we should >>>>>>>>> simply act accordingly based on what features have been negotiated and >>>>>>>>> agreed >>>>>>>> >>>>>>>> >>>>>>>> Yep, you're right. Five minutes after I hit Post I remembered what >>>>>>>> "negotiate" means. Whoops. 
>>>>>>>> >>>>>>>> Also, I have noticed with my simple patch OSv ends up allocating >>>>>>>>> 256 buffers on Firecracker >>>>>>>> >>>>>>>> >>>>>>>> That's why I was trying to force the size of the recv queue to >>>>>>>> one. But this can be done in a smarter way in net::fill_rx_ring() as >>>>>>>> you >>>>>>>> said. I'll hack around and see what comes up. >>>>>>>> It also seems that Firecracker has the machinery to implement >>>>>>>> VIRTIO_NET_F_MRG_RXBUF, >>>>>>>> but I don't know how complicated it would be to finish it. I might >>>>>>>> check >>>>>>>> that out in a few weeks when I have some free time. >>>>>>>> >>>>>>>> Thanks for all the pointers! >>>>>>>> >>>>>>>> >>>>>>>> On Friday, September 20, 2019 at 1:58:42 PM UTC-5, Waldek Kozaczuk >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Friday, September 20, 2019 at 8:56:35 AM UTC-4, Waldek Kozaczuk >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> See my answers below. >>>>>>>>>> >>>>>>>>>> On Thursday, September 19, 2019 at 11:34:56 PM UTC-4, Henrique >>>>>>>>>> Fingler wrote: >>>>>>>>>>> >>>>>>>>>>> I agree that this is mostly a thing that should be done on >>>>>>>>>>> Firecracker. For now, if there's a way to detect the hypervisor we >>>>>>>>>>> can >>>>>>>>>>> switch that. Personally I'm only using Firecracker so I'll leave >>>>>>>>>>> this in. >>>>>>>>>>> >>>>>>>>>> Instead of detecting what hypervisor we are dealing with, we >>>>>>>>>> should simply act accordingly based on what features have been >>>>>>>>>> negotiated >>>>>>>>>> and agreed between OSv (driver) and hypervisor (device). 
We should >>>>>>>>>> simply >>>>>>>>>> follow the VirtIo spec as it says here: >>>>>>>>>> "5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers >>>>>>>>>> >>>>>>>>>> - If VIRTIO_NET_F_MRG_RXBUF is not negotiated: >>>>>>>>>> - If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or >>>>>>>>>> VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD >>>>>>>>>> populate the >>>>>>>>>> receive queue(s) with buffers of at least 65562 bytes. >>>>>>>>>> - Otherwise, the driver SHOULD populate the receive >>>>>>>>>> queue(s) with buffers of at least 1526 bytes. >>>>>>>>>> - If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST >>>>>>>>>> be at greater than the size of the struct virtio_net_hdr." >>>>>>>>>> >>>>>>>>>> Something similar to what Linux does here - >>>>>>>>>> https://github.com/torvalds/linux/blob/0445971000375859008414f87e7c72fa0d809cf8/drivers/net/virtio_net.c#L3075-L3080. >>>>>>>>>> So only use 17 pages long buffers when we have to. One outstanding >>>>>>>>>> question >>>>>>>>>> is this - shall we allocate and use a single contiguous block of 17 >>>>>>>>>> pages >>>>>>>>>> of memory as a* single slot *in the vring or *chain of 17 single >>>>>>>>>> page ones* like for single large buffer? (the latter is what >>>>>>>>>> Linux seems to be doing). The slight advantage of chained one is >>>>>>>>>> that it >>>>>>>>>> will be easier to find 17 pages of memory than 68K contiguous one >>>>>>>>>> under >>>>>>>>>> pressure. But handling chained buffer is going to be more >>>>>>>>>> complicated. I >>>>>>>>>> think memory waste is the same. >>>>>>>>>> >>>>>>>>>> Pekka, Nadav, >>>>>>>>>> What do you think we should do? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> I wrote pretty much the same code but instead of *malloc *I >>>>>>>>>>> used *memory::alloc_hugepage( *but it got stuck at compilation >>>>>>>>>>> when qemu was started, do you happen to know the reason? 
I thought >>>>>>>>>>> we also >>>>>>>>>>> had to force the length of the receiving queue to one, maybe that >>>>>>>>>>> part was >>>>>>>>>>> the one breaking osv under qemu. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Most likely you build the image with ZFS filesystem which at >>>>>>>>>> least now requires OSv to boot so that files can be uploaded to. You >>>>>>>>>> can >>>>>>>>>> avoid it by using Read-Only FS (fs=rofs). Either way, we should use >>>>>>>>>> VIRTIO_NET_F_MRG_RXBUF >>>>>>>>>> if QEMU offers it (which happens right now) and you patch should not >>>>>>>>>> affect >>>>>>>>>> this. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Here is a capstan doc that should somewhat explain all 3 >>>>>>>>> filesystems OSv offers - >>>>>>>>> https://github.com/cloudius-systems/capstan/blob/master/Documentation/OsvFilesystem.md >>>>>>>>> . >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> And the size I was allocating was 17 pages because the spec says >>>>>>>>>>> 65562, which is 16 pages plus 26 bytes. >>>>>>>>>>> >>>>>>>>>> You are right about 17 pages. >>>>>>>>>> >>>>>>>>>>> Did you also disable VIRTIO_NET_F_MRG_RXBUF in the feature mask >>>>>>>>>>> or no, since Firecracker just ignores it? >>>>>>>>>>> >>>>>>>>>> Firecracker "ignores" it in an sense that it is part of how >>>>>>>>>> features are negotiated, >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I'll patch that in and test it out. >>>>>>>>>>> >>>>>>>>>>> Thanks! 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thursday, September 19, 2019 at 9:58:49 PM UTC-5, Waldek >>>>>>>>>>> Kozaczuk wrote: >>>>>>>>>>>> >>>>>>>>>>>> This patch seems to do the job: >>>>>>>>>>>> >>>>>>>>>>>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc >>>>>>>>>>>> index e78fb3af..fe5f1ae0 100644 >>>>>>>>>>>> --- a/drivers/virtio-net.cc >>>>>>>>>>>> +++ b/drivers/virtio-net.cc >>>>>>>>>>>> @@ -375,6 +375,8 @@ void net::read_config() >>>>>>>>>>>> net_i("Features: %s=%d,%s=%d", "Host TSO ECN", >>>>>>>>>>>> _host_tso_ecn, "CSUM", _csum); >>>>>>>>>>>> net_i("Features: %s=%d,%s=%d", "Guest_csum", _guest_csum, >>>>>>>>>>>> "guest tso4", _guest_tso4); >>>>>>>>>>>> net_i("Features: %s=%d", "host tso4", _host_tso4); >>>>>>>>>>>> + >>>>>>>>>>>> + printf("VIRTIO_NET_F_MRG_RXBUF: %d\n", _mergeable_bufs); >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> /** >>>>>>>>>>>> @@ -591,16 +593,19 @@ void net::fill_rx_ring() >>>>>>>>>>>> vring* vq = _rxq.vqueue; >>>>>>>>>>>> >>>>>>>>>>>> while (vq->avail_ring_not_empty()) { >>>>>>>>>>>> - auto page = memory::alloc_page(); >>>>>>>>>>>> + //auto page = memory::alloc_page(); >>>>>>>>>>>> + auto page = malloc(16 * memory::page_size); >>>>>>>>>>>> >>>>>>>>>>>> vq->init_sg(); >>>>>>>>>>>> - vq->add_in_sg(page, memory::page_size); >>>>>>>>>>>> + vq->add_in_sg(page, memory::page_size * 16); >>>>>>>>>>>> if (!vq->add_buf(page)) { >>>>>>>>>>>> - memory::free_page(page); >>>>>>>>>>>> + //memory::free_page(page); >>>>>>>>>>>> + free(page); >>>>>>>>>>>> break; >>>>>>>>>>>> } >>>>>>>>>>>> added++; >>>>>>>>>>>> } >>>>>>>>>>>> + printf("net: Allocated %d pages\n", added * 16); >>>>>>>>>>>> >>>>>>>>>>>> trace_virtio_net_fill_rx_ring_added(_ifn->if_index, added); >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> But for sure it is just a hack. I am not sure if we should >>>>>>>>>>>> actually allocate 16 pages in one shot (which I am doing here) vs >>>>>>>>>>>> create >>>>>>>>>>>> single chained buffer made of 16 pages. 
Not sure how we should >>>>>>>>>>>> extract data >>>>>>>>>>>> if chained. >>>>>>>>>>>> >>>>>>>>>>>> I have also found this (based on a comment in firecracker code) >>>>>>>>>>>> - https://bugs.chromium.org/p/chromium/issues/detail?id=753630. >>>>>>>>>>>> As you can see VIRTIO_NET_F_MRG_RXBUF is much more memory >>>>>>>>>>>> efficient and >>>>>>>>>>>> flexible which is what QEMU implements. >>>>>>>>>>>> >>>>>>>>>>>> I am interested in what others think how we should handle this >>>>>>>>>>>> properly. >>>>>>>>>>>> >>>>>>>>>>>> Either way I think it would not hurt creating an issue against >>>>>>>>>>>> Firecracker to ask supporting VIRTIO_NET_F_MRG_RXBUF. >>>>>>>>>>>> >>>>>>>>>>>> Waldek >>>>>>>>>>>> >>>>>>>>>>>> On Thursday, September 19, 2019 at 6:59:22 PM UTC-4, Henrique >>>>>>>>>>>> Fingler wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> I'm trying to check if it works on qemu, but scripts/run and >>>>>>>>>>>>> capstan run set the network differently than Firecracker's script. >>>>>>>>>>>>> With the regular user networking (no "-n") it works. When I >>>>>>>>>>>>> try running it with with "-n -b br0" or just "-n" the execution >>>>>>>>>>>>> hangs after >>>>>>>>>>>>> printing OSv version. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm trying to manually hack the allocation of a single but >>>>>>>>>>>>> larger size for the receive queue and disabling >>>>>>>>>>>>> VIRTIO_NET_F_MRG_RXBUF >>>>>>>>>>>>> on the driver just to check what Firecracker does, but it seems >>>>>>>>>>>>> that during >>>>>>>>>>>>> compilation a qemu instance of the unikernel is launched. Is this >>>>>>>>>>>>> a test? >>>>>>>>>>>>> Can this be disabled? >>>>>>>>>>>>> >>>>>>>>>>>>> Also, is there a way to find which hypervisor OSv is running >>>>>>>>>>>>> on top of? This would help switching between feature sets in >>>>>>>>>>>>> virtio-net. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks! 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Thursday, September 19, 2019 at 11:02:53 AM UTC-5, Waldek >>>>>>>>>>>>> Kozaczuk wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Most likely it is a bug on OSv side. It could be in the >>>>>>>>>>>>>> virtio-net features negotiation logic - >>>>>>>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L351-L378 >>>>>>>>>>>>>> or >>>>>>>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L283-L297 >>>>>>>>>>>>>> . >>>>>>>>>>>>>> >>>>>>>>>>>>>> I also saw this comment in firecracker code - >>>>>>>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs#L153-L154 >>>>>>>>>>>>>> - >>>>>>>>>>>>>> which seems to indicate that VIRTIO_NET_F_MRG_RXBUF is NOT >>>>>>>>>>>>>> supported by firecracker - >>>>>>>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/f123988affa8f25683a7c26f7a48dd76e839a796/devices/src/virtio/net.rs#L705-L711 >>>>>>>>>>>>>> ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> This section of VirtIO spec would apply then: >>>>>>>>>>>>>> >>>>>>>>>>>>>> "5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers >>>>>>>>>>>>>> >>>>>>>>>>>>>> - If VIRTIO_NET_F_MRG_RXBUF is not negotiated: >>>>>>>>>>>>>> - If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 >>>>>>>>>>>>>> or VIRTIO_NET_F_GUEST_UFO are negotiated, the driver >>>>>>>>>>>>>> SHOULD populate the >>>>>>>>>>>>>> receive queue(s) with buffers of at least 65562 bytes. >>>>>>>>>>>>>> - Otherwise, the driver SHOULD populate the receive >>>>>>>>>>>>>> queue(s) with buffers of at least 1526 bytes. >>>>>>>>>>>>>> - If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer >>>>>>>>>>>>>> MUST be at greater than the size of the struct >>>>>>>>>>>>>> virtio_net_hdr." 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This makes me think that our receive buffers are only 1 page >>>>>>>>>>>>>> (4096 bytes) large so whenever Firecracker tries to send a >>>>>>>>>>>>>> buffer bigger >>>>>>>>>>>>>> than that OSv bounces. Think this OSv code applies - >>>>>>>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L587-L609. >>>>>>>>>>>>>> It seems the virtio ring buffers are alway 1 page big - see >>>>>>>>>>>>>> alloc_page >>>>>>>>>>>>>> call. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So maybe on OSv side we need to allow for bigger buffers >>>>>>>>>>>>>> (64K) when VIRTIO_NET_F_MRG_RXBUF is off which would require >>>>>>>>>>>>>> changes to drivers/virtio-vring.cc. I wonder if on QEMU this >>>>>>>>>>>>>> feature is on and that is why we never see this issue of QEMU, >>>>>>>>>>>>>> do we? It >>>>>>>>>>>>>> would be nice to run same Python program in qemu and see if >>>>>>>>>>>>>> VIRTIO_NET_F_MRG_RXBUF >>>>>>>>>>>>>> is on or off. >>>>>>>>>>>>>> >>>>>>>>>>>>>> This is all my speculation and I might be off so maybe others >>>>>>>>>>>>>> can shed more light on it. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Waldek >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thursday, September 19, 2019 at 12:09:19 AM UTC-4, >>>>>>>>>>>>>> Henrique Fingler wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> How do I go about disabling GSO? >>>>>>>>>>>>>>> I think I found how to disable TSO (diff below), but I >>>>>>>>>>>>>>> can't find where to disable GSO. Disabling just TSO didn't fix >>>>>>>>>>>>>>> it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The loop where Firecracker gets stuck (fn rx_single_frame) >>>>>>>>>>>>>>> tries to write an entire frame (7318 bytes) and it notices it >>>>>>>>>>>>>>> doesn't fit >>>>>>>>>>>>>>> into all the descriptors of the guest. >>>>>>>>>>>>>>> It seems that if it fails to write the entire frame, it >>>>>>>>>>>>>>> marks descriptors as used, but retries to deliver the whole >>>>>>>>>>>>>>> frame again. 
>>>>>>>>>>>>>>> Maybe the OSv buffer isn't big enough and FC just loops forever? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> virtio-net.cc: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> | (1 << VIRTIO_NET_F_STATUS) \ >>>>>>>>>>>>>>> | (1 << VIRTIO_NET_F_CSUM) \ >>>>>>>>>>>>>>> | (1 << VIRTIO_NET_F_GUEST_CSUM) \ >>>>>>>>>>>>>>> - | (1 << VIRTIO_NET_F_GUEST_TSO4) \ >>>>>>>>>>>>>>> + | (0 << VIRTIO_NET_F_GUEST_TSO4) \ >>>>>>>>>>>>>>> | (1 << VIRTIO_NET_F_HOST_ECN) \ >>>>>>>>>>>>>>> - | (1 << VIRTIO_NET_F_HOST_TSO4) \ >>>>>>>>>>>>>>> + | (0 << VIRTIO_NET_F_HOST_TSO4) \ >>>>>>>>>>>>>>> | (1 << VIRTIO_NET_F_GUEST_ECN) >>>>>>>>>>>>>>> | (1 << VIRTIO_NET_F_GUEST_UFO) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wednesday, September 18, 2019 at 8:23:21 PM UTC-5, Asias >>>>>>>>>>>>>>> He wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Sep 19, 2019 at 7:06 AM Henrique Fingler < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> First of all, thank you for being active and helping out >>>>>>>>>>>>>>>>> users! 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Here's my setup: I'm building a python3 image, with a >>>>>>>>>>>>>>>>> script that does >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> * response = urllib.request.urlopen("http://<a 1mb file>")* >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The execution just hangs for a few seconds, then a storm >>>>>>>>>>>>>>>>> of warnings from Firecracker show up: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> <A lot of the same warning> >>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841517975 >>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257] >>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size >>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841529410 >>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257] >>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size >>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841569665 >>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257] >>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size >>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841584097 >>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257] >>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size >>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841656060 >>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257] >>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This is coming from here: >>>>>>>>>>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If the file is smaller, let's say 256B, it works fine >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Could this be a bug in the virtio implementation of OSv >>>>>>>>>>>>>>>>> or is it a Firecraker thing? >>>>>>>>>>>>>>>>> I'll start to investigate the issue. I'm asking because >>>>>>>>>>>>>>>>> you might have seen this problem. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Try disabling gso/tso in the OSv virtio-net driver. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/965f0cad-d074-4b18-b998-ffe5777851a2%40googlegroups.com . >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> Asias
-- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/CAL9cFfOkfC2MxzW7jC298%3Dv17zQn-vx8OiWmO8GdVDgtB_U27Q%40mail.gmail.com.
