Hi,

I have just applied this patch to master. It seems to be working based on a
number of workloads I have tested, including with the new automated scripts I
have just added.

On Mon, Oct 7, 2019 at 22:26 Waldek Kozaczuk <[email protected]> wrote:

> I have just sent a proper patch addressing this issue. Feel free to apply
> it and test it.
>
>
> On Sunday, October 6, 2019 at 2:30:11 PM UTC-4, Henrique Fingler wrote:
>>
>>  Not yet, I'll probably do it today and post the results on this thread.
>>
>> On Sunday, October 6, 2019 at 1:16:01 PM UTC-5, Waldek Kozaczuk wrote:
>>>
>>> Did you try my latest patch from Friday that also frees memory?
>>>
>>> On Sun, Oct 6, 2019 at 14:14 Henrique Fingler <[email protected]> wrote:
>>>
>>>>  That makes sense. I did try using the hugepage alloc that I had before
>>>> but it still crashed. Thank you for the response!
>>>>  I'll try it out with *alloc_phys_contiguous_aligned*.
>>>>
>>>>
>>>> On Friday, October 4, 2019 at 2:28:01 PM UTC-5, Waldek Kozaczuk wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> So there were a couple of issues with both my patch and yours:
>>>>> 1) We should NOT be using straight *malloc*() to allocate 17 pages of
>>>>> memory. It needs to be page-aligned, so *aligned_alloc()* is the
>>>>> correct choice.
>>>>> 2) I missed recognizing that we also need to *free()* instead of
>>>>> *free_page()* in the right place.
>>>>>
>>>>> Please see the improved patch - a better hack ;-). I still think we
>>>>> might be allocating too many 17-page-large buffers. But at least we now
>>>>> recognize whether VIRTIO_NET_F_MRG_RXBUF is on or off and choose the
>>>>> correct buffer size accordingly.
>>>>>
>>>>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
>>>>> index e78fb3af..0df45dce 100644
>>>>> --- a/drivers/virtio-net.cc
>>>>> +++ b/drivers/virtio-net.cc
>>>>> @@ -63,6 +63,7 @@ extern int maxnic;
>>>>>  namespace virtio {
>>>>>
>>>>>  int net::_instance = 0;
>>>>> +bool net::use_large_buffer = false;
>>>>>
>>>>>  #define net_tag "virtio-net"
>>>>>  #define net_d(...)   tprintf_d(net_tag, __VA_ARGS__)
>>>>> @@ -375,6 +376,9 @@ void net::read_config()
>>>>>      net_i("Features: %s=%d,%s=%d", "Host TSO ECN", _host_tso_ecn, "CSUM", _csum);
>>>>>      net_i("Features: %s=%d,%s=%d", "Guest_csum", _guest_csum, "guest tso4", _guest_tso4);
>>>>>      net_i("Features: %s=%d", "host tso4", _host_tso4);
>>>>> +
>>>>> +    printf("VIRTIO_NET_F_MRG_RXBUF: %d\n", _mergeable_bufs);
>>>>> +    use_large_buffer = !_mergeable_bufs;
>>>>>  }
>>>>>
>>>>>  /**
>>>>> @@ -473,7 +477,10 @@ void net::receiver()
>>>>>              // Bad packet/buffer - discard and continue to the next one
>>>>>              if (len < _hdr_size + ETHER_HDR_LEN) {
>>>>>                  rx_drops++;
>>>>> -                memory::free_page(page);
>>>>> +                if (use_large_buffer)
>>>>> +                   free(page);
>>>>> +                else
>>>>> +                   memory::free_page(page);
>>>>>
>>>>>                  continue;
>>>>>              }
>>>>> @@ -581,7 +588,13 @@ void net::free_buffer_and_refcnt(void* buffer, void* refcnt)
>>>>>  void net::do_free_buffer(void* buffer)
>>>>>  {
>>>>>      buffer = align_down(buffer, page_size);
>>>>> -    memory::free_page(buffer);
>>>>> +    if (use_large_buffer) {
>>>>> +        printf("--> Freeing 17 pages: %p\n", buffer);
>>>>> +        free(buffer);
>>>>> +    } else {
>>>>> +        printf("--> Freeing single page: %p\n", buffer);
>>>>> +        memory::free_page(buffer);
>>>>> +    }
>>>>>  }
>>>>>
>>>>>  void net::fill_rx_ring()
>>>>> @@ -591,12 +604,23 @@ void net::fill_rx_ring()
>>>>>      vring* vq = _rxq.vqueue;
>>>>>
>>>>>      while (vq->avail_ring_not_empty()) {
>>>>> -        auto page = memory::alloc_page();
>>>>> +        void *page;
>>>>> +        int pages_num = use_large_buffer ? 17 : 1;
>>>>> +        if (use_large_buffer) {
>>>>> +            page = aligned_alloc(memory::page_size, pages_num * memory::page_size);
>>>>> +            printf("--> Allocated 17 pages: %p\n", page);
>>>>> +        } else {
>>>>> +            page = memory::alloc_page();
>>>>> +            printf("--> Allocated single page: %p\n", page);
>>>>> +        }
>>>>>
>>>>>          vq->init_sg();
>>>>> -        vq->add_in_sg(page, memory::page_size);
>>>>> +        vq->add_in_sg(page, pages_num * memory::page_size);
>>>>>          if (!vq->add_buf(page)) {
>>>>> -            memory::free_page(page);
>>>>> +            if (use_large_buffer)
>>>>> +               free(page);
>>>>> +            else
>>>>> +               memory::free_page(page);
>>>>>              break;
>>>>>          }
>>>>>          added++;
>>>>> diff --git a/drivers/virtio-net.hh b/drivers/virtio-net.hh
>>>>> index adc93b39..e6725231 100644
>>>>> --- a/drivers/virtio-net.hh
>>>>> +++ b/drivers/virtio-net.hh
>>>>> @@ -220,6 +220,7 @@ public:
>>>>>      static void free_buffer_and_refcnt(void* buffer, void* refcnt);
>>>>>      static void free_buffer(iovec iov) { do_free_buffer(iov.iov_base); }
>>>>>      static void do_free_buffer(void* buffer);
>>>>> +    static bool use_large_buffer;
>>>>>
>>>>>      bool ack_irq();
>>>>>
>>>>>
>>>>> I have tested it with your python example downloading a 5MB file 100
>>>>> times and all seems to be working fine. Feel free to remove the debug
>>>>> statements :-)
>>>>>
>>>>> Waldek
>>>>>
>>>>> On Thursday, September 26, 2019 at 6:12:27 PM UTC-4, Henrique Fingler
>>>>> wrote:
>>>>>>
>>>>>>  Waldek, I'm getting a general protection fault when doing some HTTP
>>>>>> requests from OSv, do you think it might be related to the hack to
>>>>>> make it work on Firecracker?
>>>>>>
>>>>>>  Here's the MWE, a few requests go through, then it faults.
>>>>>>
>>>>>> import urllib.request
>>>>>> for i in range(10):
>>>>>>     response = urllib.request.urlopen("http://192.168.0.20:9999/1.bin")
>>>>>>     response.read()
>>>>>>
>>>>>>  Here's the trace:
>>>>>>
>>>>>> [registers]
>>>>>> RIP: 0x00000000403e9fd6 <memory::page_range_allocator::remove_huge(memory::page_range&)+38>
>>>>>> RFL: 0x0000000000010202  CS:  0x0000000000000008  SS:  0x0000000000000010
>>>>>> RAX: 0x000000000000000d  RBX: 0xffff800003f2f000  RCX: 0x6d314e7578313731  RDX: 0x6270415369447065
>>>>>> RSI: 0xffff800003f2f000  RDI: 0xffff800003f2f008  RBP: 0xffff800000074e20  R8:  0x00000000000000fc
>>>>>> R9:  0xffff80000094d7e8  R10: 0x0000000000003f40  R11: 0x1144029210842110  R12: 0x0000000040911300
>>>>>> R13: 0xffff80000094d7e8  R14: 0x0000000000003f40  R15: 0x1144029210842110  RSP: 0xffff800000074e00
>>>>>> general protection fault
>>>>>>
>>>>>>
>>>>>> [backtrace]
>>>>>> 0x000000004039cabc <general_protection+140>
>>>>>> 0x0000000040399fa2 <???+1077518242>
>>>>>> 0x00000000403e4d66 <memory::page_range_allocator::free(memory::page_range*)+166>
>>>>>> 0x00000000403e4ecb <memory::page_pool::l2::free_batch(memory::page_pool::page_batch&)+91>
>>>>>> 0x00000000403e5118 <memory::page_pool::l2::unfill()+504>
>>>>>> 0x00000000403e6776 <memory::page_pool::l2::fill_thread()+358>
>>>>>> 0x00000000403ea7db <std::_Function_handler<void (), memory::page_pool::l2::l2()::{lambda()#1}>::_M_invoke(std::_Any_data const&)+11>
>>>>>> 0x00000000403f9746 <thread_main_c+38>
>>>>>> 0x000000004039af62 <???+1077522274>
>>>>>>
>>>>>>
>>>>>>  The hack in virtio-net.cc is similar to yours:
>>>>>>
>>>>>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
>>>>>> index e78fb3af..0065e8d7 100644
>>>>>> --- a/drivers/virtio-net.cc
>>>>>> +++ b/drivers/virtio-net.cc
>>>>>> @@ -590,13 +590,31 @@ void net::fill_rx_ring()
>>>>>>      int added = 0;
>>>>>>      vring* vq = _rxq.vqueue;
>>>>>>
>>>>>> +#define HACKQ 1
>>>>>> +
>>>>>>      while (vq->avail_ring_not_empty()) {
>>>>>> -        auto page = memory::alloc_page();
>>>>>> +
>>>>>> +        #if HACKQ
>>>>>> +            auto page = malloc(17 * memory::page_size);
>>>>>> +        #else
>>>>>> +            auto page = memory::alloc_page();
>>>>>> +        #endif
>>>>>>
>>>>>>          vq->init_sg();
>>>>>> -        vq->add_in_sg(page, memory::page_size);
>>>>>> +
>>>>>> +        #if HACKQ
>>>>>> +            vq->add_in_sg(page, 17 * memory::page_size);
>>>>>> +        #else
>>>>>> +            vq->add_in_sg(page, memory::page_size);
>>>>>> +        #endif
>>>>>> +
>>>>>>          if (!vq->add_buf(page)) {
>>>>>> -            memory::free_page(page);
>>>>>> +            #if HACKQ
>>>>>> +                free(page);
>>>>>> +            #else
>>>>>> +                memory::free_page(page);
>>>>>> +            #endif
>>>>>> +
>>>>>>              break;
>>>>>>          }
>>>>>>          added++;
>>>>>>
>>>>>>
>>>>>>  Maybe it's related to the size of the buffer allocated for virtio?
>>>>>> I'll try to force it to size one and see what happens.
>>>>>>
>>>>>>  Best.
>>>>>>
>>>>>>
>>>>>> On Friday, September 20, 2019 at 4:01:45 PM UTC-5, Waldek Kozaczuk
>>>>>> wrote:
>>>>>>>
>>>>>>> Yes, quite substantial. On Firecracker ZFS needs at least 50-60 ms to
>>>>>>> initialize on my machine, whereas a RoFS image takes 1 millisecond -
>>>>>>> the smallest native example takes 5-6 ms to boot including the RoFS
>>>>>>> mount, and ~10 ms in total to execute (the 10 ms includes that 5-6 ms
>>>>>>> of boot time).
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Sep 20, 2019, at 15:53, zhiting zhu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Is there any difference in boot time between ZFS and RoFS?
>>>>>>>
>>>>>>> On Fri, Sep 20, 2019 at 2:45 PM Henrique Fingler <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  I'll check that out.
>>>>>>>>
>>>>>>>> Instead of detecting what hypervisor we are dealing with, we should
>>>>>>>>> simply act accordingly based on what features have been negotiated and
>>>>>>>>> agreed
>>>>>>>>
>>>>>>>>
>>>>>>>>  Yep, you're right. Five minutes after I hit Post I remembered what
>>>>>>>> "negotiate" means. Whoops.
>>>>>>>>
>>>>>>>> Also, I have noticed with my simple patch OSv ends up allocating
>>>>>>>>> 256 buffers on Firecracker
>>>>>>>>
>>>>>>>>
>>>>>>>>  That's why I was trying to force the size of the recv queue to
>>>>>>>> one. But this can be done in a smarter way in net::fill_rx_ring() as 
>>>>>>>> you
>>>>>>>> said. I'll hack around and see what comes up.
>>>>>>>>  It also seems that Firecracker has the machinery to implement 
>>>>>>>> VIRTIO_NET_F_MRG_RXBUF,
>>>>>>>> but I don't know how complicated it would be to finish it. I might 
>>>>>>>> check
>>>>>>>> that out in a few weeks when I have some free time.
>>>>>>>>
>>>>>>>>  Thanks for all the pointers!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Friday, September 20, 2019 at 1:58:42 PM UTC-5, Waldek Kozaczuk
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Friday, September 20, 2019 at 8:56:35 AM UTC-4, Waldek Kozaczuk
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> See my answers below.
>>>>>>>>>>
>>>>>>>>>> On Thursday, September 19, 2019 at 11:34:56 PM UTC-4, Henrique
>>>>>>>>>> Fingler wrote:
>>>>>>>>>>>
>>>>>>>>>>>  I agree that this is mostly a thing that should be done on
>>>>>>>>>>> Firecracker. For now, if there's a way to detect the hypervisor we 
>>>>>>>>>>> can
>>>>>>>>>>> switch that. Personally I'm only using Firecracker so I'll leave 
>>>>>>>>>>> this in.
>>>>>>>>>>>
>>>>>>>>>> Instead of detecting what hypervisor we are dealing with, we
>>>>>>>>>> should simply act accordingly based on what features have been 
>>>>>>>>>> negotiated
>>>>>>>>>> and agreed between OSv (driver) and hypervisor (device). We should 
>>>>>>>>>> simply
>>>>>>>>>> follow the VirtIo spec as it says here:
>>>>>>>>>> "5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers
>>>>>>>>>>
>>>>>>>>>>    - If VIRTIO_NET_F_MRG_RXBUF is not negotiated:
>>>>>>>>>>       - If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6 or
>>>>>>>>>>       VIRTIO_NET_F_GUEST_UFO are negotiated, the driver SHOULD 
>>>>>>>>>> populate the
>>>>>>>>>>       receive queue(s) with buffers of at least 65562 bytes.
>>>>>>>>>>       - Otherwise, the driver SHOULD populate the receive
>>>>>>>>>>       queue(s) with buffers of at least 1526 bytes.
>>>>>>>>>>    - If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer MUST
>>>>>>>>>>    be at least the size of the struct virtio_net_hdr."
>>>>>>>>>>
>>>>>>>>>> Something similar to what Linux does here -
>>>>>>>>>> https://github.com/torvalds/linux/blob/0445971000375859008414f87e7c72fa0d809cf8/drivers/net/virtio_net.c#L3075-L3080.
>>>>>>>>>> So only use 17 pages long buffers when we have to. One outstanding 
>>>>>>>>>> question
>>>>>>>>>> is this - shall we allocate and use a single contiguous block of 17 
>>>>>>>>>> pages
>>>>>>>>>> of memory as a* single slot *in the vring or *chain of 17 single
>>>>>>>>>> page ones* like for single large buffer? (the latter is what
>>>>>>>>>> Linux seems to be doing). The slight advantage of chained one is 
>>>>>>>>>> that it
>>>>>>>>>> will be easier to find 17 pages of memory than 68K contiguous one 
>>>>>>>>>> under
>>>>>>>>>> pressure. But handling chained buffer is going to be more 
>>>>>>>>>> complicated. I
>>>>>>>>>> think memory waste is the same.
>>>>>>>>>>
>>>>>>>>>> Pekka, Nadav,
>>>>>>>>>> What do you think we should do?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>  I wrote pretty much the same code, but instead of *malloc* I
>>>>>>>>>>> used *memory::alloc_hugepage()*. However, it got stuck during
>>>>>>>>>>> compilation when qemu was started - do you happen to know the
>>>>>>>>>>> reason? I thought we also had to force the length of the
>>>>>>>>>>> receiving queue to one; maybe that part was the one breaking OSv
>>>>>>>>>>> under qemu.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Most likely you built the image with the ZFS filesystem, which at
>>>>>>>>>> least for now requires OSv to boot so that files can be uploaded
>>>>>>>>>> to it. You can avoid that by using the Read-Only FS (fs=rofs).
>>>>>>>>>> Either way, we should use VIRTIO_NET_F_MRG_RXBUF if QEMU offers it
>>>>>>>>>> (which happens right now), and your patch should not affect this.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is a capstan doc that should somewhat explain all 3
>>>>>>>>> filesystems OSv offers -
>>>>>>>>> https://github.com/cloudius-systems/capstan/blob/master/Documentation/OsvFilesystem.md
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  And the size I was allocating was 17 pages because the spec says
>>>>>>>>>>> 65562, which is 16 pages plus 26 bytes.
>>>>>>>>>>>
>>>>>>>>>> You are right about 17 pages.
>>>>>>>>>>
>>>>>>>>>>>  Did you also disable VIRTIO_NET_F_MRG_RXBUF in the feature mask
>>>>>>>>>>> or no, since Firecracker just ignores it?
>>>>>>>>>>>
>>>>>>>>>> Firecracker "ignores" it in the sense that it is part of how
>>>>>>>>>> features are negotiated.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  I'll patch that in and test it out.
>>>>>>>>>>>
>>>>>>>>>>>  Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thursday, September 19, 2019 at 9:58:49 PM UTC-5, Waldek
>>>>>>>>>>> Kozaczuk wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This patch seems to do the job:
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/virtio-net.cc b/drivers/virtio-net.cc
>>>>>>>>>>>> index e78fb3af..fe5f1ae0 100644
>>>>>>>>>>>> --- a/drivers/virtio-net.cc
>>>>>>>>>>>> +++ b/drivers/virtio-net.cc
>>>>>>>>>>>> @@ -375,6 +375,8 @@ void net::read_config()
>>>>>>>>>>>>      net_i("Features: %s=%d,%s=%d", "Host TSO ECN", _host_tso_ecn, "CSUM", _csum);
>>>>>>>>>>>>      net_i("Features: %s=%d,%s=%d", "Guest_csum", _guest_csum, "guest tso4", _guest_tso4);
>>>>>>>>>>>>      net_i("Features: %s=%d", "host tso4", _host_tso4);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    printf("VIRTIO_NET_F_MRG_RXBUF: %d\n", _mergeable_bufs);
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  /**
>>>>>>>>>>>> @@ -591,16 +593,19 @@ void net::fill_rx_ring()
>>>>>>>>>>>>      vring* vq = _rxq.vqueue;
>>>>>>>>>>>>
>>>>>>>>>>>>      while (vq->avail_ring_not_empty()) {
>>>>>>>>>>>> -        auto page = memory::alloc_page();
>>>>>>>>>>>> +        //auto page = memory::alloc_page();
>>>>>>>>>>>> +        auto page = malloc(16 * memory::page_size);
>>>>>>>>>>>>
>>>>>>>>>>>>          vq->init_sg();
>>>>>>>>>>>> -        vq->add_in_sg(page, memory::page_size);
>>>>>>>>>>>> +        vq->add_in_sg(page, memory::page_size * 16);
>>>>>>>>>>>>          if (!vq->add_buf(page)) {
>>>>>>>>>>>> -            memory::free_page(page);
>>>>>>>>>>>> +            //memory::free_page(page);
>>>>>>>>>>>> +            free(page);
>>>>>>>>>>>>              break;
>>>>>>>>>>>>          }
>>>>>>>>>>>>          added++;
>>>>>>>>>>>>      }
>>>>>>>>>>>> +    printf("net: Allocated %d pages\n", added * 16);
>>>>>>>>>>>>
>>>>>>>>>>>>      trace_virtio_net_fill_rx_ring_added(_ifn->if_index, added);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> But for sure it is just a hack. I am not sure if we should
>>>>>>>>>>>> actually allocate 16 pages in one shot (which I am doing here)
>>>>>>>>>>>> or create a single chained buffer made of 16 pages. I am also
>>>>>>>>>>>> not sure how we should extract the data if it is chained.
>>>>>>>>>>>>
>>>>>>>>>>>> I have also found this (based on a comment in the firecracker
>>>>>>>>>>>> code) - https://bugs.chromium.org/p/chromium/issues/detail?id=753630.
>>>>>>>>>>>> As you can see, VIRTIO_NET_F_MRG_RXBUF is much more
>>>>>>>>>>>> memory-efficient and flexible, which is what QEMU implements.
>>>>>>>>>>>>
>>>>>>>>>>>> I am interested in what others think how we should handle this
>>>>>>>>>>>> properly.
>>>>>>>>>>>>
>>>>>>>>>>>> Either way, I think it would not hurt to create an issue
>>>>>>>>>>>> against Firecracker asking for VIRTIO_NET_F_MRG_RXBUF support.
>>>>>>>>>>>>
>>>>>>>>>>>> Waldek
>>>>>>>>>>>>
>>>>>>>>>>>> On Thursday, September 19, 2019 at 6:59:22 PM UTC-4, Henrique
>>>>>>>>>>>> Fingler wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>  I'm trying to check if it works on qemu, but scripts/run and
>>>>>>>>>>>>> capstan run set up the network differently than Firecracker's
>>>>>>>>>>>>> script.
>>>>>>>>>>>>>  With the regular user networking (no "-n") it works. When I
>>>>>>>>>>>>> try running it with "-n -b br0" or just "-n", the execution
>>>>>>>>>>>>> hangs after printing the OSv version.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  I'm trying to manually hack in the allocation of a single but
>>>>>>>>>>>>> larger buffer for the receive queue and to disable
>>>>>>>>>>>>> VIRTIO_NET_F_MRG_RXBUF on the driver, just to check what
>>>>>>>>>>>>> Firecracker does, but it seems that during compilation a qemu
>>>>>>>>>>>>> instance of the unikernel is launched. Is this a test? Can this
>>>>>>>>>>>>> be disabled?
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Also, is there a way to find out which hypervisor OSv is
>>>>>>>>>>>>> running on top of? This would help with switching between
>>>>>>>>>>>>> feature sets in virtio-net.
>>>>>>>>>>>>>
>>>>>>>>>>>>>  Thanks!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thursday, September 19, 2019 at 11:02:53 AM UTC-5, Waldek
>>>>>>>>>>>>> Kozaczuk wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Most likely it is a bug on OSv side. It could be in the
>>>>>>>>>>>>>> virtio-net features negotiation logic -
>>>>>>>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L351-L378
>>>>>>>>>>>>>>  or
>>>>>>>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L283-L297
>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I also saw this comment in firecracker code -
>>>>>>>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs#L153-L154
>>>>>>>>>>>>>>  -
>>>>>>>>>>>>>> which seems to indicate that VIRTIO_NET_F_MRG_RXBUF is NOT
>>>>>>>>>>>>>> supported by firecracker -
>>>>>>>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/f123988affa8f25683a7c26f7a48dd76e839a796/devices/src/virtio/net.rs#L705-L711
>>>>>>>>>>>>>> ?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This section of VirtIO spec would apply then:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "5.1.6.3.1 Driver Requirements: Setting Up Receive Buffers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    - If VIRTIO_NET_F_MRG_RXBUF is not negotiated:
>>>>>>>>>>>>>>       - If VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6
>>>>>>>>>>>>>>       or VIRTIO_NET_F_GUEST_UFO are negotiated, the driver 
>>>>>>>>>>>>>> SHOULD populate the
>>>>>>>>>>>>>>       receive queue(s) with buffers of at least 65562 bytes.
>>>>>>>>>>>>>>       - Otherwise, the driver SHOULD populate the receive
>>>>>>>>>>>>>>       queue(s) with buffers of at least 1526 bytes.
>>>>>>>>>>>>>>    - If VIRTIO_NET_F_MRG_RXBUF is negotiated, each buffer
>>>>>>>>>>>>>>    MUST be at least the size of the struct virtio_net_hdr."
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This makes me think that our receive buffers are only 1 page
>>>>>>>>>>>>>> (4096 bytes) large, so whenever Firecracker tries to send a
>>>>>>>>>>>>>> buffer bigger than that, OSv bounces. I think this OSv code
>>>>>>>>>>>>>> applies -
>>>>>>>>>>>>>> https://github.com/cloudius-systems/osv/blob/master/drivers/virtio-net.cc#L587-L609.
>>>>>>>>>>>>>> It seems the virtio ring buffers are always 1 page big - see
>>>>>>>>>>>>>> the alloc_page call.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So maybe on the OSv side we need to allow for bigger buffers
>>>>>>>>>>>>>> (64K) when VIRTIO_NET_F_MRG_RXBUF is off, which would require
>>>>>>>>>>>>>> changes to drivers/virtio-vring.cc. I wonder if on QEMU this
>>>>>>>>>>>>>> feature is on, and that is why we never see this issue on
>>>>>>>>>>>>>> QEMU, do we? It would be nice to run the same Python program
>>>>>>>>>>>>>> in qemu and see if VIRTIO_NET_F_MRG_RXBUF is on or off.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is all my speculation and I might be off, so maybe others
>>>>>>>>>>>>>> can shed more light on it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Waldek
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thursday, September 19, 2019 at 12:09:19 AM UTC-4,
>>>>>>>>>>>>>> Henrique Fingler wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  How do I go about disabling GSO?
>>>>>>>>>>>>>>>  I think I found how to disable TSO (diff below), but I
>>>>>>>>>>>>>>> can't find where to disable GSO. Disabling just TSO didn't fix 
>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  The loop where Firecracker gets stuck (fn rx_single_frame)
>>>>>>>>>>>>>>> tries to write an entire frame (7318 bytes) and notices that
>>>>>>>>>>>>>>> it doesn't fit into all the descriptors of the guest.
>>>>>>>>>>>>>>>  It seems that if it fails to write the entire frame, it
>>>>>>>>>>>>>>> marks the descriptors as used, but then retries to deliver
>>>>>>>>>>>>>>> the whole frame again. Maybe the OSv buffer isn't big enough
>>>>>>>>>>>>>>> and FC just loops forever?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> virtio-net.cc:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                    | (1 << VIRTIO_NET_F_STATUS)     \
>>>>>>>>>>>>>>>                   | (1 << VIRTIO_NET_F_CSUM)       \
>>>>>>>>>>>>>>>                   | (1 << VIRTIO_NET_F_GUEST_CSUM) \
>>>>>>>>>>>>>>> -                 | (1 << VIRTIO_NET_F_GUEST_TSO4) \
>>>>>>>>>>>>>>> +                 | (0 << VIRTIO_NET_F_GUEST_TSO4) \
>>>>>>>>>>>>>>>                   | (1 << VIRTIO_NET_F_HOST_ECN)   \
>>>>>>>>>>>>>>> -                 | (1 << VIRTIO_NET_F_HOST_TSO4)  \
>>>>>>>>>>>>>>> +                 | (0 << VIRTIO_NET_F_HOST_TSO4)  \
>>>>>>>>>>>>>>>                   | (1 << VIRTIO_NET_F_GUEST_ECN)  \
>>>>>>>>>>>>>>>                   | (1 << VIRTIO_NET_F_GUEST_UFO)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  Thanks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wednesday, September 18, 2019 at 8:23:21 PM UTC-5, Asias
>>>>>>>>>>>>>>> He wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Sep 19, 2019 at 7:06 AM Henrique Fingler <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  First of all, thank you for being active and helping out
>>>>>>>>>>>>>>>>> users!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Here's my setup: I'm building a python3 image, with a
>>>>>>>>>>>>>>>>> script that does
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * response = urllib.request.urlopen("http://<a 1mb file>")*
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  The execution just hangs for a few seconds, then a storm
>>>>>>>>>>>>>>>>> of warnings from Firecracker shows up:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <A lot of the same warning>
>>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841517975
>>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257]
>>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size
>>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841529410
>>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257]
>>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size
>>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841569665
>>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257]
>>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size
>>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841584097
>>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257]
>>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size
>>>>>>>>>>>>>>>>> 2019-09-18T17:50:36.841656060
>>>>>>>>>>>>>>>>> [anonymous-instance:WARN:devices/src/virtio/net.rs:257]
>>>>>>>>>>>>>>>>> Receiving buffer is too small to hold frame of current size
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  This is coming from here:
>>>>>>>>>>>>>>>>> https://github.com/firecracker-microvm/firecracker/blob/master/devices/src/virtio/net.rs
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  If the file is smaller, let's say 256B, it works fine.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Could this be a bug in the virtio implementation of OSv,
>>>>>>>>>>>>>>>>> or is it a Firecracker thing?
>>>>>>>>>>>>>>>>>  I'll start to investigate the issue. I'm asking because
>>>>>>>>>>>>>>>>> you might have seen this problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Try disabling gso/tso in the OSv virtio-net driver.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  Thanks!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> You received this message because you are subscribed to
>>>>>>>>>>>>>>>>> the Google Groups "OSv Development" group.
>>>>>>>>>>>>>>>>> To unsubscribe from this group and stop receiving emails
>>>>>>>>>>>>>>>>> from it, send an email to [email protected].
>>>>>>>>>>>>>>>>> To view this discussion on the web visit
>>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/osv-dev/965f0cad-d074-4b18-b998-ffe5777851a2%40googlegroups.com
>>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/osv-dev/965f0cad-d074-4b18-b998-ffe5777851a2%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Asias
>>>>>>>>>>>>>>>>

