Re: [PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On 2018年07月12日 13:24, Michael S. Tsirkin wrote:
On Thu, Jul 12, 2018 at 01:21:03PM +0800, Jason Wang wrote:
On 2018年07月12日 11:34, Michael S. Tsirkin wrote:
On Thu, Jul 12, 2018 at 11:26:12AM +0800, Jason Wang wrote:
On 2018年07月11日 19:59, Michael S. Tsirkin wrote:
On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote:
On 2018年07月11日 11:49, Tonghao Zhang wrote:
On Wed, Jul 11, 2018 at 10:56 AM Jason Wang wrote:
On 2018年07月04日 12:31, xiangxia.m@gmail.com wrote:
From: Tonghao Zhang

These patches improve the guest receive and transmit performance. On the handle_tx side, we poll the sock receive queue at the same time; handle_rx does that in the same way.

For more performance reports, see patch 4.

v4 -> v5: fix some issues
v3 -> v4: fix some issues
v2 -> v3: these patches are split from a previous big patch:
http://patchwork.ozlabs.org/patch/934673/

Tonghao Zhang (4):
  vhost: lock the vqs one by one
  net: vhost: replace magic number of lock annotation
  net: vhost: factor out busy polling logic to vhost_net_busy_poll()
  net: vhost: add rx busy polling in tx path

 drivers/vhost/net.c   | 108 --
 drivers/vhost/vhost.c |  24 ---
 2 files changed, 67 insertions(+), 65 deletions(-)

Hi, any progress on the new version?

I plan to send a new series of packed virtqueue support of vhost. If you plan to send it soon, I can wait. Otherwise, I will send my series.

I rebased the code and found there is no improvement anymore; the patches of Makita may solve the problem. Jason, you may send your patches, and I will do some research on busy polling.

I see. Maybe you can try some bi-directional traffic.

Btw, lots of optimizations could be done for busy polling, e.g. integrating with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome to work on or propose new ideas.

Thanks

It seems clear we do need adaptive polling.

Yes.

The difficulty with NAPI polling is it can't access guest memory easily. But maybe get_user_pages on the polled memory + NAPI polling can work.

You mean something like zerocopy? Looks like we can do busy polling without it. I mean something like https://patchwork.kernel.org/patch/8707511/.

Thanks

How does this patch work? vhost_vq_avail_empty can sleep; you are calling it within an RCU read-side critical section.

Ok, I get your meaning. I have patches to access the vring through get_user_pages + vmap(), which should help here (and it increases PPS by about 10%-20%).

Remember you must mark it as dirty on unpin too ...

Ok.

That's not the only problem, btw; another one is that the CPU time spent polling isn't accounted to the VM.

Yes, but it's not the 'issue' of this patch.

Yes it is. Polling within thread context accounts CPU correctly.

And I believe cgroup can help?

Thanks

cgroups are what's broken by polling in irq context.

But I think the NAPI busy polling is still done in process context.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
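The get_user_pages + vmap() approach Jason mentions could look roughly like the sketch below. This is an illustrative, non-compilable kernel-style sketch, not the actual patches; the struct and function names (vring_kmap, vring_kmap_init) are hypothetical. It also shows the point Michael raises: pinned pages must be marked dirty before being unpinned, since the host side writes the ring area.

```c
/*
 * Hypothetical sketch: pin the user pages backing a vring with
 * get_user_pages_fast() and map them contiguously with vmap(), so the
 * busy-polling path can read the ring without copy_from_user().
 */
struct vring_kmap {
	struct page **pages;
	int npages;
	void *addr;		/* vmap()ed kernel view of the ring */
};

static int vring_kmap_init(struct vring_kmap *km, unsigned long uaddr,
			   size_t len)
{
	int n = DIV_ROUND_UP(offset_in_page(uaddr) + len, PAGE_SIZE);
	int got = 0;

	km->pages = kcalloc(n, sizeof(*km->pages), GFP_KERNEL);
	if (!km->pages)
		return -ENOMEM;

	/* pin writable: vhost also writes the used ring */
	got = get_user_pages_fast(uaddr & PAGE_MASK, n, 1, km->pages);
	if (got != n)
		goto err;

	km->npages = n;
	km->addr = vmap(km->pages, n, VM_MAP, PAGE_KERNEL);
	if (!km->addr)
		goto err;
	return 0;
err:
	while (got-- > 0)
		put_page(km->pages[got]);
	kfree(km->pages);
	return -EFAULT;
}

static void vring_kmap_release(struct vring_kmap *km)
{
	int i;

	vunmap(km->addr);
	for (i = 0; i < km->npages; i++) {
		/* "mark it as dirty on unpin", as noted in the thread */
		set_page_dirty_lock(km->pages[i]);
		put_page(km->pages[i]);
	}
	kfree(km->pages);
}
```

The vmap()ed view is what would let a NAPI-context poller peek at the avail ring cheaply, at the cost of keeping the guest pages pinned for the mapping's lifetime.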
Re: [PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On Thu, Jul 12, 2018 at 01:21:03PM +0800, Jason Wang wrote: > > > On 2018年07月12日 11:34, Michael S. Tsirkin wrote: > > On Thu, Jul 12, 2018 at 11:26:12AM +0800, Jason Wang wrote: > > > > > > On 2018年07月11日 19:59, Michael S. Tsirkin wrote: > > > > On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote: > > > > > On 2018年07月11日 11:49, Tonghao Zhang wrote: > > > > > > On Wed, Jul 11, 2018 at 10:56 AM Jason Wang > > > > > > wrote: > > > > > > > On 2018年07月04日 12:31, xiangxia.m@gmail.com wrote: > > > > > > > > From: Tonghao Zhang > > > > > > > > > > > > > > > > This patches improve the guest receive and transmit performance. > > > > > > > > On the handle_tx side, we poll the sock receive queue at the > > > > > > > > same time. > > > > > > > > handle_rx do that in the same way. > > > > > > > > > > > > > > > > For more performance report, see patch 4. > > > > > > > > > > > > > > > > v4 -> v5: > > > > > > > > fix some issues > > > > > > > > > > > > > > > > v3 -> v4: > > > > > > > > fix some issues > > > > > > > > > > > > > > > > v2 -> v3: > > > > > > > > This patches are splited from previous big patch: > > > > > > > > http://patchwork.ozlabs.org/patch/934673/ > > > > > > > > > > > > > > > > Tonghao Zhang (4): > > > > > > > > vhost: lock the vqs one by one > > > > > > > > net: vhost: replace magic number of lock annotation > > > > > > > > net: vhost: factor out busy polling logic to > > > > > > > > vhost_net_busy_poll() > > > > > > > > net: vhost: add rx busy polling in tx path > > > > > > > > > > > > > > > > drivers/vhost/net.c | 108 > > > > > > > > -- > > > > > > > > drivers/vhost/vhost.c | 24 --- > > > > > > > > 2 files changed, 67 insertions(+), 65 deletions(-) > > > > > > > > > > > > > > > Hi, any progress on the new version? > > > > > > > > > > > > > > I plan to send a new series of packed virtqueue support of vhost. > > > > > > > If you > > > > > > > plan to send it soon, I can wait. Otherwise, I will send my > > > > > > > series. > > > > > > I rebase the codes. 
and find there is no improvement anymore, the
> > > > > > patches of Makita may solve the problem. Jason, you may send your
> > > > > > patches, and I will do some research on busypoll.
> > > > > I see. Maybe you can try some bi-directional traffic.
> > > > >
> > > > > Btw, lots of optimizations could be done for busy polling. E.g. integrating
> > > > > with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome
> > > > > to work on or propose new ideas.
> > > > >
> > > > > Thanks
> > > > It seems clear we do need adaptive polling.
> > > Yes.
> > >
> > > > The difficulty with NAPI
> > > > polling is it can't access guest memory easily. But maybe
> > > > get_user_pages on the polled memory+NAPI polling can work.
> > > You mean something like zerocopy? Looks like we can do busy polling without
> > > it. I mean something like https://patchwork.kernel.org/patch/8707511/.
> > >
> > > Thanks
> > How does this patch work? vhost_vq_avail_empty can sleep,
> > you are calling it within an rcu read side critical section.
> Ok, I get your meaning. I have patches to access vring through
> get_user_pages + vmap() which should help here. (And it increases PPS by about
> 10%-20%).

Remember you must mark it as dirty on unpin too ...

> > That's not the only problem btw, another one is that the
> > CPU time spent polling isn't accounted with the VM.
> Yes, but it's not the 'issue' of this patch.

Yes it is. Polling within thread context accounts CPU correctly.

> And I believe cgroup can help?
>
> Thanks

cgroups are what's broken by polling in irq context.

> Thanks
Re: [PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On 2018年07月12日 11:34, Michael S. Tsirkin wrote:
On Thu, Jul 12, 2018 at 11:26:12AM +0800, Jason Wang wrote:
On 2018年07月11日 19:59, Michael S. Tsirkin wrote:
On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote:
On 2018年07月11日 11:49, Tonghao Zhang wrote:
On Wed, Jul 11, 2018 at 10:56 AM Jason Wang wrote:
On 2018年07月04日 12:31, xiangxia.m@gmail.com wrote:
From: Tonghao Zhang

These patches improve the guest receive and transmit performance. On the handle_tx side, we poll the sock receive queue at the same time; handle_rx does that in the same way.

For more performance reports, see patch 4.

v4 -> v5: fix some issues
v3 -> v4: fix some issues
v2 -> v3: these patches are split from a previous big patch:
http://patchwork.ozlabs.org/patch/934673/

Tonghao Zhang (4):
  vhost: lock the vqs one by one
  net: vhost: replace magic number of lock annotation
  net: vhost: factor out busy polling logic to vhost_net_busy_poll()
  net: vhost: add rx busy polling in tx path

 drivers/vhost/net.c   | 108 --
 drivers/vhost/vhost.c |  24 ---
 2 files changed, 67 insertions(+), 65 deletions(-)

Hi, any progress on the new version?

I plan to send a new series of packed virtqueue support of vhost. If you plan to send it soon, I can wait. Otherwise, I will send my series.

I rebased the code and found there is no improvement anymore; the patches of Makita may solve the problem. Jason, you may send your patches, and I will do some research on busy polling.

I see. Maybe you can try some bi-directional traffic.

Btw, lots of optimizations could be done for busy polling, e.g. integrating with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome to work on or propose new ideas.

Thanks

It seems clear we do need adaptive polling.

Yes.

The difficulty with NAPI polling is it can't access guest memory easily. But maybe get_user_pages on the polled memory + NAPI polling can work.

You mean something like zerocopy? Looks like we can do busy polling without it. I mean something like https://patchwork.kernel.org/patch/8707511/.

Thanks

How does this patch work? vhost_vq_avail_empty can sleep; you are calling it within an RCU read-side critical section.

Ok, I get your meaning. I have patches to access the vring through get_user_pages + vmap(), which should help here (and it increases PPS by about 10%-20%).

That's not the only problem, btw; another one is that the CPU time spent polling isn't accounted to the VM.

Yes, but it's not the 'issue' of this patch. And I believe cgroup can help?

Thanks
Re: [PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On Thu, Jul 12, 2018 at 11:26:12AM +0800, Jason Wang wrote: > > > On 2018年07月11日 19:59, Michael S. Tsirkin wrote: > > On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote: > > > > > > On 2018年07月11日 11:49, Tonghao Zhang wrote: > > > > On Wed, Jul 11, 2018 at 10:56 AM Jason Wang wrote: > > > > > > > > > > On 2018年07月04日 12:31, xiangxia.m@gmail.com wrote: > > > > > > From: Tonghao Zhang > > > > > > > > > > > > This patches improve the guest receive and transmit performance. > > > > > > On the handle_tx side, we poll the sock receive queue at the same > > > > > > time. > > > > > > handle_rx do that in the same way. > > > > > > > > > > > > For more performance report, see patch 4. > > > > > > > > > > > > v4 -> v5: > > > > > > fix some issues > > > > > > > > > > > > v3 -> v4: > > > > > > fix some issues > > > > > > > > > > > > v2 -> v3: > > > > > > This patches are splited from previous big patch: > > > > > > http://patchwork.ozlabs.org/patch/934673/ > > > > > > > > > > > > Tonghao Zhang (4): > > > > > > vhost: lock the vqs one by one > > > > > > net: vhost: replace magic number of lock annotation > > > > > > net: vhost: factor out busy polling logic to > > > > > > vhost_net_busy_poll() > > > > > > net: vhost: add rx busy polling in tx path > > > > > > > > > > > > drivers/vhost/net.c | 108 > > > > > > -- > > > > > > drivers/vhost/vhost.c | 24 --- > > > > > > 2 files changed, 67 insertions(+), 65 deletions(-) > > > > > > > > > > > Hi, any progress on the new version? > > > > > > > > > > I plan to send a new series of packed virtqueue support of vhost. If > > > > > you > > > > > plan to send it soon, I can wait. Otherwise, I will send my series. > > > > I rebase the codes. and find there is no improvement anymore, the > > > > patches of makita may solve the problem. jason you may send your > > > > patches, and I will do some research on busypoll. > > > I see. Maybe you can try some bi-directional traffic. 
> > >
> > > Btw, lots of optimizations could be done for busy polling. E.g. integrating
> > > with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome
> > > to work on or propose new ideas.
> > >
> > > Thanks
> > It seems clear we do need adaptive polling.
> Yes.
>
> > The difficulty with NAPI
> > polling is it can't access guest memory easily. But maybe
> > get_user_pages on the polled memory+NAPI polling can work.
> You mean something like zerocopy? Looks like we can do busy polling without
> it. I mean something like https://patchwork.kernel.org/patch/8707511/.
>
> Thanks

How does this patch work? vhost_vq_avail_empty can sleep; you are calling it within an RCU read-side critical section.

That's not the only problem, btw; another one is that the CPU time spent polling isn't accounted to the VM.

> Thanks
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On 07/12/2018 10:30 AM, Linus Torvalds wrote:
On Wed, Jul 11, 2018 at 7:17 PM Wei Wang wrote:

Would it be better to remove __GFP_THISNODE? We actually want to get all the guest free pages (from all the nodes).

Maybe. Or maybe it would be better to have the memory balloon logic be per-node? Maybe you don't want to remove too much memory from one node? I think it's one of those "play with it" things.

I don't think that's the big issue, actually. I think the real issue is how to react quickly and gracefully to "oops, I'm trying to give memory away, but now the guest wants it back" while you're in the middle of trying to create that 2TB list of pages.

OK. virtio-balloon has already registered an OOM notifier (virtballoon_oom_notify). I plan to add some control there. If OOM happens:
- stop the page allocation;
- immediately give back the allocated pages to mm.

Best,
Wei
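The control Wei describes could be sketched roughly as below. This is a hypothetical, non-compilable kernel-style sketch (the names balloon_oom_notify, reported_blocks, report_in_progress are illustrative, not the actual virtio-balloon code): an OOM notifier callback that stops the free-page harvesting run and returns every already-allocated block to mm.

```c
/*
 * Sketch of an OOM notifier for free-page hinting: on OOM,
 * (1) stop further page allocation, (2) give back every block
 * already taken off the free list.
 */
static bool report_in_progress;
static LIST_HEAD(reported_blocks);	/* blocks grabbed via alloc_pages() */
static DEFINE_SPINLOCK(report_lock);

static int balloon_oom_notify(struct notifier_block *nb,
			      unsigned long dummy, void *parm)
{
	unsigned long *freed = parm;	/* pages we managed to release */
	struct page *page, *next;

	spin_lock(&report_lock);
	report_in_progress = false;	/* 1) stop the page allocation */

	/* 2) immediately give back the allocated pages to mm */
	list_for_each_entry_safe(page, next, &reported_blocks, lru) {
		list_del(&page->lru);
		__free_pages(page, MAX_ORDER - 1);
		*freed += 1UL << (MAX_ORDER - 1);
	}
	spin_unlock(&report_lock);

	return NOTIFY_OK;
}

static struct notifier_block balloon_oom_nb = {
	.notifier_call = balloon_oom_notify,
};
/* registered once at probe time: register_oom_notifier(&balloon_oom_nb); */
```

The allocation loop would check report_in_progress between blocks, so an OOM event aborts the run after at most one in-flight allocation.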
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On Wed, Jul 11, 2018 at 7:17 PM Wei Wang wrote:
> Would it be better to remove __GFP_THISNODE? We actually want to get all
> the guest free pages (from all the nodes).

Maybe. Or maybe it would be better to have the memory balloon logic be per-node? Maybe you don't want to remove too much memory from one node? I think it's one of those "play with it" things.

I don't think that's the big issue, actually. I think the real issue is how to react quickly and gracefully to "oops, I'm trying to give memory away, but now the guest wants it back" while you're in the middle of trying to create that 2TB list of pages.

IOW, I think the real work is in whatever tuning is needed for the right behavior. But I'm just guessing.

Linus
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On 07/12/2018 12:23 AM, Linus Torvalds wrote:
On Wed, Jul 11, 2018 at 2:21 AM Michal Hocko wrote:

We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER - 1). So why do we need any array based interface?

That was actually my original argument in the original thread - that the only new interface people might want is one that just tells how many of those MAX_ORDER-1 pages there are. See the thread in v33 with the subject "[PATCH v33 1/4] mm: add a function to get free page blocks" and look for me suggesting just using

  #define GFP_MINFLAGS (__GFP_NORETRY | __GFP_NOWARN | __GFP_THISNODE | __GFP_NOMEMALLOC)

Would it be better to remove __GFP_THISNODE? We actually want to get all the guest free pages (from all the nodes).

Best,
Wei
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On Wed, Jul 11, 2018 at 01:09:49PM +0200, Michal Hocko wrote: > But let me note that I am not really convinced how this (or previous) > approach will really work in most workloads. We tend to cache heavily so > there is rarely any memory free. It might be that it's worth flushing the cache when VM is migrating. Or maybe we should implement virtio-tmem or add transcendent memory support to the balloon. -- MST
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On Wed, Jul 11, 2018 at 2:21 AM Michal Hocko wrote:
> We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER -1).
> So why do we need any array based interface?

That was actually my original argument in the original thread - that the only new interface people might want is one that just tells how many of those MAX_ORDER-1 pages there are. See the thread in v33 with the subject "[PATCH v33 1/4] mm: add a function to get free page blocks" and look for me suggesting just using

  #define GFP_MINFLAGS (__GFP_NORETRY | __GFP_NOWARN | __GFP_THISNODE | __GFP_NOMEMALLOC)

  struct page *page = alloc_pages(GFP_MINFLAGS, MAX_ORDER-1);

for this all. But I could also see an argument for "allocate N pages of size MAX_ORDER-1", with some small N, simply because I can see the advantage of not taking and releasing the locking and looking up the zone individually N times. If you want to get gigabytes of memory (or terabytes), doing it in bigger chunks than one single maximum-sized page sounds fairly reasonable.

I just don't think that "thousands of pages" is reasonable. But "tens of max-sized pages" sounds fair enough to me, and it would certainly not be a pain for the VM. So I'm open to new interfaces. I just want those new interfaces to make sense, and be low latency and simple for the VM to do. I'm objecting to the incredibly baroque and heavy-weight one that can return near-infinite amounts of memory.

The real advantage of just the existing "alloc_pages()" model is that I think the ballooning people can use that to *test* things out. If it turns out that taking and releasing the VM locks is a big cost, we can see if a batch interface that allows you to get tens of pages at the same time is worth it.

So yes, I'd suggest starting with just the existing alloc_pages. Maybe it's not enough, but it should be good enough for testing.
Linus
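The "tens of max-sized pages" batch interface Linus sketches could look roughly as follows. This is a hypothetical, non-compilable kernel-style sketch; get_free_blocks/put_free_blocks are illustrative names, not an existing kernel API. GFP_MINFLAGS is the mask quoted in the thread.

```c
/*
 * Sketch: harvest up to n MAX_ORDER-1 blocks in one call, stopping at
 * the first allocation failure rather than applying any pressure.
 */
#define GFP_MINFLAGS (__GFP_NORETRY | __GFP_NOWARN | \
		      __GFP_THISNODE | __GFP_NOMEMALLOC)

static unsigned int get_free_blocks(struct page **blocks, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		blocks[i] = alloc_pages(GFP_MINFLAGS, MAX_ORDER - 1);
		if (!blocks[i])
			break;	/* no suitable free block left: stop */
	}
	return i;	/* number of blocks actually harvested */
}

/* the inverse, once the hint has been sent to the host: */
static void put_free_blocks(struct page **blocks, unsigned int n)
{
	while (n--)
		__free_pages(blocks[n], MAX_ORDER - 1);
}
```

As written this still takes the zone lock once per block; the batching argument in the thread is precisely about whether an interface that grabs several blocks under one lock acquisition is worth adding, which this loop lets the balloon code measure first.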
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On Wed 11-07-18 13:55:15, Wang, Wei W wrote: > On Wednesday, July 11, 2018 7:10 PM, Michal Hocko wrote: > > On Wed 11-07-18 18:52:45, Wei Wang wrote: > > > On 07/11/2018 05:21 PM, Michal Hocko wrote: > > > > On Tue 10-07-18 18:44:34, Linus Torvalds wrote: > > > > [...] > > > > > That was what I tried to encourage with actually removing the > > > > > pages form the page list. That would be an _incremental_ > > > > > interface. You can remove MAX_ORDER-1 pages one by one (or a > > > > > hundred at a time), and mark them free for ballooning that way. > > > > > And if you still feel you have tons of free memory, just continue > > removing more pages from the free list. > > > > We already have an interface for that. alloc_pages(GFP_NOWAIT, > > MAX_ORDER -1). > > > > So why do we need any array based interface? > > > > > > Yes, I'm trying to get free pages directly via alloc_pages, so there > > > will be no new mm APIs. > > > > OK. The above was just a rough example. In fact you would need a more > > complex gfp mask. I assume you only want to balloon only memory directly > > usable by the kernel so it will be > > (GFP_KERNEL | __GFP_NOWARN) & ~__GFP_RECLAIM > > Sounds good to me, thanks. > > > > > > I plan to let free page allocation stop when the remaining system free > > > memory becomes close to min_free_kbytes (prevent swapping). > > > > ~__GFP_RECLAIM will make sure you are allocate as long as there is any > > memory without reclaim. It will not even poke the kswapd to do the > > background work. So I do not think you would need much more than that. > > "close to min_free_kbytes" - I meant when doing the allocations, we > intentionally reserve some small amount of memory, e.g. 2 free page > blocks of "MAX_ORDER - 1". So when other applications happen to do > some allocation, they may easily get some from the reserved memory > left on the free list. 
> Without that reserved memory, other allocation
> may cause the system free memory below the WMARK[MIN], and kswapd
> would start to do swapping. This is actually just a small optimization
> to reduce the probability of causing swapping (nice to have, but not
> mandatory because we will allocate free page blocks one by one).

I really have a hard time following you here. Nothing outside of the core MM proper should play with watermarks.

> > But let me note that I am not really convinced how this (or previous)
> > approach will really work in most workloads. We tend to cache heavily so
> > there is rarely any memory free.
>
> With less free memory, the improvement becomes less, but should be
> nicer than no optimization. For example, the Linux build workload
> would cause 4~5 GB (out of 8GB) memory to be used as page cache at the
> final stage, there is still ~44% live migration time reduction.

But most systems will stay somewhere around the high watermark if there is any page cache activity. Especially after a longer uptime.

--
Michal Hocko
SUSE Labs
RE: [PATCH v35 1/5] mm: support to get hints of free page blocks
On Wednesday, July 11, 2018 7:10 PM, Michal Hocko wrote: > On Wed 11-07-18 18:52:45, Wei Wang wrote: > > On 07/11/2018 05:21 PM, Michal Hocko wrote: > > > On Tue 10-07-18 18:44:34, Linus Torvalds wrote: > > > [...] > > > > That was what I tried to encourage with actually removing the > > > > pages form the page list. That would be an _incremental_ > > > > interface. You can remove MAX_ORDER-1 pages one by one (or a > > > > hundred at a time), and mark them free for ballooning that way. > > > > And if you still feel you have tons of free memory, just continue > removing more pages from the free list. > > > We already have an interface for that. alloc_pages(GFP_NOWAIT, > MAX_ORDER -1). > > > So why do we need any array based interface? > > > > Yes, I'm trying to get free pages directly via alloc_pages, so there > > will be no new mm APIs. > > OK. The above was just a rough example. In fact you would need a more > complex gfp mask. I assume you only want to balloon only memory directly > usable by the kernel so it will be > (GFP_KERNEL | __GFP_NOWARN) & ~__GFP_RECLAIM Sounds good to me, thanks. > > > I plan to let free page allocation stop when the remaining system free > > memory becomes close to min_free_kbytes (prevent swapping). > > ~__GFP_RECLAIM will make sure you are allocate as long as there is any > memory without reclaim. It will not even poke the kswapd to do the > background work. So I do not think you would need much more than that. "close to min_free_kbytes" - I meant when doing the allocations, we intentionally reserve some small amount of memory, e.g. 2 free page blocks of "MAX_ORDER - 1". So when other applications happen to do some allocation, they may easily get some from the reserved memory left on the free list. Without that reserved memory, other allocation may cause the system free memory below the WMARK[MIN], and kswapd would start to do swapping. 
This is actually just a small optimization to reduce the probability of causing swapping (nice to have, but not mandatory because we will allocate free page blocks one by one).

> But let me note that I am not really convinced how this (or previous)
> approach will really work in most workloads. We tend to cache heavily so
> there is rarely any memory free.

With less free memory, the improvement becomes less, but should be nicer than no optimization. For example, the Linux build workload would cause 4~5 GB (out of 8 GB) of memory to be used as page cache at the final stage; there is still ~44% live migration time reduction. Since we have many cloud customers interested in this feature, I think we can let them test the usefulness.

Best,
Wei
Re: [PATCH net-next v5 0/4] net: vhost: improve performance when enable busyloop
On Wed, Jul 11, 2018 at 01:12:59PM +0800, Jason Wang wrote: > > > On 2018年07月11日 11:49, Tonghao Zhang wrote: > > On Wed, Jul 11, 2018 at 10:56 AM Jason Wang wrote: > > > > > > > > > On 2018年07月04日 12:31, xiangxia.m@gmail.com wrote: > > > > From: Tonghao Zhang > > > > > > > > This patches improve the guest receive and transmit performance. > > > > On the handle_tx side, we poll the sock receive queue at the same time. > > > > handle_rx do that in the same way. > > > > > > > > For more performance report, see patch 4. > > > > > > > > v4 -> v5: > > > > fix some issues > > > > > > > > v3 -> v4: > > > > fix some issues > > > > > > > > v2 -> v3: > > > > This patches are splited from previous big patch: > > > > http://patchwork.ozlabs.org/patch/934673/ > > > > > > > > Tonghao Zhang (4): > > > > vhost: lock the vqs one by one > > > > net: vhost: replace magic number of lock annotation > > > > net: vhost: factor out busy polling logic to vhost_net_busy_poll() > > > > net: vhost: add rx busy polling in tx path > > > > > > > >drivers/vhost/net.c | 108 > > > > -- > > > >drivers/vhost/vhost.c | 24 --- > > > >2 files changed, 67 insertions(+), 65 deletions(-) > > > > > > > Hi, any progress on the new version? > > > > > > I plan to send a new series of packed virtqueue support of vhost. If you > > > plan to send it soon, I can wait. Otherwise, I will send my series. > > I rebase the codes. and find there is no improvement anymore, the > > patches of makita may solve the problem. jason you may send your > > patches, and I will do some research on busypoll. > > I see. Maybe you can try some bi-directional traffic. > > Btw, lots of optimizations could be done for busy polling. E.g integrating > with host NAPI busy polling or a 100% busy polling vhost_net. You're welcome > to work or propose new ideas. > > Thanks It seems clear we do need adaptive polling. The difficulty with NAPI polling is it can't access guest memory easily. 
But maybe get_user_pages on the polled memory+NAPI polling can work. > > > > > Thanks
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On Wed 11-07-18 18:52:45, Wei Wang wrote:
> On 07/11/2018 05:21 PM, Michal Hocko wrote:
> > On Tue 10-07-18 18:44:34, Linus Torvalds wrote:
> > [...]
> > > That was what I tried to encourage with actually removing the pages
> > > from the page list. That would be an _incremental_ interface. You can
> > > remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark
> > > them free for ballooning that way. And if you still feel you have tons
> > > of free memory, just continue removing more pages from the free list.
> > We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER
> > -1).
> > So why do we need any array based interface?
>
> Yes, I'm trying to get free pages directly via alloc_pages, so there will be
> no new mm APIs.

OK. The above was just a rough example. In fact you would need a more complex gfp mask. I assume you only want to balloon only memory directly usable by the kernel so it will be

  (GFP_KERNEL | __GFP_NOWARN) & ~__GFP_RECLAIM

> I plan to let free page allocation stop when the remaining system free
> memory becomes close to min_free_kbytes (prevent swapping).

~__GFP_RECLAIM will make sure you are allocate as long as there is any memory without reclaim. It will not even poke the kswapd to do the background work. So I do not think you would need much more than that.

But let me note that I am not really convinced how this (or previous) approach will really work in most workloads. We tend to cache heavily so there is rarely any memory free.

--
Michal Hocko
SUSE Labs
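Michal's mask, put in context, could be sketched as below. This is an illustrative, non-compilable kernel-style sketch (hint_all_free_blocks is a hypothetical name, not balloon driver code): with reclaim masked out, the allocation loop drains only memory that is actually free right now, stops at the first failure, and never wakes kswapd.

```c
/*
 * Sketch: harvest free MAX_ORDER-1 blocks without triggering any
 * reclaim.  An allocation failure with this mask means "nothing is
 * free right now", not "go and make something free".
 */
#define BALLOON_HINT_GFP ((GFP_KERNEL | __GFP_NOWARN) & ~__GFP_RECLAIM)

static void hint_all_free_blocks(struct list_head *harvested)
{
	struct page *page;

	for (;;) {
		page = alloc_pages(BALLOON_HINT_GFP, MAX_ORDER - 1);
		if (!page)
			break;	/* would need reclaim to go further: stop */
		list_add(&page->lru, harvested);
	}
	/*
	 * The pfn/order of each harvested block would then be reported
	 * to the host, and the blocks returned with __free_pages(),
	 * e.g. when migration finishes or on OOM.
	 */
}
```

This makes the explicit min_free_kbytes check Wei proposes largely unnecessary, which is Michal's point: the gfp mask already encodes "stop before applying memory pressure".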
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On 07/11/2018 05:21 PM, Michal Hocko wrote:
On Tue 10-07-18 18:44:34, Linus Torvalds wrote:
[...]

That was what I tried to encourage with actually removing the pages from the page list. That would be an _incremental_ interface. You can remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark them free for ballooning that way. And if you still feel you have tons of free memory, just continue removing more pages from the free list.

We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER - 1). So why do we need any array based interface?

Yes, I'm trying to get free pages directly via alloc_pages, so there will be no new mm APIs.

I plan to let free page allocation stop when the remaining system free memory becomes close to min_free_kbytes (to prevent swapping).

Best,
Wei
Re: [PATCH v35 1/5] mm: support to get hints of free page blocks
On Tue 10-07-18 18:44:34, Linus Torvalds wrote:
[...]
> That was what I tried to encourage with actually removing the pages
> from the page list. That would be an _incremental_ interface. You can
> remove MAX_ORDER-1 pages one by one (or a hundred at a time), and mark
> them free for ballooning that way. And if you still feel you have tons
> of free memory, just continue removing more pages from the free list.

We already have an interface for that. alloc_pages(GFP_NOWAIT, MAX_ORDER - 1). So why do we need any array based interface?

--
Michal Hocko
SUSE Labs