Re: [PATCH net-next v6 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()

2018-07-23 Thread Toshiaki Makita
On 2018/07/24 12:28, Tonghao Zhang wrote:
> On Tue, Jul 24, 2018 at 10:53 AM Toshiaki Makita
>  wrote:
>>
>> On 2018/07/24 2:31, Tonghao Zhang wrote:
>>> On Mon, Jul 23, 2018 at 10:20 PM Toshiaki Makita
>>>  wrote:

 On 18/07/23 (Mon) 21:43, Tonghao Zhang wrote:
> On Mon, Jul 23, 2018 at 5:58 PM Toshiaki Makita
>  wrote:
>>
>> On 2018/07/22 3:04, xiangxia.m@gmail.com wrote:
>>> From: Tonghao Zhang 
>>>
>>> Factor out the generic busy polling logic so it can be
>>> used in the tx path in the next patch. With this patch,
>>> qemu can set the busyloop_timeout differently for the rx queue.
>>>
>>> Signed-off-by: Tonghao Zhang 
>>> ---
>> ...
>>> +static void vhost_net_busy_poll_vq_check(struct vhost_net *net,
>>> +  struct vhost_virtqueue *rvq,
>>> +  struct vhost_virtqueue *tvq,
>>> +  bool rx)
>>> +{
>>> + struct socket *sock = rvq->private_data;
>>> +
>>> + if (rx) {
>>> + if (!vhost_vq_avail_empty(&net->dev, tvq)) {
>>> + vhost_poll_queue(&tvq->poll);
>>> + } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
>>> + vhost_disable_notify(&net->dev, tvq);
>>> + vhost_poll_queue(&tvq->poll);
>>> + }
>>> + } else if ((sock && sk_has_rx_data(sock->sk)) &&
>>> + !vhost_vq_avail_empty(&net->dev, rvq)) {
>>> + vhost_poll_queue(&rvq->poll);
>>
>> Now we wait for vq_avail on rx as well, so I think you cannot skip
>> vhost_enable_notify() on tx. You probably want to do:
> I think vhost_enable_notify is needed.
>
>> } else if (sock && sk_has_rx_data(sock->sk)) {
>>  if (!vhost_vq_avail_empty(&net->dev, rvq)) {
>>  vhost_poll_queue(&rvq->poll);
>>  } else if (unlikely(vhost_enable_notify(&net->dev, rvq))) {
>>  vhost_disable_notify(&net->dev, rvq);
>>  vhost_poll_queue(&rvq->poll);
>>  }
>> }
> As Jason reviewed before, we only want an rx kick when a packet is pending
> at the socket but we're out of available buffers. So should we just enable
> notify, but not poll it?
>
>  } else if ((sock && sk_has_rx_data(sock->sk)) &&
>  !vhost_vq_avail_empty(&net->dev, rvq)) {
>  vhost_poll_queue(&rvq->poll);
>  } else {
>  vhost_enable_notify(&net->dev, rvq);
>  }

 When vhost_enable_notify() returns true, the avail ring became non-empty
 while we were enabling notify. We may delay the rx process if we don't
 check the return value of vhost_enable_notify().
>>> I got it thanks.
>> Also, isn't it better to take care of vhost_net_disable_vq()/vhost_net_enable_vq()
>> on tx?
> I can't see why it is better; if necessary, we can do it.

 The reason is pretty simple... we are busypolling the socket so we don't
 need rx wakeups during it?
>>> OK, but one question: how about rx? Do we use
>>> vhost_net_disable_vq/vhost_net_enable_vq on rx?
>>
>> You mean busypolling the sock tx buf? I'm not sure polling it
>> improves the performance.
> Not the sock tx buff, when we are busypolling in handle_rx, we will
> check the tx vring via vhost_vq_avail_empty.
> So, should we disable the tvq, e.g. vhost_net_disable_vq(net, tvq)?

When you want to stop vq kicks from the guest you should call
vhost_disable_notify() and when you want to stop vq wakeups from the
socket you should call vhost_net_disable_vq().

You are polling vq_avail so you want to stop vq kicks thus
vhost_disable_notify() is needed and it is already called.

-- 
Toshiaki Makita

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v6 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()

2018-07-23 Thread Tonghao Zhang
On Tue, Jul 24, 2018 at 10:53 AM Toshiaki Makita
 wrote:
>
> On 2018/07/24 2:31, Tonghao Zhang wrote:
> > On Mon, Jul 23, 2018 at 10:20 PM Toshiaki Makita
> >  wrote:
> >>
> >> On 18/07/23 (Mon) 21:43, Tonghao Zhang wrote:
> >>> On Mon, Jul 23, 2018 at 5:58 PM Toshiaki Makita
> >>>  wrote:
> 
>  On 2018/07/22 3:04, xiangxia.m@gmail.com wrote:
> > From: Tonghao Zhang 
> >
> > Factor out the generic busy polling logic so it can be
> > used in the tx path in the next patch. With this patch,
> > qemu can set the busyloop_timeout differently for the rx queue.
> >
> > Signed-off-by: Tonghao Zhang 
> > ---
>  ...
> > +static void vhost_net_busy_poll_vq_check(struct vhost_net *net,
> > +  struct vhost_virtqueue *rvq,
> > +  struct vhost_virtqueue *tvq,
> > +  bool rx)
> > +{
> > + struct socket *sock = rvq->private_data;
> > +
> > + if (rx) {
> > + if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> > + vhost_poll_queue(&tvq->poll);
> > + } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> > + vhost_disable_notify(&net->dev, tvq);
> > + vhost_poll_queue(&tvq->poll);
> > + }
> > + } else if ((sock && sk_has_rx_data(sock->sk)) &&
> > + !vhost_vq_avail_empty(&net->dev, rvq)) {
> > + vhost_poll_queue(&rvq->poll);
> 
>  Now we wait for vq_avail on rx as well, so I think you cannot skip
>  vhost_enable_notify() on tx. You probably want to do:
> >>> I think vhost_enable_notify is needed.
> >>>
>  } else if (sock && sk_has_rx_data(sock->sk)) {
>   if (!vhost_vq_avail_empty(&net->dev, rvq)) {
>   vhost_poll_queue(&rvq->poll);
>   } else if (unlikely(vhost_enable_notify(&net->dev, rvq))) {
>   vhost_disable_notify(&net->dev, rvq);
>   vhost_poll_queue(&rvq->poll);
>   }
>  }
> >>> As Jason reviewed before, we only want an rx kick when a packet is pending
> >>> at the socket but we're out of available buffers. So should we just enable
> >>> notify, but not poll it?
> >>>
> >>>  } else if ((sock && sk_has_rx_data(sock->sk)) &&
> >>>  !vhost_vq_avail_empty(&net->dev, rvq)) {
> >>>  vhost_poll_queue(&rvq->poll);
> >>>  } else {
> >>>  vhost_enable_notify(&net->dev, rvq);
> >>>  }
> >>
> >> When vhost_enable_notify() returns true, the avail ring became non-empty
> >> while we were enabling notify. We may delay the rx process if we don't
> >> check the return value of vhost_enable_notify().
> > I got it thanks.
>  Also, isn't it better to take care of vhost_net_disable_vq()/vhost_net_enable_vq()
>  on tx?
> >>> I can't see why it is better; if necessary, we can do it.
> >>
> >> The reason is pretty simple... we are busypolling the socket so we don't
> >> need rx wakeups during it?
> > OK, but one question: how about rx? Do we use
> > vhost_net_disable_vq/vhost_net_enable_vq on rx?
>
> You mean busypolling the sock tx buf? I'm not sure polling it
> improves the performance.
Not the sock tx buff, when we are busypolling in handle_rx, we will
check the tx vring via vhost_vq_avail_empty.
So, should we disable the tvq, e.g. vhost_net_disable_vq(net, tvq)?
> --
> Toshiaki Makita
>

Re: [PATCH net-next v6 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()

2018-07-23 Thread Toshiaki Makita
On 2018/07/24 2:31, Tonghao Zhang wrote:
> On Mon, Jul 23, 2018 at 10:20 PM Toshiaki Makita
>  wrote:
>>
>> On 18/07/23 (Mon) 21:43, Tonghao Zhang wrote:
>>> On Mon, Jul 23, 2018 at 5:58 PM Toshiaki Makita
>>>  wrote:

 On 2018/07/22 3:04, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
>
> Factor out the generic busy polling logic so it can be
> used in the tx path in the next patch. With this patch,
> qemu can set the busyloop_timeout differently for the rx queue.
>
> Signed-off-by: Tonghao Zhang 
> ---
 ...
> +static void vhost_net_busy_poll_vq_check(struct vhost_net *net,
> +  struct vhost_virtqueue *rvq,
> +  struct vhost_virtqueue *tvq,
> +  bool rx)
> +{
> + struct socket *sock = rvq->private_data;
> +
> + if (rx) {
> + if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> + vhost_poll_queue(&tvq->poll);
> + } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> + vhost_disable_notify(&net->dev, tvq);
> + vhost_poll_queue(&tvq->poll);
> + }
> + } else if ((sock && sk_has_rx_data(sock->sk)) &&
> + !vhost_vq_avail_empty(&net->dev, rvq)) {
> + vhost_poll_queue(&rvq->poll);

 Now we wait for vq_avail on rx as well, so I think you cannot skip
 vhost_enable_notify() on tx. You probably want to do:
>>> I think vhost_enable_notify is needed.
>>>
 } else if (sock && sk_has_rx_data(sock->sk)) {
  if (!vhost_vq_avail_empty(&net->dev, rvq)) {
  vhost_poll_queue(&rvq->poll);
  } else if (unlikely(vhost_enable_notify(&net->dev, rvq))) {
  vhost_disable_notify(&net->dev, rvq);
  vhost_poll_queue(&rvq->poll);
  }
 }
>>> As Jason reviewed before, we only want an rx kick when a packet is pending
>>> at the socket but we're out of available buffers. So should we just enable
>>> notify, but not poll it?
>>>
>>>  } else if ((sock && sk_has_rx_data(sock->sk)) &&
>>>  !vhost_vq_avail_empty(&net->dev, rvq)) {
>>>  vhost_poll_queue(&rvq->poll);
>>>  } else {
>>>  vhost_enable_notify(&net->dev, rvq);
>>>  }
>>
>> When vhost_enable_notify() returns true, the avail ring became non-empty
>> while we were enabling notify. We may delay the rx process if we don't
>> check the return value of vhost_enable_notify().
> I got it thanks.
 Also, isn't it better to take care of vhost_net_disable_vq()/vhost_net_enable_vq()
 on tx?
>>> I can't see why it is better; if necessary, we can do it.
>>
>> The reason is pretty simple... we are busypolling the socket so we don't
>> need rx wakeups during it?
> OK, but one question: how about rx? Do we use
> vhost_net_disable_vq/vhost_net_enable_vq on rx?

You mean busypolling the sock tx buf? I'm not sure polling it
improves the performance.

-- 
Toshiaki Makita


Re: [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker

2018-07-23 Thread Wei Wang

On 07/23/2018 10:13 PM, Michael S. Tsirkin wrote:

vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
   #endif
+   err = virtio_balloon_register_shrinker(vb);
+   if (err)
+   goto out_del_vqs;
So we can get scans before device is ready. Leak will fail
then. Why not register later after device is ready?

Probably no.

- it would be better not to set device ready when register_shrinker failed.

That's very rare so I won't be too worried.


Just a little confused with the point here. "very rare" means it could still
happen (even if it's a corner case), and if that happens, we get something
functionally wrong. So it would be a bug if we changed it like that, right?


I still couldn't understand the reason for moving register_shrinker after
device_ready (the original oom notifier was registered before setting
device ready, too)?
(I think the driver won't get shrinker_scan called if the device isn't
ready, for the reasons below.)



- When the device isn't ready, ballooning won't happen, that is,
vb->num_pages will be 0, which results in shrinker_count=0 and shrinker_scan
won't be called.


Best,
Wei


Call for Papers - ICITS'19 - Quito, Ecuador

2018-07-23 Thread Maria
* Proceedings by Springer. Indexed in Scopus, ISI, etc.



ICITS'19 - The 2019 International Conference on Information Technology & Systems

6 - 8 February 2019, Quito, Ecuador

http://www.icits.me/ 



ICITS'19 - The 2019 International Conference on Information Technology & 
Systems, to be held at Quito, Ecuador, 6 - 8 February 2019, is an international 
forum for researchers and practitioners to present and discuss the most recent 
innovations, trends, results, experiences and concerns in the several 
perspectives of Information Technology & Systems.

We are pleased to invite you to submit your papers to ICITS'19. They can be 
written in English, Spanish or Portuguese. All submissions will be reviewed on 
the basis of relevance, originality, importance and clarity.



Topics

Submitted papers should be related with one or more of the main themes proposed 
for the Conference:

A) Information and Knowledge Management (IKM);

B) Organizational Models and Information Systems (OMIS);

C) Software and Systems Modeling (SSM);

D) Software Systems, Architectures, Applications and Tools (SSAAT);

E) Multimedia Systems and Applications (MSA);

F) Computer Networks, Mobility and Pervasive Systems (CNMPS);

G) Intelligent and Decision Support Systems (IDSS);

H) Big Data Analytics and Applications (BDAA);

I) Human-Computer Interaction (HCI);

J) Ethics, Computers and Security (ECS)

K) Health Informatics (HIS);

L) Information Technologies in Education (ITE);

M) Cybersecurity and Cyber-defense.



Submission and Decision

Submitted papers written in English (until 10-page limit) must comply with the 
format of Advances in Intelligent Systems and Computing series (see 
Instructions for Authors at Springer Website 
 or download a DOC example 
), must not have been published before, 
not be under review for any other conference or publication and not include any 
information leading to the authors’ identification. Therefore, the authors’ 
names, affiliations and bibliographic references should not be included in the 
version for evaluation by the Scientific Committee. This information should 
only be included in the camera-ready version, saved in Word or Latex format and 
also in PDF format. These files must be accompanied by the Consent to Publish 
form  filled out, in a ZIP file, and 
uploaded at the conference management system.

Submitted papers written in Spanish or Portuguese (until 15-page limit) must 
comply with the format of RISTI  - Revista Ibérica de 
Sistemas e Tecnologias de Informação (download instructions/template for 
authors in Spanish  or Portuguese 
), must not have been published before, 
not be under review for any other conference or publication and not include any 
information leading to the authors’ identification. Therefore, the authors’ 
names, affiliations and bibliographic references should not be included in the 
version for evaluation by the Scientific Committee. This information should 
only be included in the camera-ready version, saved in Word format. These files
must be uploaded to the conference management system in a ZIP file.

All papers will be subjected to a “double-blind review” by at least two members 
of the Scientific Committee.

Based on Scientific Committee evaluation, a paper can be rejected or accepted 
by the Conference Chairs. In the latter case, it can be accepted as a paper or
a poster.

The authors of papers accepted as posters must build and print a poster to be 
exhibited during the Conference. This poster must follow an A1 or A2 vertical 
format. The Conference can include Work Sessions where these posters are
presented and orally discussed, with a 7-minute limit per poster.

The authors of accepted papers will have 15 minutes to present their work in a 
Conference Work Session; approximately 5 minutes of discussion will follow each 
presentation.



Publication and Indexing

To ensure that an accepted paper is published, at least one of the authors must 
be fully registered by the 9th of October 2018, and the paper must comply with 
the suggested layout and page-limit. Additionally, all recommended changes must 
be addressed by the authors before they submit the camera-ready version.

No more than one paper per registration will be published. An extra fee must be 
paid for publication of additional papers, with a maximum of one additional 
paper per registration. One registration permits only the participation of one 
author in the conference.

Papers written in English and accepted and registered will be published in 
Proceedings by Springer, in a book of the Advances in Intelligent Systems and 

Re: [PATCH net-next v6 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()

2018-07-23 Thread Tonghao Zhang
On Mon, Jul 23, 2018 at 10:20 PM Toshiaki Makita
 wrote:
>
> On 18/07/23 (Mon) 21:43, Tonghao Zhang wrote:
> > On Mon, Jul 23, 2018 at 5:58 PM Toshiaki Makita
> >  wrote:
> >>
> >> On 2018/07/22 3:04, xiangxia.m@gmail.com wrote:
> >>> From: Tonghao Zhang 
> >>>
> >>> Factor out the generic busy polling logic so it can be
> >>> used in the tx path in the next patch. With this patch,
> >>> qemu can set the busyloop_timeout differently for the rx queue.
> >>>
> >>> Signed-off-by: Tonghao Zhang 
> >>> ---
> >> ...
> >>> +static void vhost_net_busy_poll_vq_check(struct vhost_net *net,
> >>> +  struct vhost_virtqueue *rvq,
> >>> +  struct vhost_virtqueue *tvq,
> >>> +  bool rx)
> >>> +{
> >>> + struct socket *sock = rvq->private_data;
> >>> +
> >>> + if (rx) {
> >>> + if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> >>> + vhost_poll_queue(&tvq->poll);
> >>> + } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> >>> + vhost_disable_notify(&net->dev, tvq);
> >>> + vhost_poll_queue(&tvq->poll);
> >>> + }
> >>> + } else if ((sock && sk_has_rx_data(sock->sk)) &&
> >>> + !vhost_vq_avail_empty(&net->dev, rvq)) {
> >>> + vhost_poll_queue(&rvq->poll);
> >>
> >> Now we wait for vq_avail on rx as well, so I think you cannot skip
> >> vhost_enable_notify() on tx. You probably want to do:
> > I think vhost_enable_notify is needed.
> >
> >> } else if (sock && sk_has_rx_data(sock->sk)) {
> >>  if (!vhost_vq_avail_empty(&net->dev, rvq)) {
> >>  vhost_poll_queue(&rvq->poll);
> >>  } else if (unlikely(vhost_enable_notify(&net->dev, rvq))) {
> >>  vhost_disable_notify(&net->dev, rvq);
> >>  vhost_poll_queue(&rvq->poll);
> >>  }
> >> }
> > As Jason reviewed before, we only want an rx kick when a packet is pending
> > at the socket but we're out of available buffers. So should we just enable
> > notify, but not poll it?
> >
> >  } else if ((sock && sk_has_rx_data(sock->sk)) &&
> >  !vhost_vq_avail_empty(&net->dev, rvq)) {
> >  vhost_poll_queue(&rvq->poll);
> >  } else {
> >  vhost_enable_notify(&net->dev, rvq);
> >  }
>
> When vhost_enable_notify() returns true, the avail ring became non-empty
> while we were enabling notify. We may delay the rx process if we don't
> check the return value of vhost_enable_notify().
I got it thanks.
> >> Also, isn't it better to take care of vhost_net_disable_vq()/vhost_net_enable_vq()
> >> on tx?
> > I can't see why it is better; if necessary, we can do it.
>
> The reason is pretty simple... we are busypolling the socket so we don't
> need rx wakeups during it?
OK, but one question: how about rx? Do we use
vhost_net_disable_vq/vhost_net_enable_vq on rx?
> --
> Toshiaki Makita

Re: [PATCH v36 0/5] Virtio-balloon: support free page reporting

2018-07-23 Thread Dr. David Alan Gilbert
* Michael S. Tsirkin (m...@redhat.com) wrote:
> On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:
> > This patch series is separated from the previous "Virtio-balloon
> > Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,  
> > implemented by this series enables the virtio-balloon driver to report
> > hints of guest free pages to the host. It can be used to accelerate live
> > migration of VMs. Here is an introduction of this usage:
> > 
> > Live migration needs to transfer the VM's memory from the source machine
> > to the destination round by round. For the 1st round, all the VM's memory
> > is transferred. From the 2nd round, only the pieces of memory that were
> > written by the guest (after the 1st round) are transferred. One method
> > that is popularly used by the hypervisor to track which part of memory is
> > written is to write-protect all the guest memory.
> > 
> > This feature enables the optimization by skipping the transfer of guest
> > free pages during VM live migration. It is not concerned that the memory
> > pages are used after they are given to the hypervisor as a hint of the
> > free pages, because they will be tracked by the hypervisor and transferred
> > in the subsequent round if they are used and written.
> > 
> > * Tests
> > - Test Environment
> > Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> > Guest: 8G RAM, 4 vCPU
> > Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> > 
> > - Test Results
> > - Idle Guest Live Migration Time (results are averaged over 10 runs):
> > - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
> > (setting page poisoning zero and enabling ksm don't affect the
> >  comparison result)
> > - Guest with Linux Compilation Workload (make bzImage -j4):
> > - Live Migration Time (average)
> >   Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
> > - Linux Compilation Time
> >   Optimization v.s. Legacy = 5min4s v.s. 5min12s
> >   --> no obvious difference
> 
> I'd like to see dgilbert's take on whether this kind of gain
> justifies adding a PV interfaces, and what kind of guest workload
> is appropriate.
> 
> Cc'd.

Well, 44% is great ... although the measurement is a bit weird.

a) A 2 second downtime is very large; 300-500ms is more normal
b) I'm not sure what the 'average' is  - is that just between a bunch of
repeated migrations?
c) What load was running in the guest during the live migration?

An interesting measurement to add would be to do the same test but
with a VM with a lot more RAM but the same load;  you'd hope the gain
would be even better.
It would be interesting, especially because the users who are interested
are people creating VMs allocated with lots of extra memory (for the
worst case) but most of the time migrating when it's fairly idle.

Dave

> 
> 
> > ChangeLog:
> > v35->v36:
> > - remove the mm patch, as Linus has a suggestion to get free page
> >   addresses via allocation, instead of reading from the free page
> >   list.
> > - virtio-balloon:
> > - replace oom notifier with shrinker;
> > - the guest to host communication interface remains the same as
> >   v32.
> > - allocate free page blocks and send to host one by one, and free
> >   them after sending all the pages.
> > 
> > For ChangeLogs from v22 to v35, please reference
> > https://lwn.net/Articles/759413/
> > 
> > For ChangeLogs before v21, please reference
> > https://lwn.net/Articles/743660/
> > 
> > Wei Wang (5):
> >   virtio-balloon: remove BUG() in init_vqs
> >   virtio_balloon: replace oom notifier with shrinker
> >   virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
> >   mm/page_poison: expose page_poisoning_enabled to kernel modules
> >   virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> > 
> >  drivers/virtio/virtio_balloon.c | 456 
> > ++--
> >  include/uapi/linux/virtio_balloon.h |   7 +
> >  mm/page_poison.c|   6 +
> >  3 files changed, 394 insertions(+), 75 deletions(-)
> > 
> > -- 
> > 2.7.4
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


Re: [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker

2018-07-23 Thread Michael S. Tsirkin
On Mon, Jul 23, 2018 at 06:30:46PM +0800, Wei Wang wrote:
> On 07/22/2018 10:48 PM, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 04:33:02PM +0800, Wei Wang wrote:
> > > +static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
> > > +   struct shrink_control *sc)
> > > +{
> > > + unsigned long pages_to_free = balloon_pages_to_shrink,
> > > +   pages_freed = 0;
> > > + struct virtio_balloon *vb = container_of(shrinker,
> > > + struct virtio_balloon, shrinker);
> > > +
> > > + /*
> > > +  * One invocation of leak_balloon can deflate at most
> > > +  * VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
> > > +  * multiple times to deflate pages till reaching
> > > +  * balloon_pages_to_shrink pages.
> > > +  */
> > > + while (vb->num_pages && pages_to_free) {
> > > + pages_to_free = balloon_pages_to_shrink - pages_freed;
> > > + pages_freed += leak_balloon(vb, pages_to_free);
> > > + }
> > > + update_balloon_size(vb);
> > Are you sure that this is never called if count returned 0?
> 
> Yes. Please see do_shrink_slab, it just returns if count is 0.
> 
> > 
> > > +
> > > + return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
> > > +}
> > > +
> > > +static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
> > > +struct shrink_control *sc)
> > > +{
> > > + struct virtio_balloon *vb = container_of(shrinker,
> > > + struct virtio_balloon, shrinker);
> > > +
> > > + /*
> > > +  * We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to handle the
> > > +  * case when shrinker needs to be invoked to relieve memory pressure.
> > > +  */
> > > + if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
> > > + return 0;
> > So why not skip notifier registration when deflate on oom
> > is clear?
> 
> Sounds good, thanks.
> 
> 
> > vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
> >   #endif
> > +   err = virtio_balloon_register_shrinker(vb);
> > +   if (err)
> > +   goto out_del_vqs;
> > So we can get scans before device is ready. Leak will fail
> > then. Why not register later after device is ready?
> 
> Probably no.
> 
> - it would be better not to set device ready when register_shrinker failed.

That's very rare so I won't be too worried.

> - When the device isn't ready, ballooning won't happen, that is,
> vb->num_pages will be 0, which results in shrinker_count=0 and shrinker_scan
> won't be called.
> 
> So I think it would be better to have shrinker registered before
> device_ready.
> 
> Best,
> Wei


Re: [PATCH v36 0/5] Virtio-balloon: support free page reporting

2018-07-23 Thread Michael S. Tsirkin
On Fri, Jul 20, 2018 at 04:33:00PM +0800, Wei Wang wrote:
> This patch series is separated from the previous "Virtio-balloon
> Enhancement" series. The new feature, VIRTIO_BALLOON_F_FREE_PAGE_HINT,  
> implemented by this series enables the virtio-balloon driver to report
> hints of guest free pages to the host. It can be used to accelerate live
> migration of VMs. Here is an introduction of this usage:
> 
> Live migration needs to transfer the VM's memory from the source machine
> to the destination round by round. For the 1st round, all the VM's memory
> is transferred. From the 2nd round, only the pieces of memory that were
> written by the guest (after the 1st round) are transferred. One method
> that is popularly used by the hypervisor to track which part of memory is
> written is to write-protect all the guest memory.
> 
> This feature enables the optimization by skipping the transfer of guest
> free pages during VM live migration. It is not concerned that the memory
> pages are used after they are given to the hypervisor as a hint of the
> free pages, because they will be tracked by the hypervisor and transferred
> in the subsequent round if they are used and written.
> 
> * Tests
> - Test Environment
> Host: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> Guest: 8G RAM, 4 vCPU
> Migration setup: migrate_set_speed 100G, migrate_set_downtime 2 second
> 
> - Test Results
> - Idle Guest Live Migration Time (results are averaged over 10 runs):
> - Optimization v.s. Legacy = 409ms vs 1757ms --> ~77% reduction
>   (setting page poisoning zero and enabling ksm don't affect the
>  comparison result)
> - Guest with Linux Compilation Workload (make bzImage -j4):
> - Live Migration Time (average)
>   Optimization v.s. Legacy = 1407ms v.s. 2528ms --> ~44% reduction
> - Linux Compilation Time
>   Optimization v.s. Legacy = 5min4s v.s. 5min12s
>   --> no obvious difference

I'd like to see dgilbert's take on whether this kind of gain
justifies adding a PV interfaces, and what kind of guest workload
is appropriate.

Cc'd.


> ChangeLog:
> v35->v36:
> - remove the mm patch, as Linus has a suggestion to get free page
>   addresses via allocation, instead of reading from the free page
>   list.
> - virtio-balloon:
> - replace oom notifier with shrinker;
> - the guest to host communication interface remains the same as
>   v32.
>   - allocate free page blocks and send to host one by one, and free
>   them after sending all the pages.
> 
> For ChangeLogs from v22 to v35, please reference
> https://lwn.net/Articles/759413/
> 
> For ChangeLogs before v21, please reference
> https://lwn.net/Articles/743660/
> 
> Wei Wang (5):
>   virtio-balloon: remove BUG() in init_vqs
>   virtio_balloon: replace oom notifier with shrinker
>   virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>   mm/page_poison: expose page_poisoning_enabled to kernel modules
>   virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
> 
>  drivers/virtio/virtio_balloon.c | 456 
> ++--
>  include/uapi/linux/virtio_balloon.h |   7 +
>  mm/page_poison.c|   6 +
>  3 files changed, 394 insertions(+), 75 deletions(-)
> 
> -- 
> 2.7.4


Re: [PATCH net-next v6 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()

2018-07-23 Thread Tonghao Zhang
On Mon, Jul 23, 2018 at 5:58 PM Toshiaki Makita
 wrote:
>
> On 2018/07/22 3:04, xiangxia.m@gmail.com wrote:
> > From: Tonghao Zhang 
> >
> > Factor out the generic busy polling logic so it can be
> > used in the tx path in the next patch. With this patch,
> > qemu can set the busyloop_timeout differently for the rx queue.
> >
> > Signed-off-by: Tonghao Zhang 
> > ---
> ...
> > +static void vhost_net_busy_poll_vq_check(struct vhost_net *net,
> > +  struct vhost_virtqueue *rvq,
> > +  struct vhost_virtqueue *tvq,
> > +  bool rx)
> > +{
> > + struct socket *sock = rvq->private_data;
> > +
> > + if (rx) {
> > + if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> > + vhost_poll_queue(&tvq->poll);
> > + } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> > + vhost_disable_notify(&net->dev, tvq);
> > + vhost_poll_queue(&tvq->poll);
> > + }
> > + } else if ((sock && sk_has_rx_data(sock->sk)) &&
> > + !vhost_vq_avail_empty(&net->dev, rvq)) {
> > + vhost_poll_queue(&rvq->poll);
>
> Now we wait for vq_avail on rx as well, so I think you cannot skip
> vhost_enable_notify() on tx. You probably want to do:
I think vhost_enable_notify is needed.

> } else if (sock && sk_has_rx_data(sock->sk)) {
> if (!vhost_vq_avail_empty(&net->dev, rvq)) {
> vhost_poll_queue(&rvq->poll);
> } else if (unlikely(vhost_enable_notify(&net->dev, rvq))) {
> vhost_disable_notify(&net->dev, rvq);
> vhost_poll_queue(&rvq->poll);
> }
> }
As Jason reviewed before, we only want an rx kick when a packet is pending
at the socket but we're out of available buffers. So should we just enable
notify, but not poll it?

} else if ((sock && sk_has_rx_data(sock->sk)) &&
!vhost_vq_avail_empty(&net->dev, rvq)) {
vhost_poll_queue(&rvq->poll);
} else {
vhost_enable_notify(&net->dev, rvq);
}
> Also, isn't it better to take care of vhost_net_disable_vq()/vhost_net_enable_vq() on tx?
I can't see why it is better; if necessary, we can do it.
> --
> Toshiaki Makita
>


[PATCH 4.4 018/107] x86/paravirt: Make native_save_fl() extern inline

2018-07-23 Thread Greg Kroah-Hartman
4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Nick Desaulniers 

commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream.

native_save_fl() is marked static inline, but by using it as
a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.

paravirt's use of native_save_fl() also requires that no GPRs other than
%rax are clobbered.

Compilers have different heuristics which they use to emit stack guard
code, the emittance of which can break paravirt's callee saved assumption
by clobbering %rcx.

Marking a function definition extern inline means that if this version
cannot be inlined, then the out-of-line version will be preferred. By
having the out-of-line version be implemented in assembly, it cannot be
instrumented with a stack protector, which might violate custom calling
conventions that code like paravirt rely on.

The semantics of extern inline has changed since gnu89. This means that
folks using GCC versions >= 5.1 may see symbol redefinition errors at
link time for subdirs that override KBUILD_CFLAGS (making the C standard
used implicit) regardless of this patch. This has been cleaned up
earlier in the patch set, but is left as a note in the commit message
for future travelers.

Reports:
 https://lkml.org/lkml/2018/5/7/534
 https://github.com/ClangBuiltLinux/linux/issues/16

Discussion:
 https://bugs.llvm.org/show_bug.cgi?id=37512
 https://lkml.org/lkml/2018/5/24/1371

Thanks to the many folks that participated in the discussion.

[Backport for 4.4. 4.4 is missing commit 784d5699eddc "x86: move exports to
actual definitions" which doesn't apply cleanly, and not really worth
backporting IMO. It's simpler to change this patch from upstream:
  + #include <asm-generic/export.h>
rather than
  + #include <asm/export.h>]

Debugged-by: Alistair Strachan 
Debugged-by: Matthias Kaehlcke 
Suggested-by: Arnd Bergmann 
Suggested-by: H. Peter Anvin 
Suggested-by: Tom Stellar 
Reported-by: Sedat Dilek 
Tested-by: Sedat Dilek 
Signed-off-by: Nick Desaulniers 
Acked-by: Juergen Gross 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: a...@redhat.com
Cc: akata...@vmware.com
Cc: a...@linux-foundation.org
Cc: andrea.pa...@amarulasolutions.com
Cc: ard.biesheu...@linaro.org
Cc: aryabi...@virtuozzo.com
Cc: astrac...@google.com
Cc: boris.ostrov...@oracle.com
Cc: brijesh.si...@amd.com
Cc: caoj.f...@cn.fujitsu.com
Cc: ge...@linux-m68k.org
Cc: ghackm...@google.com
Cc: gre...@linuxfoundation.org
Cc: jan.kis...@siemens.com
Cc: jarkko.sakki...@linux.intel.com
Cc: j...@perches.com
Cc: jpoim...@redhat.com
Cc: keesc...@google.com
Cc: kirill.shute...@linux.intel.com
Cc: kstew...@linuxfoundation.org
Cc: linux-...@vger.kernel.org
Cc: linux-kbu...@vger.kernel.org
Cc: manojgu...@google.com
Cc: mawil...@microsoft.com
Cc: michal.l...@markovi.net
Cc: mj...@google.com
Cc: m...@chromium.org
Cc: pombreda...@nexb.com
Cc: rient...@google.com
Cc: rost...@goodmis.org
Cc: thomas.lenda...@amd.com
Cc: tw...@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.dea...@arm.com
Cc: yamada.masah...@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulni...@google.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/x86/include/asm/irqflags.h |2 +-
 arch/x86/kernel/Makefile|1 +
 arch/x86/kernel/irqflags.S  |   26 ++
 3 files changed, 28 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -8,7 +8,7 @@
  * Interrupt control:
  */
 
-static inline unsigned long native_save_fl(void)
+extern inline unsigned long native_save_fl(void)
 {
unsigned long flags;
 
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -41,6 +41,7 @@ obj-y += alternative.o i8253.o pci-nom
 obj-y  += tsc.o tsc_msr.o io_delay.o rtc.o
 obj-y  += pci-iommu_table.o
 obj-y  += resource.o
+obj-y  += irqflags.o
 
 obj-y  += process.o
 obj-y  += fpu/
--- /dev/null
+++ b/arch/x86/kernel/irqflags.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+#include <linux/linkage.h>
+
+/*
+ * unsigned long native_save_fl(void)
+ */
+ENTRY(native_save_fl)
+   pushf
+   pop %_ASM_AX
+   ret
+ENDPROC(native_save_fl)
+EXPORT_SYMBOL(native_save_fl)
+
+/*
+ * void native_restore_fl(unsigned long flags)
+ * %eax/%rdi: flags
+ */
+ENTRY(native_restore_fl)
+   push %_ASM_ARG1
+   popf
+   ret
+ENDPROC(native_restore_fl)
+EXPORT_SYMBOL(native_restore_fl)




Re: [PATCH v36 2/5] virtio_balloon: replace oom notifier with shrinker

2018-07-23 Thread Wei Wang

On 07/22/2018 10:48 PM, Michael S. Tsirkin wrote:

On Fri, Jul 20, 2018 at 04:33:02PM +0800, Wei Wang wrote:
  
+static unsigned long virtio_balloon_shrinker_scan(struct shrinker *shrinker,
+                                                  struct shrink_control *sc)
+{
+   unsigned long pages_to_free = balloon_pages_to_shrink,
+ pages_freed = 0;
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   /*
+* One invocation of leak_balloon can deflate at most
+* VIRTIO_BALLOON_ARRAY_PFNS_MAX balloon pages, so we call it
+* multiple times to deflate pages till reaching
+* balloon_pages_to_shrink pages.
+*/
+   while (vb->num_pages && pages_to_free) {
+   pages_to_free = balloon_pages_to_shrink - pages_freed;
+   pages_freed += leak_balloon(vb, pages_to_free);
+   }
+   update_balloon_size(vb);

Are you sure that this is never called if count returned 0?


Yes. Please see do_shrink_slab; it just returns if count is 0.




+
+   return pages_freed / VIRTIO_BALLOON_PAGES_PER_PAGE;
+}
+
+static unsigned long virtio_balloon_shrinker_count(struct shrinker *shrinker,
+  struct shrink_control *sc)
+{
+   struct virtio_balloon *vb = container_of(shrinker,
+   struct virtio_balloon, shrinker);
+
+   /*
+* We continue to use VIRTIO_BALLOON_F_DEFLATE_ON_OOM to handle the
+* case when shrinker needs to be invoked to relieve memory pressure.
+*/
+   if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
+   return 0;

So why not skip notifier registration when deflate on oom
is clear?


Sounds good, thanks.



vb->vb_dev_info.inode->i_mapping->a_ops = &balloon_aops;
  #endif
+   err = virtio_balloon_register_shrinker(vb);
+   if (err)
+   goto out_del_vqs;
  
So we can get scans before device is ready. Leak will fail then. Why not
register later after device is ready?


Probably not.

- It would be better not to set device ready when register_shrinker fails.
- When the device isn't ready, ballooning won't happen; that is,
vb->num_pages will be 0, which results in shrinker_count = 0, so
shrinker_scan won't be called.

So I think it would be better to have the shrinker registered before
device_ready.


Best,
Wei


Re: [PATCH net-next v6 3/4] net: vhost: factor out busy polling logic to vhost_net_busy_poll()

2018-07-23 Thread Toshiaki Makita
On 2018/07/22 3:04, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> Factor out generic busy polling logic and will be
> used for in tx path in the next patch. And with the patch,
> qemu can set differently the busyloop_timeout for rx queue.
> 
> Signed-off-by: Tonghao Zhang 
> ---
...
> +static void vhost_net_busy_poll_vq_check(struct vhost_net *net,
> +					 struct vhost_virtqueue *rvq,
> +					 struct vhost_virtqueue *tvq,
> +					 bool rx)
> +{
> +	struct socket *sock = rvq->private_data;
> +
> +	if (rx) {
> +		if (!vhost_vq_avail_empty(&net->dev, tvq)) {
> +			vhost_poll_queue(&tvq->poll);
> +		} else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
> +			vhost_disable_notify(&net->dev, tvq);
> +			vhost_poll_queue(&tvq->poll);
> +		}
> +	} else if ((sock && sk_has_rx_data(sock->sk)) &&
> +		   !vhost_vq_avail_empty(&net->dev, rvq)) {
> +		vhost_poll_queue(&rvq->poll);

Now we wait for vq_avail for rx as well, I think you cannot skip
vhost_enable_notify() on tx. Probably you might want to do:

} else if (sock && sk_has_rx_data(sock->sk)) {
	if (!vhost_vq_avail_empty(&net->dev, rvq)) {
		vhost_poll_queue(&rvq->poll);
	} else if (unlikely(vhost_enable_notify(&net->dev, rvq))) {
		vhost_disable_notify(&net->dev, rvq);
		vhost_poll_queue(&rvq->poll);
	}
}

Also it's better to care vhost_net_disable_vq()/vhost_net_enable_vq() on tx?

-- 
Toshiaki Makita



Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-23 Thread Michael S. Tsirkin
On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> >> This patch series is the follow up on the discussions we had before about
> >> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> >> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> >> were suggestions about doing away with two different paths of transactions
> >> with the host/QEMU, first being the direct GPA and the other being the DMA
> >> API based translations.
> >>
> >> First patch attempts to create a direct GPA mapping based DMA operations
> >> structure called 'virtio_direct_dma_ops' with exact same implementation
> >> of the direct GPA path which virtio core currently has but just wrapped in
> >> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
> >> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
> >> existing semantics. The second patch does exactly that inside the function
> >> virtio_finalize_features(). The third patch removes the default direct GPA
> >> path from virtio core forcing it to use DMA API callbacks for all devices.
> >> Now with that change, every device must have a DMA operations structure
> >> associated with it. The fourth patch adds an additional hook which gives
> >> the platform an opportunity to do yet another override if required. This
> >> platform hook can be used on POWER Ultravisor based protected guests to
> >> load up SWIOTLB DMA callbacks to do the required (as discussed previously
> >> in the above mentioned thread how host is allowed to access only parts of
> >> the guest GPA range) bounce buffering into the shared memory for all I/O
> >> scatter gather buffers to be consumed on the host side.
> >>
> >> Please go through these patches and review whether this approach broadly
> >> makes sense. I will appreciate suggestions, inputs, comments regarding
> >> the patches or the approach in general. Thank you.
> > I like how patches 1-3 look. Could you test performance
> > with/without to see whether the extra indirection through
> > use of DMA ops causes a measurable slow-down?
> 
> I ran this simple DD command 10 times where /dev/vda is a virtio block
> device of 10GB size.
> 
> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
> 
> With and without patches bandwidth which has a bit wide range does not
> look that different from each other.
> 
> Without patches
> ===
> 
> -- 1 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
> -- 2 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
> -- 3 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
> -- 4 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
> -- 5 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
> -- 6 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
> -- 7 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
> -- 8 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
> -- 9 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
> -- 10 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s
> 
> 
> With patches
> 
> 
> -- 1 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
> -- 2 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
> -- 3 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
> -- 4 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
> -- 5 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
> -- 6 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
> -- 7 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
> -- 8 -
> 1024+0 records in
> 1024+0 records out
> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
> -- 9 -
> 1024+0 

Patch "x86/paravirt: Make native_save_fl() extern inline" has been added to the 4.4-stable tree

2018-07-23 Thread gregkh


This is a note to let you know that I've just added the patch titled

x86/paravirt: Make native_save_fl() extern inline

to the 4.4-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 x86-paravirt-make-native_save_fl-extern-inline.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


From d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 Mon Sep 17 00:00:00 2001
From: Nick Desaulniers 
Date: Thu, 21 Jun 2018 09:23:24 -0700
Subject: x86/paravirt: Make native_save_fl() extern inline

From: Nick Desaulniers 

commit d0a8d9378d16eb3c69bd8e6d23779fbdbee3a8c7 upstream.

native_save_fl() is marked static inline, but by using it as
a function pointer in arch/x86/kernel/paravirt.c, it MUST be outlined.

paravirt's use of native_save_fl() also requires that no GPRs other than
%rax are clobbered.

Compilers have different heuristics which they use to emit stack guard
code, the emittance of which can break paravirt's callee saved assumption
by clobbering %rcx.

Marking a function definition extern inline means that if this version
cannot be inlined, then the out-of-line version will be preferred. By
having the out-of-line version be implemented in assembly, it cannot be
instrumented with a stack protector, which might violate custom calling
conventions that code like paravirt rely on.

The semantics of extern inline has changed since gnu89. This means that
folks using GCC versions >= 5.1 may see symbol redefinition errors at
link time for subdirs that override KBUILD_CFLAGS (making the C standard
used implicit) regardless of this patch. This has been cleaned up
earlier in the patch set, but is left as a note in the commit message
for future travelers.

Reports:
 https://lkml.org/lkml/2018/5/7/534
 https://github.com/ClangBuiltLinux/linux/issues/16

Discussion:
 https://bugs.llvm.org/show_bug.cgi?id=37512
 https://lkml.org/lkml/2018/5/24/1371

Thanks to the many folks that participated in the discussion.

[Backport for 4.4. 4.4 is missing commit 784d5699eddc "x86: move exports to
actual definitions" which doesn't apply cleanly, and not really worth
backporting IMO. It's simpler to change this patch from upstream:
  + #include <asm-generic/export.h>
rather than
  + #include <asm/export.h>]

Debugged-by: Alistair Strachan 
Debugged-by: Matthias Kaehlcke 
Suggested-by: Arnd Bergmann 
Suggested-by: H. Peter Anvin 
Suggested-by: Tom Stellar 
Reported-by: Sedat Dilek 
Tested-by: Sedat Dilek 
Signed-off-by: Nick Desaulniers 
Acked-by: Juergen Gross 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: a...@redhat.com
Cc: akata...@vmware.com
Cc: a...@linux-foundation.org
Cc: andrea.pa...@amarulasolutions.com
Cc: ard.biesheu...@linaro.org
Cc: aryabi...@virtuozzo.com
Cc: astrac...@google.com
Cc: boris.ostrov...@oracle.com
Cc: brijesh.si...@amd.com
Cc: caoj.f...@cn.fujitsu.com
Cc: ge...@linux-m68k.org
Cc: ghackm...@google.com
Cc: gre...@linuxfoundation.org
Cc: jan.kis...@siemens.com
Cc: jarkko.sakki...@linux.intel.com
Cc: j...@perches.com
Cc: jpoim...@redhat.com
Cc: keesc...@google.com
Cc: kirill.shute...@linux.intel.com
Cc: kstew...@linuxfoundation.org
Cc: linux-...@vger.kernel.org
Cc: linux-kbu...@vger.kernel.org
Cc: manojgu...@google.com
Cc: mawil...@microsoft.com
Cc: michal.l...@markovi.net
Cc: mj...@google.com
Cc: m...@chromium.org
Cc: pombreda...@nexb.com
Cc: rient...@google.com
Cc: rost...@goodmis.org
Cc: thomas.lenda...@amd.com
Cc: tw...@google.com
Cc: virtualization@lists.linux-foundation.org
Cc: will.dea...@arm.com
Cc: yamada.masah...@socionext.com
Link: http://lkml.kernel.org/r/20180621162324.36656-4-ndesaulni...@google.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/x86/include/asm/irqflags.h |2 +-
 arch/x86/kernel/Makefile|1 +
 arch/x86/kernel/irqflags.S  |   26 ++
 3 files changed, 28 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -8,7 +8,7 @@
  * Interrupt control:
  */
 
-static inline unsigned long native_save_fl(void)
+extern inline unsigned long native_save_fl(void)
 {
unsigned long flags;
 
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -41,6 +41,7 @@ obj-y += alternative.o i8253.o pci-nom
 obj-y  += tsc.o tsc_msr.o io_delay.o rtc.o
 obj-y  += pci-iommu_table.o
 obj-y  += resource.o
+obj-y  += irqflags.o
 
 obj-y  += process.o
 obj-y  += fpu/
--- /dev/null
+++ b/arch/x86/kernel/irqflags.S
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <asm/asm.h>
+#include <asm-generic/export.h>
+#include <linux/linkage.h>
+
+/*
+ * unsigned long native_save_fl(void)
+ */
+ENTRY(native_save_fl)
+   pushf
+   pop %_ASM_AX
+   ret
+ENDPROC(native_save_fl)

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-23 Thread Anshuman Khandual
On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>> This patch series is the follow up on the discussions we had before about
>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
>> were suggestions about doing away with two different paths of transactions
>> with the host/QEMU, first being the direct GPA and the other being the DMA
>> API based translations.
>>
>> First patch attempts to create a direct GPA mapping based DMA operations
>> structure called 'virtio_direct_dma_ops' with exact same implementation
>> of the direct GPA path which virtio core currently has but just wrapped in
>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of
>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the
>> existing semantics. The second patch does exactly that inside the function
>> virtio_finalize_features(). The third patch removes the default direct GPA
>> path from virtio core forcing it to use DMA API callbacks for all devices.
>> Now with that change, every device must have a DMA operations structure
>> associated with it. The fourth patch adds an additional hook which gives
>> the platform an opportunity to do yet another override if required. This
>> platform hook can be used on POWER Ultravisor based protected guests to
>> load up SWIOTLB DMA callbacks to do the required (as discussed previously
>> in the above mentioned thread how host is allowed to access only parts of
>> the guest GPA range) bounce buffering into the shared memory for all I/O
>> scatter gather buffers to be consumed on the host side.
>>
>> Please go through these patches and review whether this approach broadly
>> makes sense. I will appreciate suggestions, inputs, comments regarding
>> the patches or the approach in general. Thank you.
> I like how patches 1-3 look. Could you test performance
> with/without to see whether the extra indirection through
> use of DMA ops causes a measurable slow-down?

I ran this simple DD command 10 times where /dev/vda is a virtio block
device of 10GB size.

dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct

With and without patches bandwidth which has a bit wide range does not
look that different from each other.

Without patches
===============

-- 1 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
-- 2 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
-- 3 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
-- 4 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
-- 5 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
-- 6 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
-- 7 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
-- 8 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
-- 9 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
-- 10 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s


With patches
============

-- 1 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
-- 2 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
-- 3 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
-- 4 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
-- 5 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
-- 6 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
-- 7 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
-- 8 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
-- 9 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s
-- 10 -
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s

Does this look okay?
