Re: Fw: Benchmarking for vhost polling patch

2015-01-17 Thread Razya Ladelsky
  
  Our suggestion would be to use the maximum (a large enough) value,
  so that vhost is polling 100% of the time.
 
  The polling optimization mainly addresses users who want to maximize
  their performance, even at the expense of wasting cpu cycles. The
  maximum value will produce the biggest impact on performance.
 
 *Everyone* is interested in getting maximum performance from
 their systems.
 

Maybe so, but not everyone is willing to pay the price.
That is also the reason why this optimization should not be enabled by 
default. 

  However, using the maximum default value will be valuable even for
  users who care more about the normalized throughput/cpu criteria.
  Such users, interested in a finer tuning of the polling timeout,
  need to look for an optimal timeout value for their system. The
  maximum value serves as the upper limit of the range that needs to
  be searched for such an optimal timeout value.
 
 Number of users who are going to do this kind of tuning
 can be counted on one hand.
 

If the optimization is not enabled by default, the default value is almost
irrelevant, because when users turn on the feature they should understand
that there's an associated cost and they have to tune their system if they
want to get the maximum benefit (depending on how they define their maximum
benefit).
The maximum value is a good starting point that will work in most cases
and can be used to start the tuning.

  
   There are some cases where networking stack already
   exposes low-level hardware detail to userspace, e.g.
   tcp polling configuration. If we can't come up with
   a way to abstract hardware, maybe we can at least tie
   it to these existing controls rather than introducing
   new ones?
   
  
  We've spent time thinking about the possible interfaces that
  could be appropriate for such an optimization (including tcp polling).
  We think that using an ioctl as the interface to configure the virtual
  device/vhost, in the same manner that e.g. SET_NET_BACKEND is configured,
  makes a lot of sense and is consistent with the existing mechanism.
  
  Thanks,
  Razya
 
 guest is giving up its share of CPU for the benefit of vhost, right?
 So maybe exposing this to guest is appropriate, and then
 add e.g. an ethtool interface for guest admin to set this.
 

The decision making of whether to turn polling on (and with what rate)
should be made by the system administrator, who has a broad view of the 
system and workload, and not by the guest administrator.
Polling should be a tunable parameter from the host side, the guest should 
not be aware of it.
The guest is not necessarily giving up its time. It may be that there's 
just an extra dedicated core or free cpu cycles on a different cpu.
We provide a mechanism and an interface that can be tuned by some other 
program to implement its policy.
This patch is all about the mechanism and not the policy of how to use it.

Thank you,
Razya 



Re: Fw: Benchmarking for vhost polling patch

2015-01-14 Thread Razya Ladelsky
Michael S. Tsirkin m...@redhat.com wrote on 12/01/2015 12:36:13 PM:

 From: Michael S. Tsirkin m...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL
 Cc: Alex Glikson/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, 
 Yossi Kuperman1/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, 
 abel.gor...@gmail.com, kvm@vger.kernel.org, Eyal 
Moscovici/Haifa/IBM@IBMIL
 Date: 12/01/2015 12:36 PM
 Subject: Re: Fw: Benchmarking for vhost polling patch
 
 On Sun, Jan 11, 2015 at 02:44:17PM +0200, Razya Ladelsky wrote:
   Hi Razya,
   Thanks for the update.
   So that's reasonable I think, and I think it makes sense
   to keep working on this in isolation - it's more
   manageable at this size.
   
   The big questions in my mind:
   - What happens if system is lightly loaded?
 E.g. a ping/pong benchmark. How much extra CPU are
 we wasting?
   - We see the best performance on your system is with 10usec worth of polling.
 It's OK to be able to tune it for best performance, but
 most people don't have the time or the inclination.
 So what would be the best value for other CPUs?
  
  The trade-off of extra cpu waste vs. throughput gain depends on the
  polling timeout value (poll_stop_idle).
  The best value to choose is dependent on the workload and the system
  hardware and configuration.
  There is nothing that we can say about this value in advance. The
  system's manager/administrator should use this optimization with the
  awareness that polling consumes extra cpu cycles, as documented.
  
   - Should this be tunable from userspace per vhost instance?
 Why is it only tunable globally?
  
  It should be tunable per vhost thread.
  We can do it in a subsequent patch.
 
 So I think whether the patchset is appropriate upstream
 will depend exactly on coming up with a reasonable
 interface for enabling and tuning the functionality.
 

How about adding a new ioctl for each vhost device that 
sets the poll_stop_idle (the timeout)? 
This should be aligned with the QEMU way of doing things.
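
To make the proposal concrete, here is a rough userspace-side sketch of what
such a per-device ioctl could look like. This is only an illustration, not
part of the posted patch: the request name (VHOST_SET_POLL_STOP_IDLE), its
number (0x70) and the int argument are invented placeholders; VHOST_VIRTIO
(0xAF) is the existing vhost ioctl magic.

/* Hypothetical sketch only: no such ioctl exists in the posted patch. */
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <stdio.h>

#define VHOST_VIRTIO 0xAF
/* invented request: set the per-device polling timeout, in microseconds */
#define VHOST_SET_POLL_STOP_IDLE _IOW(VHOST_VIRTIO, 0x70, int)

int vhost_set_poll_timeout(int vhost_fd, int usecs)
{
	/* vhost_fd would be the fd QEMU already holds for /dev/vhost-net */
	if (ioctl(vhost_fd, VHOST_SET_POLL_STOP_IDLE, &usecs) < 0) {
		perror("VHOST_SET_POLL_STOP_IDLE");
		return -1;
	}
	return 0;
}

QEMU (or any management tool that holds the vhost fd) could then expose this
as a per-device property, much like the backend fd is set today.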

 I was hopeful some reasonable default value can be
 derived from e.g. cost of the exit.
 If that is not the case, it becomes that much harder
 for users to select good default values.
 

Our suggestion would be to use the maximum (a large enough) value,
so that vhost is polling 100% of the time.
The polling optimization mainly addresses users who want to maximize their
performance, even at the expense of wasting cpu cycles. The maximum value
will produce the biggest impact on performance.
However, using the maximum default value will be valuable even for users
who care more about the normalized throughput/cpu criteria. Such users,
interested in a finer tuning of the polling timeout, need to look for an
optimal timeout value for their system. The maximum value serves as the
upper limit of the range that needs to be searched for such an optimal
timeout value.


 There are some cases where networking stack already
 exposes low-level hardware detail to userspace, e.g.
 tcp polling configuration. If we can't come up with
 a way to abstract hardware, maybe we can at least tie
 it to these existing controls rather than introducing
 new ones?
 

We've spent time thinking about the possible interfaces that
could be appropriate for such an optimization (including tcp polling).
We think that using an ioctl as the interface to configure the virtual
device/vhost, in the same manner that e.g. SET_NET_BACKEND is configured,
makes a lot of sense and is consistent with the existing mechanism.

Thanks,
Razya



 
   - How bad is it if you don't pin vhost and vcpu threads?
 Is the scheduler smart enough to pull them apart?
   - What happens in overcommit scenarios? Does polling make things
 much worse?
 Clearly polling will work worse if e.g. vhost and vcpu
 share the host cpu. How can we avoid conflicts?
   
 For two last questions, better cooperation with host scheduler 
will
 likely help here.
 See e.g. 
  http://thread.gmane.org/gmane.linux.kernel/1771791/focus=1772505
 I'm currently looking at pushing something similar upstream,
 if it goes in vhost polling can do something similar.
   
   Any data points to shed light on these questions?
  
  I ran a simple apache benchmark, with an overcommit scenario, where both
  the vcpu and vhost share the same core.
  In some cases (c4 in my testcases) polling surprisingly produced a better
  throughput.
 
 Likely because latency is hurt, so you get better batching?
 
  Therefore, it is hard to predict how the polling will impact 
performance 
  in advance. 
 
 If it's so hard, users will struggle to configure this properly.
 Looks like an argument for us developers to do the hard work,
 and expose simpler controls to users?
 
  It is up to whoever is using this optimization to use it wisely.
  Thanks,
  Razya 
  
 


Re: Fw: Benchmarking for vhost polling patch

2015-01-11 Thread Razya Ladelsky
 Hi Razya,
 Thanks for the update.
 So that's reasonable I think, and I think it makes sense
 to keep working on this in isolation - it's more
 manageable at this size.
 
 The big questions in my mind:
 - What happens if system is lightly loaded?
   E.g. a ping/pong benchmark. How much extra CPU are
   we wasting?
 - We see the best performance on your system is with 10usec worth of 
polling.
   It's OK to be able to tune it for best performance, but
   most people don't have the time or the inclination.
   So what would be the best value for other CPUs?

The trade-off of extra cpu waste vs. throughput gain depends on the polling
timeout value (poll_stop_idle).
The best value to choose is dependent on the workload and the system
hardware and configuration.
There is nothing that we can say about this value in advance. The system's
manager/administrator should use this optimization with the awareness that
polling consumes extra cpu cycles, as documented.
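
For completeness, a minimal sketch of how such host-side tuning could be done
today through the module parameter. It assumes that poll_stop_idle (declared
with S_IRUGO|S_IWUSR in vhost.c) shows up under /sys/module/vhost/parameters/;
the path and the value written below are illustrative assumptions, not part of
the patch.

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/module/vhost/parameters/poll_stop_idle", "w");

	if (!f) {
		perror("poll_stop_idle");
		return 1;
	}
	/* e.g. stop polling after 10 idle units (usecs in the reworked patch) */
	fprintf(f, "10\n");
	fclose(f);
	return 0;
}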

 - Should this be tunable from userspace per vhost instance?
   Why is it only tunable globally?

It should be tunable per vhost thread.
We can do it in a subsequent patch.

 - How bad is it if you don't pin vhost and vcpu threads?
   Is the scheduler smart enough to pull them apart?
 - What happens in overcommit scenarios? Does polling make things
   much worse?
   Clearly polling will work worse if e.g. vhost and vcpu
   share the host cpu. How can we avoid conflicts?
 
   For two last questions, better cooperation with host scheduler will
   likely help here.
   See e.g.  
http://thread.gmane.org/gmane.linux.kernel/1771791/focus=1772505
   I'm currently looking at pushing something similar upstream,
   if it goes in vhost polling can do something similar.
 
 Any data points to shed light on these questions?

I ran a simple apache benchmark, with an overcommit scenario, where both
the vcpu and vhost share the same core.
In some cases (c4 in my testcases) polling surprisingly produced a better
throughput.
Therefore, it is hard to predict how the polling will impact performance 
in advance. 
It is up to whoever is using this optimization to use it wisely.
Thanks,
Razya 




Fw: Benchmarking for vhost polling patch

2015-01-01 Thread Razya Ladelsky
Hi Michael,
Just a follow-up on the polling patch numbers.
Please let me know if you find these numbers satisfying enough to continue 
with submitting this patch.
Otherwise - we'll have this patch submitted as part of the larger Elvis 
patch set rather than independently.
Thank you,
Razya 

- Forwarded by Razya Ladelsky/Haifa/IBM on 01/01/2015 09:37 AM -

From:   Razya Ladelsky/Haifa/IBM@IBMIL
To: m...@redhat.com
Cc: 
Date:   25/11/2014 02:43 PM
Subject:Re: Benchmarking for vhost polling patch
Sent by:kvm-ow...@vger.kernel.org



Hi Michael,

 Hi Razya,
 On the netperf benchmark, it looks like polling=10 gives a modest but
 measureable gain.  So from that perspective it might be worth it if it's
 not too much code, though we'll need to spend more time checking the
 macro effect - we barely moved the needle on the macro benchmark and
 that is suspicious.

I ran memcached with various values for the key & value arguments, and
managed to see a bigger impact of polling than when I used the default
values. Here are the numbers:

key=250      TPS      net    vhost  vm    TPS/cpu  TPS/cpu
value=2048            rate   util   util           change

polling=0   101540   103.0  46   100   695.47
polling=5   136747   123.0  83   100   747.25   0.074440609
polling=7   140722   125.7  84   100   764.79   0.099663658
polling=10  141719   126.3  87   100   757.85   0.089688003
polling=15  142430   127.1  90   100   749.63   0.077863015
polling=25  146347   128.7  95   100   750.49   0.079107993
polling=50  150882   131.1  100  100   754.41   0.084733701

Macro benchmarks are less I/O intensive than the micro benchmark, which is
why we can expect less impact for polling as compared to netperf.
However, as shown above, we managed to get a 10% TPS/cpu improvement with
the polling patch.
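
For reference, the TPS/cpu column is simply TPS normalized by the total cpu
spent (vhost util + vm util), and the change column compares it to the
polling=0 baseline. Checking the polling=5 row:

    TPS/cpu = 136747 / (83 + 100)  ~= 747.25
    change  = 747.25 / 695.47 - 1  ~= 0.074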

 Is there a chance you are actually trading latency for throughput?
 do you observe any effect on latency?

No.

 How about trying some other benchmark, e.g. NFS?
 

Tried, but it didn't produce enough I/O (vhost was at most at 15% util).

 
 Also, I am wondering:
 
 since vhost thread is polling in kernel anyway, shouldn't
 we try and poll the host NIC?
 that would likely reduce at least the latency significantly,
 won't it?
 

Yes, it could be a great addition at some point, but needs a thorough 
investigation. In any case, not a part of this patch...

Thanks,
Razya



Re: Benchmarking for vhost polling patch

2014-11-25 Thread Razya Ladelsky
Hi Michael,

 Hi Razya,
 On the netperf benchmark, it looks like polling=10 gives a modest but
 measureable gain.  So from that perspective it might be worth it if it's
 not too much code, though we'll need to spend more time checking the
 macro effect - we barely moved the needle on the macro benchmark and
 that is suspicious.

I ran memcached with various values for the key & value arguments, and
managed to see a bigger impact of polling than when I used the default
values. Here are the numbers:

key=250      TPS      net    vhost  vm    TPS/cpu  TPS/cpu
value=2048            rate   util   util           change

polling=0   101540   103.0  46   100   695.47
polling=5   136747   123.0  83   100   747.25   0.074440609
polling=7   140722   125.7  84   100   764.79   0.099663658
polling=10  141719   126.3  87   100   757.85   0.089688003
polling=15  142430   127.1  90   100   749.63   0.077863015
polling=25  146347   128.7  95   100   750.49   0.079107993
polling=50  150882   131.1  100  100   754.41   0.084733701

Macro benchmarks are less I/O intensive than the micro benchmark, which is
why we can expect less impact for polling as compared to netperf.
However, as shown above, we managed to get a 10% TPS/cpu improvement with
the polling patch.

 Is there a chance you are actually trading latency for throughput?
 do you observe any effect on latency?

No.

 How about trying some other benchmark, e.g. NFS?
 

Tried, but it didn't produce enough I/O (vhost was at most at 15% util).

 
 Also, I am wondering:
 
 since vhost thread is polling in kernel anyway, shouldn't
 we try and poll the host NIC?
 that would likely reduce at least the latency significantly,
 won't it?
 

Yes, it could be a great addition at some point, but needs a thorough 
investigation. In any case, not a part of this patch...

Thanks,
Razya



Benchmarking for vhost polling patch

2014-11-16 Thread Razya Ladelsky
Razya Ladelsky/Haifa/IBM@IBMIL wrote on 29/10/2014 02:38:31 PM:

 From: Razya Ladelsky/Haifa/IBM@IBMIL
 To: m...@redhat.com
 Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
 Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, 
 Joel Nider/Haifa/IBM@IBMIL, abel.gor...@gmail.com, kvm@vger.kernel.org
 Date: 29/10/2014 02:38 PM
 Subject: Benchmarking for vhost polling patch
 
 Hi Michael,
 
 Following the polling patch thread:
 http://marc.info/?l=kvm&m=140853271510179&w=2,
 I changed poll_stop_idle to be counted in microseconds, and carried out
 experiments using varying values.
 
 If it makes sense to you, I will continue with the other changes 
 requested for 
 the patch.
 
 Thank you,
 Razya
 
 

Dear Michael,
I'm still interested in hearing your opinion about these numbers
http://marc.info/?l=kvm&m=141458631532669&w=2,
and whether it is worthwhile to continue with the polling patch.
Thank you,
Razya 


 
 



Re: Benchmarking for vhost polling patch

2014-11-09 Thread Razya Ladelsky
Razya Ladelsky/Haifa/IBM@IBMIL wrote on 29/10/2014 02:38:31 PM:

 From: Razya Ladelsky/Haifa/IBM@IBMIL
 To: m...@redhat.com
 Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
 Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, 
 Joel Nider/Haifa/IBM@IBMIL, abel.gor...@gmail.com, kvm@vger.kernel.org
 Date: 29/10/2014 02:38 PM
 Subject: Benchmarking for vhost polling patch
 
 Hi Michael,
 
 Following the polling patch thread:
 http://marc.info/?l=kvm&m=140853271510179&w=2,
 I changed poll_stop_idle to be counted in microseconds, and carried out
 experiments using varying values.
 
 If it makes sense to you, I will continue with the other changes 
 requested for 
 the patch.
 
 Thank you,
 Razya
 
 

Hi Michael,
Have you had the chance to look into these numbers?
Thank you,
Razya 


 
 



Re: Benchmarking for vhost polling patch

2014-10-30 Thread Razya Ladelsky
Zhang Haoyu zhan...@sangfor.com wrote on 30/10/2014 01:30:08 PM:

 From: Zhang Haoyu zhan...@sangfor.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, mst m...@redhat.com
 Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm kvm@vger.kernel.org
 Date: 30/10/2014 01:30 PM
 Subject: Re: Benchmarking for vhost polling patch
 
  Hi Michael,
  
  Following the polling patch thread:
  http://marc.info/?l=kvm&m=140853271510179&w=2,
  I changed poll_stop_idle to be counted in microseconds, and carried out
  experiments using varying values. The setup for netperf consisted of
  1 vm and 1 vhost, each running on their own dedicated core.
  
 Could you provide your changing code?
 
 Thanks,
 Zhang Haoyu
 
Hi Zhang,
Do you mean the change in code for poll_stop_idle?
Thanks,
Razya



Benchmarking for vhost polling patch

2014-10-29 Thread Razya Ladelsky
Hi Michael,

Following the polling patch thread:
http://marc.info/?l=kvm&m=140853271510179&w=2,
I changed poll_stop_idle to be counted in microseconds, and carried out
experiments using varying values. The setup for netperf consisted of
1 vm and 1 vhost, each running on their own dedicated core.

Here are the numbers for netperf (micro benchmark):

polling    |Send |Throughput|Utilization |S. Demand   |vhost|exits|throughput|throughput
mode       |Msg  |          |Send  Recv  |Send  Recv  |util |/sec | /cpu     | /cpu
           |Size |          |local remote|local remote|     |     |          | % change
           |bytes|10^6bits/s|  %     %   |us/KB us/KB |  %  |     |          |
-----------------------------------------------------------------------------------------
NoPolling  64   1054.11   99.97 3.01  7.78  3.74   38.80  92K    7.60
Polling=1  64   1036.67   99.97 2.93  7.90  3.70   53.00  92K    6.78  -10.78
Polling=5  64   1079.27   99.97 3.07  7.59  3.73   83.00  90K    5.90  -22.35
Polling=7  64   1444.90   99.97 3.98  5.67  3.61   95.00  19.5K  7.41   -2.44
Polling=10 64   1521.70   99.97 4.21  5.38  3.63   98.00  8.5K   7.69    1.19
Polling=25 64   1534.24   99.97 4.18  5.34  3.57   99.00  8.5K   7.71    1.51
Polling=50 64   1534.24   99.97 4.18  5.34  3.57   99.00  8.5K   7.71    1.51

NoPolling  128  1577.39   99.97 4.09  5.19  3.40   54.00  113K   10.24
Polling=1  128  1596.08   99.97 4.22  5.13  3.47   71.00  120K   9.34   -8.88
Polling=5  128  2238.49   99.97 5.45  3.66  3.19   92.00  24K    11.66  13.82
Polling=7  128  2330.97   99.97 5.59  3.51  3.14   95.00  19.5K  11.96  16.70
Polling=10 128  2375.78   99.97 5.69  3.45  3.14   98.00  10K    12.00  17.14
Polling=25 128  2655.01   99.97 2.45  3.09  1.21   99.00  8.5K   13.34  30.25
Polling=50 128  2655.01   99.97 2.45  3.09  1.21   99.00  8.5K   13.34  30.25

NoPolling  256  2558.10   99.97 2.33  3.20  1.20   67.00  120K   15.32
Polling=1  256  2508.93   99.97 3.13  3.27  1.67   75.00  125K   14.34  -6.41
Polling=5  256  3740.34   99.97 2.70  2.19  0.95   94.00  17K    19.28  25.86
Polling=7  256  3692.69   99.97 2.80  2.22  0.99   97.00  15.5K  18.75  22.37
Polling=10 256  4036.60   99.97 2.69  2.03  0.87   99.00  8.5K   20.29  32.42
Polling=25 256  3998.89   99.97 2.64  2.05  0.87   99.00  8.5K   20.10  31.18
Polling=50 256  3998.89   99.97 2.64  2.05  0.87   99.00  8.5K   20.10  31.18

NoPolling  512  4531.50   99.90 2.75  1.81  0.79   78.00  55K    25.47
Polling=1  512  4684.19   99.95 2.69  1.75  0.75   83.00  35K    25.60   0.52
Polling=5  512  4932.65   99.75 2.75  1.68  0.74   91.00  12K    25.86   1.52
Polling=7  512  5226.14   99.86 2.80  1.57  0.70   95.00  7.5K   26.82   5.30
Polling=10 512  5464.90   99.60 2.90  1.49  0.70   96.00  8.2K   27.94   9.69
Polling=25 512  5550.44   99.58 2.84  1.47  0.67   99.00  7.5K   27.95   9.73
Polling=50 512  5550.44   99.58 2.84  1.47  0.67   99.00  7.5K   27.95   9.73


As you can see from the last column, polling improves performance in most cases.

I ran memcached (macro benchmark), where (as in the previous benchmark)
the vm and vhost each get their own dedicated core. I configured memslap
with C=128, T=8, as this configuration was required to produce enough load
to saturate the vm.
I tried several other configurations, but this one produced the maximal
throughput (for the baseline).
  
The numbers for memcached (macro benchmark):

polling      time   TPS     Net    vhost  vm    exits  TPS/cpu  TPS/cpu
mode                        rate   util   util  /sec            % change
                             %
Disabled     15.9s  125819  91.5   45     99    87K    873.74
polling=1    15.8s  126820  92.3   60     99    87K    797.61   -8.71
polling=5    12.82  155799  113.4  79     99    25.5K  875.28    0.18
polling=10   11.7s  160639  116.9  83     99    16.3K  882.63    1.02
polling=15   12.4s  160897  117.2  87     99    15K    865.04   -1.00
polling=100  11.7s  170971  124.4  99     99    30     863.49   -1.17


For memcached, TPS/cpu does not show a significant difference in any of the
cases.
However, TPS numbers did improve by up to 35%, which can be useful for
under-utilized systems which have cpu time to spare for extra throughput.

If it makes sense to you, I will continue with the other changes requested for 
the patch.

Thank you,
Razya






RE: [PATCH] vhost: Add polling mode

2014-08-24 Thread Razya Ladelsky
David Laight david.lai...@aculab.com wrote on 21/08/2014 05:29:41 PM:

 From: David Laight david.lai...@aculab.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, Michael S. Tsirkin 
m...@redhat.com
 Cc: abel.gor...@gmail.com abel.gor...@gmail.com, Alex Glikson/
 Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/
 IBM@IBMIL, kvm@vger.kernel.org kvm@vger.kernel.org, linux-
 ker...@vger.kernel.org linux-ker...@vger.kernel.org, 
 net...@vger.kernel.org net...@vger.kernel.org, 
 virtualizat...@lists.linux-foundation.org 
 virtualizat...@lists.linux-foundation.org, Yossi 
Kuperman1/Haifa/IBM@IBMIL
 Date: 21/08/2014 05:31 PM
 Subject: RE: [PATCH] vhost: Add polling mode
 
 From: Razya Ladelsky
  Michael S. Tsirkin m...@redhat.com wrote on 20/08/2014 01:57:10 PM:
  
Results:
   
Netperf, 1 vm:
The polling patch improved throughput by ~33% (1516 MB/sec - 
 2046 MB/sec).
Number of exits/sec decreased 6x.
The same improvement was shown when I tested with 3 vms running 
netperf
(4086 MB/sec - 5545 MB/sec).
   
filebench, 1 vm:
ops/sec improved by 13% with the polling patch. Number of exits
was reduced by 31%.
The same experiment with 3 vms running filebench showed similar 
numbers.
   
Signed-off-by: Razya Ladelsky ra...@il.ibm.com
  
   This really needs a more thorough benchmarking report, including
   system data.  One good example for a related patch:
   http://lwn.net/Articles/551179/
   though for virtualization, we need data about host as well, and if 
you
   want to look at streaming benchmarks, you need to test different 
message
   sizes and measure packet size.
  
  
  Hi Michael,
  I have already tried running netperf with several message sizes:
  64,128,256,512,600,800...
  But the results are inconsistent even in the baseline/unpatched
  configuration.
  For smaller msg sizes, I get consistent numbers. However, at some 
point,
  when I increase the msg size
  I get unstable results. For example, for a 512B msg, I get two 
scenarios:
  vm utilization 100%, vhost utilization 75%, throughput ~6300
  vm utilization 80%, vhost utilization 13%, throughput ~9400 (line 
rate)
  
  I don't know why vhost is behaving that way for certain message sizes.
  Do you have any insight to why this is happening?
 
 Have you tried looking at the actual ethernet packet sizes.
 It may well jump between using small packets (the size of the writes)
 and full sized ones.

I will check it,
Thanks,
Razya

 
 If you are trying to measure ethernet packet 'cost' you need to use UDP.
 However that probably uses different code paths.
 
David
 
 
 



Re: [PATCH] vhost: Add polling mode

2014-08-21 Thread Razya Ladelsky
Christian Borntraeger borntrae...@de.ibm.com wrote on 20/08/2014 
11:41:32 AM:


  
  Results:
  
  Netperf, 1 vm:
  The polling patch improved throughput by ~33% (1516 MB/sec - 2046 
MB/sec).
  Number of exits/sec decreased 6x.
  The same improvement was shown when I tested with 3 vms running 
netperf
  (4086 MB/sec - 5545 MB/sec).
  
  filebench, 1 vm:
  ops/sec improved by 13% with the polling patch. Number of exits 
 was reduced by
  31%.
  The same experiment with 3 vms running filebench showed similar 
numbers.
  
  Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 
 Gave it a quick try on s390/kvm. As expected it makes no difference 
 for big streaming workload like iperf.
 uperf with a 1-1 round robin got indeed faster by about 30%.
 The high CPU consumption is something that bothers me though, as 
 virtualized systems tend to be full.
 
 

Thanks for confirming the results!
The best way to use this patch would be along with a shared vhost thread 
for multiple
devices/vms, as described in:
http://domino.research.ibm.com/library/cyberdig.nsf/1e4115aea78b6e7c85256b360066f0d4/479e3578ed05bfac85257b4200427735!OpenDocument
This work assumes having a dedicated I/O core where the vhost thread 
serves multiple vms, which 
makes the high cpu utilization less of a concern. 



  +static int poll_start_rate = 0;
  +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
  +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
  +
  +static int poll_stop_idle = 3*HZ; /* 3 seconds */
  +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
  +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
 
 This seems ridiculously high. Even one jiffy is an eternity, so
 setting it to 1 as a default would reduce the CPU overhead for most cases.
 If we don't have a packet in one millisecond, we can surely go back
 to the kick approach, I think.
 
 Christian
 

Good point, will reduce it and recheck.
Thank you,
Razya



Re: [PATCH] vhost: Add polling mode

2014-08-21 Thread Razya Ladelsky
Michael S. Tsirkin m...@redhat.com wrote on 20/08/2014 01:57:10 PM:

  Results:
  
  Netperf, 1 vm:
  The polling patch improved throughput by ~33% (1516 MB/sec - 2046 
MB/sec).
  Number of exits/sec decreased 6x.
  The same improvement was shown when I tested with 3 vms running 
netperf
  (4086 MB/sec - 5545 MB/sec).
  
  filebench, 1 vm:
  ops/sec improved by 13% with the polling patch. Number of exits 
 was reduced by
  31%.
  The same experiment with 3 vms running filebench showed similar 
numbers.
  
  Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 
 This really needs a more thorough benchmarking report, including
 system data.  One good example for a related patch:
 http://lwn.net/Articles/551179/
 though for virtualization, we need data about host as well, and if you
 want to look at streaming benchmarks, you need to test different message
 sizes and measure packet size.


Hi Michael,
I have already tried running netperf with several message sizes: 
64,128,256,512,600,800...
But the results are inconsistent even in the baseline/unpatched 
configuration.
For smaller msg sizes, I get consistent numbers. However, at some point, 
when I increase the msg size
I get unstable results. For example, for a 512B msg, I get two scenarios:
vm utilization 100%, vhost utilization 75%, throughput ~6300 
vm utilization 80%, vhost utilization 13%, throughput ~9400 (line rate)

I don't know why vhost is behaving that way for certain message sizes.
Do you have any insight to why this is happening?
Thank you,
Razya
 



Re: [PATCH] vhost: Add polling mode

2014-08-19 Thread Razya Ladelsky
 That was just one example. There many other possibilities.  Either
 actually make the systems load all host CPUs equally, or divide
 throughput by host CPU.
 

The polling patch adds this capability to vhost, reducing costly exit 
overhead when the vm is loaded.

In order to load the vm I ran netperf with a msg size of 256:

Without polling:  2480 Mbits/sec,  utilization: vm - 100%   vhost - 64% 
With Polling: 4160 Mbits/sec,  utilization: vm - 100%   vhost - 100% 

Therefore, throughput/cpu without polling is 15.1, and 20.8 with polling.
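
Spelling that normalization out (throughput divided by total utilization):

    throughput/cpu without polling = 2480 / (100 + 64)  ~= 15.1
    throughput/cpu with polling    = 4160 / (100 + 100)  = 20.8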

My intention was to load vhost as close as possible to 100% utilization 
without polling, in order to compare it to the polling utilization case 
(where vhost is always 100%). 
The best use case, of course, would be when the shared vhost thread work 
(TBD) is integrated and then vhost will actually be using its polling 
cycles to handle requests of multiple devices (even from multiple vms).

Thanks,
Razya




Re: [PATCH] vhost: Add polling mode

2014-08-17 Thread Razya Ladelsky
  
  Hi Michael,
  
  Sorry for the delay, had some problems with my mailbox, and I realized 

  just now that 
  my reply wasn't sent.
  The vm indeed ALWAYS utilized 100% cpu, whether polling was enabled or 

  not.
  The vhost thread utilized less than 100% (of the other cpu) when 
polling 
  was disabled.
  Enabling polling increased its utilization to 100% (in which case both 

  cpus were 100% utilized). 
 
 Hmm this means the testing wasn't successful then, as you said:
 
The idea was to get it 100% loaded, so we can see that the polling is
getting it to produce higher throughput.
 
 in fact here you are producing more throughput but spending more power
 to produce it, which can have any number of explanations besides polling
 improving the efficiency. For example, increasing system load might
 disable host power management.


Hi Michael,
I re-ran the tests, this time with the turbo mode and C-states features off.

No Polling:
1 VM running netperf (msg size 64B): 1107 Mbits/sec

Polling:
1 VM running netperf (msg size 64B): 1572 Mbits/sec

As you can see from the new results, the numbers are lower,
but relatively (polling on/off) there's no change.
Thank you,
Razya


 


 
 
   -- 
   MST
   


Re: [PATCH] vhost: Add polling mode

2014-08-12 Thread Razya Ladelsky
Michael S. Tsirkin m...@redhat.com wrote on 12/08/2014 12:18:50 PM:

 From: Michael S. Tsirkin m...@redhat.com
 To: David Miller da...@davemloft.net
 Cc: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org, Alex 
 Glikson/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Yossi 
 Kuperman1/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, 
 abel.gor...@gmail.com, linux-ker...@vger.kernel.org, 
 net...@vger.kernel.org, virtualizat...@lists.linux-foundation.org
 Date: 12/08/2014 12:18 PM
 Subject: Re: [PATCH] vhost: Add polling mode
 
 On Mon, Aug 11, 2014 at 12:46:21PM -0700, David Miller wrote:
  From: Michael S. Tsirkin m...@redhat.com
  Date: Sun, 10 Aug 2014 21:45:59 +0200
  
   On Sun, Aug 10, 2014 at 11:30:35AM +0300, Razya Ladelsky wrote:
   ...
   And, did your tests actually produce 100% load on both host CPUs?
   ...
  
  Michael, please do not quote an entire patch just to ask a one line
  question.
  
  I truly, truly, wish it was simpler in modern email clients to delete
  the unrelated quoted material because I bet when people do this they
  are simply being lazy.
  
  Thank you.
 
 Lazy - mea culpa, though I'm using mutt so it isn't even hard.
 
 The question still stands: the test results are only valid
 if CPU was at 100% in all configurations.
 This is the reason I generally prefer it when people report
 throughput divided by CPU (power would be good too but it still
 isn't easy for people to get that number).
 

Hi Michael,

Sorry for the delay, had some problems with my mailbox, and I realized 
just now that 
my reply wasn't sent.
The vm indeed ALWAYS utilized 100% cpu, whether polling was enabled or 
not.
The vhost thread utilized less than 100% (of the other cpu) when polling 
was disabled.
Enabling polling increased its utilization to 100% (in which case both 
cpus were 100% utilized). 


 -- 
 MST
 



[PATCH] vhost: Add polling mode

2014-08-10 Thread Razya Ladelsky
From: Razya Ladelsky ra...@il.ibm.com
Date: Thu, 31 Jul 2014 09:47:20 +0300
Subject: [PATCH] vhost: Add polling mode

When vhost is waiting for buffers from the guest driver (e.g., more packets to
send in vhost-net's transmit queue), it normally goes to sleep and waits for the
guest to kick it. This kick involves a PIO in the guest, and therefore an exit
(and possibly userspace involvement in translating this PIO exit into a file
descriptor event), all of which hurts performance.

If the system is under-utilized (has cpu time to spare), vhost can continuously
poll the virtqueues for new buffers, and avoid asking the guest to kick us.
This patch adds an optional polling mode to vhost, that can be enabled via a
kernel module parameter, poll_start_rate.

When polling is active for a virtqueue, the guest is asked to disable
notification (kicks), and the worker thread continuously checks for new buffers.
When it does discover new buffers, it simulates a kick by invoking the
underlying backend driver (such as vhost-net), which thinks it got a real kick
from the guest, and acts accordingly. If the underlying driver asks not to be
kicked, we disable polling on this virtqueue.

We start polling on a virtqueue when we notice it has work to do. Polling on
this virtqueue is later disabled after 3 seconds of polling turning up no new
work, as in this case we are better off returning to the exit-based notification
mechanism. The default timeout of 3 seconds can be changed with the
poll_stop_idle kernel module parameter.
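
To illustrate the flow, here is a simplified sketch of the worker-side logic
described above. This is not the code added by the patch; the struct, field
and callback names are illustrative stand-ins for the real vhost structures
(vhost_virtqueue, handle_tx_net, and so on).

/* One polling pass of the worker thread over all polled virtqueues. */
struct polled_vq {
	struct polled_vq *next;
	unsigned long last_work;                    /* time work was last found */
	int  (*has_work)(struct polled_vq *vq);     /* new buffers available?   */
	void (*handle_kick)(struct polled_vq *vq);  /* backend handler          */
	void (*stop_polling)(struct polled_vq *vq); /* re-enable guest kicks    */
};

static void poll_round(struct polled_vq *list, unsigned long now,
		       unsigned long poll_stop_idle)
{
	struct polled_vq *vq;

	for (vq = list; vq; vq = vq->next) {
		if (vq->has_work(vq)) {
			vq->handle_kick(vq);  /* behave as if the guest kicked us */
			vq->last_work = now;
		} else if (now - vq->last_work > poll_stop_idle) {
			vq->stop_polling(vq); /* idle too long: back to exit-based kicks */
		}
	}
}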

This polling approach makes a lot of sense for new HW with posted-interrupts for
which we have exitless host-to-guest notifications. But even with support for
posted interrupts, guest-to-host communication still causes exits. Polling adds
the missing part.

When systems are overloaded, there won't be enough cpu time for the various
vhost threads to poll their guests' devices. For these scenarios, we plan to add
support for vhost threads that can be shared by multiple devices, even of
multiple vms.
Our ultimate goal is to implement the I/O acceleration features described in:
KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon)
https://www.youtube.com/watch?v=9EyweibHfEs
and
https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html

I ran some experiments with TCP stream netperf and filebench (having 2 threads
performing random reads) benchmarks on an IBM System x3650 M4.
I have two machines, A and B. A hosts the vms, B runs the netserver.
The vms (on A) run netperf, its destination server is running on B.
All runs loaded the guests in a way that they were (cpu) saturated. For example,
I ran netperf with 64B messages, which is heavily loading the vm (which is why
its throughput is low).
The idea was to get it 100% loaded, so we can see that the polling is getting it
to produce higher throughput.

The system had two cores per guest, as to allow for both the vcpu and the vhost
thread to run concurrently for maximum throughput (but I didn't pin the threads
to specific cores).
My experiments were fair in a sense that for both cases, with or without
polling, I run both threads, vcpu and vhost, on 2 cores (set their affinity that
way). The only difference was whether polling was enabled/disabled.

Results:

Netperf, 1 vm:
The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec).
Number of exits/sec decreased 6x.
The same improvement was shown when I tested with 3 vms running netperf
(4086 MB/sec -> 5545 MB/sec).

filebench, 1 vm:
ops/sec improved by 13% with the polling patch. Number of exits was reduced by
31%.
The same experiment with 3 vms running filebench showed similar numbers.

Signed-off-by: Razya Ladelsky ra...@il.ibm.com
---
 drivers/vhost/net.c   |6 +-
 drivers/vhost/scsi.c  |6 +-
 drivers/vhost/vhost.c |  245 +++--
 drivers/vhost/vhost.h |   38 +++-
 4 files changed, 277 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 971a760..558aecb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
+			vqs[VHOST_NET_VQ_TX]);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
+			vqs[VHOST_NET_VQ_RX]);
 
 	f->private_data = n;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4f4ffa4..665eeeb 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1528,9 +1528,9 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	if (!vqs)
 		goto err_vqs;
 
-	vhost_work_init(&vs

[PATCH v2] vhost: Add polling mode

2014-07-31 Thread Razya Ladelsky
Resubmitting the patch in: http://marc.info/?l=kvm&m=140594903520308&w=2
after fixing the whitespace issues.
Thank you,
Razya

From f293e470b36ff9eb4910540c620315c418e4a8fc Mon Sep 17 00:00:00 2001
From: Razya Ladelsky ra...@il.ibm.com
Date: Thu, 31 Jul 2014 09:47:20 +0300
Subject: [PATCH] vhost: Add polling mode

Add an optional polling mode to continuously poll the virtqueues
for new buffers, and avoid asking the guest to kick us.

Signed-off-by: Razya Ladelsky ra...@il.ibm.com
---
 drivers/vhost/net.c   |6 +-
 drivers/vhost/scsi.c  |6 +-
 drivers/vhost/vhost.c |  245 +++--
 drivers/vhost/vhost.h |   38 +++-
 4 files changed, 277 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 971a760..558aecb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
+			vqs[VHOST_NET_VQ_TX]);
+	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
+			vqs[VHOST_NET_VQ_RX]);
 
 	f->private_data = n;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4f4ffa4..665eeeb 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1528,9 +1528,9 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 	if (!vqs)
 		goto err_vqs;
 
-	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
-	vhost_work_init(&vs->vs_event_work, tcm_vhost_evt_work);
-
+	vhost_work_init(&vs->vs_completion_work, NULL,
+			vhost_scsi_complete_cmd_work);
+	vhost_work_init(&vs->vs_event_work, NULL, tcm_vhost_evt_work);
 	vs->vs_events_nr = 0;
 	vs->vs_events_missed = false;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c90f437..fbe8174 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -24,9 +24,17 @@
 #include <linux/slab.h>
 #include <linux/kthread.h>
 #include <linux/cgroup.h>
+#include <linux/jiffies.h>
 #include <linux/module.h>
 
 #include "vhost.h"
+static int poll_start_rate = 0;
+module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
+
+static int poll_stop_idle = 3*HZ; /* 3 seconds */
+module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
 
 enum {
 	VHOST_MEMORY_MAX_NREGIONS = 64,
@@ -58,27 +66,28 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
 	return 0;
 }
 
-void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+void vhost_work_init(struct vhost_work *work, struct vhost_virtqueue *vq,
+		     vhost_work_fn_t fn)
 {
 	INIT_LIST_HEAD(&work->node);
 	work->fn = fn;
 	init_waitqueue_head(&work->done);
 	work->flushing = 0;
 	work->queue_seq = work->done_seq = 0;
+	work->vq = vq;
 }
 EXPORT_SYMBOL_GPL(vhost_work_init);
 
 /* Init poll structure */
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-		     unsigned long mask, struct vhost_dev *dev)
+		     unsigned long mask, struct vhost_virtqueue *vq)
 {
 	init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
 	init_poll_funcptr(&poll->table, vhost_poll_func);
 	poll->mask = mask;
-	poll->dev = dev;
+	poll->dev = vq->dev;
 	poll->wqh = NULL;
-
-	vhost_work_init(&poll->work, fn);
+	vhost_work_init(&poll->work, vq, fn);
 }
 EXPORT_SYMBOL_GPL(vhost_poll_init);
 
@@ -174,6 +183,86 @@ void vhost_poll_queue(struct vhost_poll *poll)
 }
 EXPORT_SYMBOL_GPL(vhost_poll_queue);
 
+/* Enable or disable virtqueue polling (vqpoll.enabled) for a virtqueue.
+ *
+ * Enabling this mode tells the guest not to notify (kick) us when it
+ * has made more work available on this virtqueue; rather, we will continuously
+ * poll this virtqueue in the worker thread. If multiple virtqueues are polled,
+ * the worker thread polls them all, e.g., in a round-robin fashion.
+ * Note that vqpoll.enabled doesn't always mean that this virtqueue is
+ * actually being polled: The backend (e.g., net.c) may temporarily disable it
+ * using vhost_disable/enable_notify(), while vqpoll.enabled is unchanged.
+ *
+ * It is assumed that these functions are called relatively rarely, when vhost
+ * notices that this virtqueue's usage pattern significantly changed

Re: [PATCH] vhost: Add polling mode

2014-07-30 Thread Razya Ladelsky
kvm-ow...@vger.kernel.org wrote on 29/07/2014 03:40:18 PM:

 From: Michael S. Tsirkin m...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, 
 Cc: abel.gor...@gmail.com, Alex Glikson/Haifa/IBM@IBMIL, Eran 
 Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, 
 kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, Yossi Kuperman1/
 Haifa/IBM@IBMIL
 Date: 29/07/2014 03:40 PM
 Subject: Re: [PATCH] vhost: Add polling mode
 Sent by: kvm-ow...@vger.kernel.org
 
 On Tue, Jul 29, 2014 at 03:23:59PM +0300, Razya Ladelsky wrote:
   
   Hmm there aren't a lot of numbers there :(. Speed increased by 33% 
but
   by how much?  E.g. maybe you are getting from 1Mbyte/sec to 1.3,
   if so it's hard to get excited about it. 
  
  Netperf 1 VM: 1516 MB/sec -> 2046 MB/sec
  and for 3 VMs: 4086 MB/sec -> 5545 MB/sec
 
 What do you mean by 1 VM? Streaming TCP host to vm?
 Also, your throughput is somewhat low, it's worth seeing
 why you can't hit higher speeds.
 

My configuration is this:
I have two machines, A and B.
A hosts the vms, B runs the netserver.
One vm (on A) runs netperf, where its destination server is running on B.

I ran netperf with 64B messages, which is heavily loading the vm, which is 
why its throughput is low.
The idea was to get it 100% loaded, so we can see that the polling is 
getting it to produce higher throughput. 



   Some questions that come to
   mind: what was the message size? I would expect several measurements
   with different values.  How did host CPU utilization change?
   
  
  message size was 64B in order to get the VM to be cpu saturated,
  so the vm had 99% cpu and vhost 38%; with the polling patch both had 99%.
 
 Hmm so a net loss in throughput/CPU.
 

Actually, my experiments were fair in the sense that for both cases,
with or without polling, I ran both threads, vcpu and vhost, on 2 cores
(set their affinity that way).
The only difference was whether polling was enabled/disabled.


  
  
   What about latency? As we are competing with guest for host CPU,
   would worst-case or average latency suffer?
   
  
  Polling indeed doesn't make a lot of sense if there aren't enough 
  available cores.
  In these cases polling should not be used.
  
  Thank you,
  Razya
 
 OK but scheduler might run vm and vhost on the same cpu
 even if cores are available.
 This needs to be detected somehow and polling disabled.
 
 
  
  
   Thanks,
   
   -- 
   MST


Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Razya Ladelsky
kvm-ow...@vger.kernel.org wrote on 29/07/2014 04:30:34 AM:

 From: Zhang Haoyu zhan...@sangfor.com
 To: Jason Wang jasow...@redhat.com, Abel Gordon 
 abel.gor...@gmail.com, 
 Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
 Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, kvm 
 kvm@vger.kernel.org, Michael S. Tsirkin m...@redhat.com, Yossi 
 Kuperman1/Haifa/IBM@IBMIL
 Date: 29/07/2014 04:35 AM
 Subject: Re: [PATCH] vhost: Add polling mode
 Sent by: kvm-ow...@vger.kernel.org
 
 Maybe tying a knot between "vhost-net scalability tuning: threading
 for many VMs" and "vhost: Add polling mode" is a good marriage,
 because there is more possibility to get work to do with less polling
 time, so fewer cpu cycles are wasted.
 

Hi Zhang,
Indeed, having one vhost thread shared by multiple vms, polling for their
requests, is the ultimate goal of this plan.
The current challenge with it is that the cgroup mechanism needs to be
supported/incorporated somehow by this shared vhost thread, as it now
serves multiple vms (processes).
B.T.W. - if someone wants to help with this effort (mainly the cgroup 
issue),
it would be greatly appreciated...! 
 
Thank you,
Razya 

 Thanks,
 Zhang Haoyu
 
Hello All,
   
When vhost is waiting for buffers from the guest driver
 (e.g., more
packets
to send in vhost-net's transmit queue), it normally 
 goes to sleep and
waits
for the guest to kick it. This kick involves a PIO in
 the guest, and
therefore an exit (and possibly userspace involvement 
 in translating
this
PIO
exit into a file descriptor event), all of which hurts 
 performance.
   
If the system is under-utilized (has cpu time to 
 spare), vhost can
continuously poll the virtqueues for new buffers, and 
 avoid asking
the guest to kick us.
This patch adds an optional polling mode to vhost, that
 can be enabled
via a kernel module parameter, poll_start_rate.
   
When polling is active for a virtqueue, the guest is asked 
to
disable notification (kicks), and the worker thread 
continuously
checks
for
new buffers. When it does discover new buffers, it 
 simulates a kick
by
invoking the underlying backend driver (such as vhost-net), 
which
thinks
it
got a real kick from the guest, and acts accordingly. If 
the
underlying
driver asks not to be kicked, we disable polling on 
 this virtqueue.
   
We start polling on a virtqueue when we notice it has
work to do. Polling on this virtqueue is later disabled 
after 3
seconds of
polling turning up no new work, as in this case we are 
better off
returning
to the exit-based notification mechanism. The default 
 timeout of 3
seconds
can be changed with the poll_stop_idle kernel module 
parameter.
   
This polling approach makes lot of sense for new HW with
posted-interrupts
for which we have exitless host-to-guest notifications.
 But even with
support
for posted interrupts, guest-to-host communication 
 still causes exits.
Polling adds the missing part.
   
When systems are overloaded, there won't be enough cpu 
 time for the
various
vhost threads to poll their guests' devices. For these 
 scenarios, we
plan
to add support for vhost threads that can be shared by 
multiple
devices,
even of multiple vms.
Our ultimate goal is to implement the I/O acceleration 
features
described
in:
KVM Forum 2013: Efficient and Scalable Virtio (by Abel 
Gordon)
https://www.youtube.com/watch?v=9EyweibHfEs
and

https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
   
   
Comments are welcome,
Thank you,
Razya
Thanks for the work. Do you have perf numbers for this?
   
Hi Jason,
Thanks for reviewing. I ran some experiments with TCP 
 stream netperf and
filebench (having 2 threads performing random reads) 
 benchmarks on an IBM
System x3650 M4.
All runs loaded the guests in a way that they were (cpu) 
saturated.
The system had two cores per guest, as to allow for both 
 the vcpu and the
vhost thread to
run concurrently for maximum throughput (but I didn't pin 
 the threads to
specific cores)
I get:
   
Netperf, 1 vm:
The polling patch improved throughput by ~33%. Number of 
exits/sec
decreased 6x.
The same improvement was shown when I tested with 3 vms 
 running netperf.
   
filebench, 1 vm:
ops/sec improved by 13% with the polling patch. Number of exits 
was
reduced by 31%.
The same experiment with 3 vms running filebench showed 
 similar numbers.
  
   Looks good, may worth to add the result in the commit log.
   
And looks like the patch only poll for virtqueue. In the 
 future, may
worth to add callbacks for vhost_net to poll socket. Then
 it could be
used with rx busy polling in host which may speedup the rx 
also.
Did you mean polling the network device to avoid

Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Razya Ladelsky
Michael S. Tsirkin m...@redhat.com wrote on 29/07/2014 11:06:40 AM:

 From: Michael S. Tsirkin m...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, 
 Cc: kvm@vger.kernel.org, abel.gor...@gmail.com, Joel Nider/Haifa/
 IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/
 IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL
 Date: 29/07/2014 11:06 AM
 Subject: Re: [PATCH] vhost: Add polling mode
 
 On Mon, Jul 21, 2014 at 04:23:44PM +0300, Razya Ladelsky wrote:
  Hello All,
  
  When vhost is waiting for buffers from the guest driver (e.g., more 
  packets
  to send in vhost-net's transmit queue), it normally goes to sleep and 
  waits
  for the guest to kick it. This kick involves a PIO in the guest, and
  therefore an exit (and possibly userspace involvement in translating 
this 
  PIO
  exit into a file descriptor event), all of which hurts performance.
  
  If the system is under-utilized (has cpu time to spare), vhost can 
  continuously poll the virtqueues for new buffers, and avoid asking 
  the guest to kick us.
  This patch adds an optional polling mode to vhost, that can be enabled 

  via a kernel module parameter, poll_start_rate.
  
  When polling is active for a virtqueue, the guest is asked to
  disable notification (kicks), and the worker thread continuously 
checks 
  for
  new buffers. When it does discover new buffers, it simulates a kick 
by
  invoking the underlying backend driver (such as vhost-net), which 
thinks 
  it
  got a real kick from the guest, and acts accordingly. If the 
underlying
  driver asks not to be kicked, we disable polling on this virtqueue.
  
  We start polling on a virtqueue when we notice it has
  work to do. Polling on this virtqueue is later disabled after 3 
seconds of
  polling turning up no new work, as in this case we are better off 
  returning
  to the exit-based notification mechanism. The default timeout of 3 
seconds
  can be changed with the poll_stop_idle kernel module parameter.
  
  This polling approach makes lot of sense for new HW with 
posted-interrupts
  for which we have exitless host-to-guest notifications. But even with 
  support 
  for posted interrupts, guest-to-host communication still causes exits. 

  Polling adds the missing part.
  
  When systems are overloaded, there won't be enough cpu time for the 
  various 
  vhost threads to poll their guests' devices. For these scenarios, we 
plan 
  to add support for vhost threads that can be shared by multiple 
devices, 
  even of multiple vms. 
  Our ultimate goal is to implement the I/O acceleration features 
described 
  in:
  KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) 
  https://www.youtube.com/watch?v=9EyweibHfEs
  and
  https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
  
  
  Comments are welcome, 
  Thank you,
  Razya
  
  From: Razya Ladelsky ra...@il.ibm.com
  
  Add an optional polling mode to continuously poll the virtqueues
  for new buffers, and avoid asking the guest to kick us.
  
  Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 
 This is an optimization patch, isn't it?
 Could you please include some numbers showing its
 effect?
 
 

Hi Michael,
Sure. I included them in a reply to Jason Wang in this thread.
Here it is:
http://www.spinics.net/linux/lists/kvm/msg106049.html




  ---
   drivers/vhost/net.c   |6 +-
   drivers/vhost/scsi.c  |5 +-
   drivers/vhost/vhost.c |  247 
  +++--
   drivers/vhost/vhost.h |   37 +++-
   4 files changed, 277 insertions(+), 18 deletions(-)
 
 
 Whitespace seems mangled to the point of making patch
 unreadable. Can you pls repost?
 

Sure.

  diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
  index 971a760..558aecb 100644
  --- a/drivers/vhost/net.c
  +++ b/drivers/vhost/net.c
  @@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
  	}
  	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
  
  -	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
  -	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
  +	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
  +			vqs[VHOST_NET_VQ_TX]);
  +	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
  +			vqs[VHOST_NET_VQ_RX]);
  
  	f->private_data = n;
  
  diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
  index 4f4ffa4..56f0233 100644
  --- a/drivers/vhost/scsi.c
  +++ b/drivers/vhost/scsi.c
  @@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
  	if (!vqs)
  		goto err_vqs;
  
  -	vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
  -	vhost_work_init(&vs->vs_event_work, tcm_vhost_evt_work);
  -
  +	vhost_work_init(&vs->vs_completion_work, NULL,
  +		vhost_scsi_complete_cmd_work);
  +	vhost_work_init(&vs

Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Razya Ladelsky
 
 Hmm there aren't a lot of numbers there :(. Speed increased by 33% but
 by how much?  E.g. maybe you are getting from 1Mbyte/sec to 1.3,
 if so it's hard to get excited about it. 

Netperf, 1 VM: 1516 MB/sec -> 2046 MB/sec (roughly a 35% improvement),
and for 3 VMs: 4086 MB/sec -> 5545 MB/sec (roughly 36%).

 Some questions that come to
 mind: what was the message size? I would expect several measurements
 with different values.  How did host CPU utilization change?
 

The message size was 64B in order to get the VM to be cpu saturated:
the vcpu was at 99% cpu and vhost at 38%; with the polling patch both were at 99%.



 What about latency? As we are competing with guest for host CPU,
 would worst-case or average latency suffer?
 

Polling indeed doesn't make a lot of sense if there aren't enough 
available cores.
In these cases polling should not be used.

Thank you,
Razya



 Thanks,
 
 -- 
 MST
 



Re: [PATCH] vhost: Add polling mode

2014-07-23 Thread Razya Ladelsky
Jason Wang jasow...@redhat.com wrote on 23/07/2014 08:26:36 AM:

 From: Jason Wang jasow...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, kvm@vger.kernel.org, Michael S.
 Tsirkin m...@redhat.com, 
 Cc: abel.gor...@gmail.com, Joel Nider/Haifa/IBM@IBMIL, Yossi 
 Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Alex 
 Glikson/Haifa/IBM@IBMIL
 Date: 23/07/2014 08:26 AM
 Subject: Re: [PATCH] vhost: Add polling mode
 
 On 07/21/2014 09:23 PM, Razya Ladelsky wrote:
  Hello All,
 
  When vhost is waiting for buffers from the guest driver (e.g., more 
  packets
  to send in vhost-net's transmit queue), it normally goes to sleep and 
  waits
  for the guest to kick it. This kick involves a PIO in the guest, and
  therefore an exit (and possibly userspace involvement in translating 
this 
  PIO
  exit into a file descriptor event), all of which hurts performance.
 
  If the system is under-utilized (has cpu time to spare), vhost can 
  continuously poll the virtqueues for new buffers, and avoid asking 
  the guest to kick us.
  This patch adds an optional polling mode to vhost that can be enabled
  via a kernel module parameter, poll_start_rate.
 
  When polling is active for a virtqueue, the guest is asked to
  disable notification (kicks), and the worker thread continuously
  checks for new buffers. When it does discover new buffers, it
  simulates a kick by invoking the underlying backend driver (such as
  vhost-net), which thinks it got a real kick from the guest, and acts
  accordingly. If the underlying driver asks not to be kicked, we
  disable polling on this virtqueue.
  
  We start polling on a virtqueue when we notice it has work to do.
  Polling on this virtqueue is later disabled after 3 seconds of
  polling turning up no new work, as in this case we are better off
  returning to the exit-based notification mechanism. The default
  timeout of 3 seconds can be changed with the poll_stop_idle kernel
  module parameter.
 
  This polling approach makes a lot of sense for new HW with
  posted-interrupts, for which we have exitless host-to-guest
  notifications. But even with support for posted interrupts,
  guest-to-host communication still causes exits.
  Polling adds the missing part.
  
  When systems are overloaded, there won't be enough cpu time for the
  various vhost threads to poll their guests' devices. For these
  scenarios, we plan to add support for vhost threads that can be
  shared by multiple devices, even devices of multiple VMs.
  Our ultimate goal is to implement the I/O acceleration features 
described 
  in:
  KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) 
  https://www.youtube.com/watch?v=9EyweibHfEs
  and
  https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
 
  
  Comments are welcome, 
  Thank you,
  Razya
 
 Thanks for the work. Do you have perf numbers for this?
 

Hi Jason,
Thanks for reviewing. I ran some experiments with TCP stream netperf and 
filebench (having 2 threads performing random reads) benchmarks on an IBM 
System x3650 M4.
All runs loaded the guests in a way that they were (cpu) saturated.
The system had two cores per guest, so as to allow both the vcpu and the
vhost thread to run concurrently for maximum throughput (but I didn't pin
the threads to specific cores).
I get:

Netperf, 1 vm:
The polling patch improved throughput by ~33%. Number of exits/sec 
decreased 6x.
The same improvement was shown when I tested with 3 vms running netperf.

filebench, 1 vm:
ops/sec improved by 13% with the polling patch. Number of exits was 
reduced by 31%.
The same experiment with 3 vms running filebench showed similar numbers.


 And it looks like the patch only polls the virtqueue. In the future, it may
 be worth adding callbacks for vhost_net to poll the socket as well. Then it
 could be used with rx busy polling in the host, which may speed up rx too.

Did you mean polling the network device to avoid interrupts?

  
  diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
  index c90f437..678d766 100644
  --- a/drivers/vhost/vhost.c
  +++ b/drivers/vhost/vhost.c
  @@ -24,9 +24,17 @@
   #include <linux/slab.h>
   #include <linux/kthread.h>
   #include <linux/cgroup.h>
  +#include <linux/jiffies.h>
   #include <linux/module.h>
  
   #include "vhost.h"
  +static int poll_start_rate = 0;
  +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
  +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
  +
  +static int poll_stop_idle = 3*HZ; /* 3 seconds */
  +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
  +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
  
 
 I'm not sure using jiffies is good enough, since the user needs to know the HZ value.
 It may be worth looking at sk_busy_loop(), which uses sched_clock() and microseconds.

Ok, Will look into it, thanks.
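
In case it helps the discussion, here is a rough sketch of what a microsecond-based idle cutoff via sched_clock() could look like; poll_stop_idle_us and last_work_ns are illustrative names, not code from the patch or from sk_busy_loop():

#include <linux/sched.h>   /* sched_clock() */
#include <linux/time.h>    /* USEC_PER_SEC, NSEC_PER_USEC */
#include <linux/types.h>

/* Illustrative tunable: stop polling after this many microseconds of no work. */
static unsigned int poll_stop_idle_us = 3 * USEC_PER_SEC;

static bool poll_idle_expired(u64 last_work_ns)
{
        /* sched_clock() counts nanoseconds, so no knowledge of HZ is needed. */
        return sched_clock() - last_work_ns >
               (u64)poll_stop_idle_us * NSEC_PER_USEC;
}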

  
  +/* Enable or disable virtqueue polling

[PATCH] vhost: Add polling mode

2014-07-21 Thread Razya Ladelsky
Hello All,

When vhost is waiting for buffers from the guest driver (e.g., more 
packets
to send in vhost-net's transmit queue), it normally goes to sleep and 
waits
for the guest to kick it. This kick involves a PIO in the guest, and
therefore an exit (and possibly userspace involvement in translating this 
PIO
exit into a file descriptor event), all of which hurts performance.

If the system is under-utilized (has cpu time to spare), vhost can 
continuously poll the virtqueues for new buffers, and avoid asking 
the guest to kick us.
This patch adds an optional polling mode to vhost, that can be enabled 
via a kernel module parameter, poll_start_rate.

When polling is active for a virtqueue, the guest is asked to
disable notification (kicks), and the worker thread continuously checks 
for
new buffers. When it does discover new buffers, it simulates a kick by
invoking the underlying backend driver (such as vhost-net), which thinks 
it
got a real kick from the guest, and acts accordingly. If the underlying
driver asks not to be kicked, we disable polling on this virtqueue.

We start polling on a virtqueue when we notice it has
work to do. Polling on this virtqueue is later disabled after 3 seconds of
polling turning up no new work, as in this case we are better off 
returning
to the exit-based notification mechanism. The default timeout of 3 seconds
can be changed with the poll_stop_idle kernel module parameter.
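
To make the start/stop conditions above concrete, here is a minimal sketch of the heuristic as just described; the struct and helper names are illustrative only and are not the identifiers used in the patch (only the two module parameter names come from it):

#include <linux/jiffies.h>
#include <linux/types.h>

/* The patch exposes these as module parameters; shown as plain variables here. */
static int poll_start_rate;            /* events per jiffy; 0 means never poll */
static int poll_stop_idle = 3 * HZ;    /* jiffies with no work before stopping */

/* Illustrative per-virtqueue bookkeeping, not the patch's actual fields. */
struct vq_poll_state {
        unsigned long events_this_jiffy;   /* kicks/new buffers seen recently */
        unsigned long last_work;           /* jiffies when work was last found */
};

static bool vq_should_start_polling(const struct vq_poll_state *s)
{
        return poll_start_rate && s->events_this_jiffy >= poll_start_rate;
}

static bool vq_should_stop_polling(const struct vq_poll_state *s)
{
        /* No new buffers for poll_stop_idle jiffies: fall back to guest kicks. */
        return time_after(jiffies, s->last_work + poll_stop_idle);
}

The idea is that the worker would consult the first check when it handles a kick, and the second one on every polling pass.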

This polling approach makes a lot of sense for new HW with posted-interrupts
for which we have exitless host-to-guest notifications. But even with 
support 
for posted interrupts, guest-to-host communication still causes exits. 
Polling adds the missing part.

When systems are overloaded, there won't be enough cpu time for the various
vhost threads to poll their guests' devices. For these scenarios, we plan
to add support for vhost threads that can be shared by multiple devices,
even devices of multiple VMs.
Our ultimate goal is to implement the I/O acceleration features described 
in:
KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) 
https://www.youtube.com/watch?v=9EyweibHfEs
and
https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html

 
Comments are welcome, 
Thank you,
Razya

From: Razya Ladelsky ra...@il.ibm.com

Add an optional polling mode to continuously poll the virtqueues
for new buffers, and avoid asking the guest to kick us.

Signed-off-by: Razya Ladelsky ra...@il.ibm.com
---
 drivers/vhost/net.c   |6 +-
 drivers/vhost/scsi.c  |5 +-
 drivers/vhost/vhost.c |  247 +++--
 drivers/vhost/vhost.h |   37 +++-
 4 files changed, 277 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 971a760..558aecb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 }
 vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 
-   vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
-   vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
+   vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
+   vqs[VHOST_NET_VQ_TX]);
+   vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
+   vqs[VHOST_NET_VQ_RX]);
 
 f->private_data = n;
 
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4f4ffa4..56f0233 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
 if (!vqs)
 goto err_vqs;
 
-   vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
-   vhost_work_init(&vs->vs_event_work, tcm_vhost_evt_work);
-
+   vhost_work_init(&vs->vs_completion_work, NULL, vhost_scsi_complete_cmd_work);
+   vhost_work_init(&vs->vs_event_work, NULL, tcm_vhost_evt_work);
 vs->vs_events_nr = 0;
 vs->vs_events_missed = false;
 
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index c90f437..678d766 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -24,9 +24,17 @@
 #include <linux/slab.h>
 #include <linux/kthread.h>
 #include <linux/cgroup.h>
+#include <linux/jiffies.h>
 #include <linux/module.h>
 
 #include "vhost.h"
+static int poll_start_rate = 0;
+module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
+
+static int poll_stop_idle = 3*HZ; /* 3 seconds */
+module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");
 
 enum {
VHOST_MEMORY_MAX_NREGIONS = 64,
@@ -58,27 +66,27 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
 return 0
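
For reference: because both tunables are declared with module_param(..., S_IRUGO|S_IWUSR), they should also be readable and root-writable at runtime under /sys/module/<module>/parameters/poll_start_rate and /sys/module/<module>/parameters/poll_stop_idle, where <module> is whichever module vhost.c is built into on the kernel in question (vhost or vhost_net, depending on the configuration).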

Re: Updated Elvis Upstreaming Roadmap

2013-12-24 Thread Razya Ladelsky
Hi,

To summarize the issues raised and following steps:

1. A shared vhost thread will support multiple VMs, while still supporting 
cgroups. 
As soon as we have a design to support cgroups with multiple VMs, we'll 
share it.

2. Adding vhost polling mode: this patch can be submitted independently 
from (1).
We'll add a condition that will be checked periodically, in order to stop 
polling 
if the guest is not running (scheduled out) at that time. 

3. Implement good heuristics (policies) in the vhost module for 
adding/removing vhost
threads. We will not expose an interface to user-space at this time.


Thank you,
Razya



Re: Updated Elvis Upstreaming Roadmap

2013-12-24 Thread Razya Ladelsky
Gleb Natapov g...@minantech.com wrote on 24/12/2013 06:21:03 PM:

 From: Gleb Natapov g...@kernel.org
 To: Razya Ladelsky/Haifa/IBM@IBMIL, 
 Cc: Michael S. Tsirkin m...@redhat.com, abel.gor...@gmail.com, 
 Anthony Liguori anth...@codemonkey.ws, as...@redhat.com, 
 digitale...@google.com, Eran Raichstein/Haifa/IBM@IBMIL, 
 g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/IBM@IBMIL, 
 kvm@vger.kernel.org, kvm-ow...@vger.kernel.org, pbonz...@redhat.com,
 Stefan Hajnoczi stefa...@gmail.com, Yossi Kuperman1/Haifa/
 IBM@IBMIL, Eyal Moscovici/Haifa/IBM@IBMIL, b...@redhat.com
 Date: 24/12/2013 06:21 PM
 Subject: Re: Updated Elvis Upstreaming Roadmap
 Sent by: Gleb Natapov g...@minantech.com
 
 On Tue, Dec 17, 2013 at 12:04:42PM +0200, Razya Ladelsky wrote:
  4. vhost statistics 
  
  The issue that was raised for the vhost statistics was using ftrace 
  instead of the debugfs mechanism.
  However, looking further into the kvm stat mechanism, we learned that 
  ftrace didn't replace the plain debugfs mechanism, but was used in 
  addition to it.
  
 It did. Statistics in debugfs is deprecated. No new statistics are
 added there.  kvm_stat is using ftrace now (if available) and of course
 ftrace gives seamless integration with perf.


O.k, I understand.
We'll look more into ftrace to see that it fully supports our vhost 
statistics
requirements.
Thank you,
Razya
 
 --
  Gleb.
 



Updated Elvis Upstreaming Roadmap

2013-12-17 Thread Razya Ladelsky
Hi,

Thank you all for your comments.
I'm sorry for taking this long to reply, I was away on vacation.

It was a good, long discussion in which many issues were raised, and we'd like 
to address them with the following proposed roadmap for the Elvis patches.
In general, we believe it would be best to start with patches that are 
as simple as possible, providing the basic Elvis functionality, 
and attend to the more complicated issues in subsequent patches.

Here's the road map for Elvis patches: 

1. Shared vhost thread for multiple devices.

The way to go here, we believe, is to start with a patch having a shared 
vhost thread for multiple devices of the SAME vm.
The next step/patch may be handling vms belonging to the same cgroup.

Finally, we need to extend the functionality so that the shared vhost 
thread 
serves multiple vms (not necessarily belonging to the same cgroup).

There was a lot of discussion about the way to address the enforcement 
of cgroup policies, and we will consider the various solutions with a 
future
patch.

2. Creation of vhost threads

We suggested two ways of controlling the creation and removal of vhost
threads: 
- statically determining the maximum number of virtio devices per worker 
via a kernel module parameter 
- dynamically: Sysfs mechanism to add and remove vhost threads 

It seems that it would be simplest to take the static approach as
a first stage. At a second stage (next patch), we'll advance to 
dynamically 
changing the number of vhost threads, using the static module parameter 
only as a default value. 

Regarding cmwq (concurrency-managed workqueues), it is an interesting 
mechanism which we need to explore further.
At the moment we prefer not to change the vhost model to use cmwq, as some 
of the issues that were discussed, such as cgroups, are not supported by 
cmwq, and this would add more complexity.
However, we'll look further into it and consider it at a later stage.

3. Adding polling mode to vhost 

It is a good idea to make polling adaptive, based on various factors such as 
the I/O rate, the guest kick overhead (which is the tradeoff of polling), 
or the amount of wasted cycles (cycles we kept polling but no new work was 
added).
However, as an initial polling patch, we would prefer a naive 
polling approach, which could be tuned with later patches.

4. vhost statistics 

The issue that was raised for the vhost statistics was using ftrace 
instead of the debugfs mechanism.
However, looking further into the kvm stat mechanism, we learned that 
ftrace didn't replace the plain debugfs mechanism, but was used in 
addition to it.
 
We propose to continue using debugfs for statistics, in a manner similar 
to kvm,
and at some point in the future ftrace can be added to vhost as well.
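
To make the debugfs direction in (4) concrete, below is a minimal, self-contained sketch of exposing a single counter the way the old kvm debugfs stats did; the directory and counter names are invented for the example, and this is not the Elvis statistics code:

#include <linux/debugfs.h>
#include <linux/module.h>
#include <linux/types.h>

/* Illustrative counter; the real patch exposes a whole set of them. */
static u64 poll_empty_cycles;
static struct dentry *stats_dir;

static int __init vhost_stats_example_init(void)
{
        stats_dir = debugfs_create_dir("vhost-example", NULL);
        /* Creates /sys/kernel/debug/vhost-example/poll_empty_cycles */
        debugfs_create_u64("poll_empty_cycles", 0444, stats_dir,
                           &poll_empty_cycles);
        return 0;
}

static void __exit vhost_stats_example_exit(void)
{
        debugfs_remove_recursive(stats_dir);
}

module_init(vhost_stats_example_init);
module_exit(vhost_stats_example_exit);
MODULE_LICENSE("GPL");

A userspace script (in the spirit of the old kvm_stat) could then periodically read the files under /sys/kernel/debug/vhost-example/ and display rates.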
 
Does this plan look o.k.?
If there are no further comments, I'll start preparing the patches 
according to what we've agreed on thus far.
Thank you,
Razya



Re: Elvis upstreaming plan

2013-11-25 Thread Razya Ladelsky
Michael S. Tsirkin m...@redhat.com wrote on 24/11/2013 12:26:15 PM:

 From: Michael S. Tsirkin m...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, 
 Cc: kvm@vger.kernel.org, anth...@codemonkey.ws, g...@redhat.com, 
 pbonz...@redhat.com, as...@redhat.com, jasow...@redhat.com, 
 digitale...@google.com, abel.gor...@gmail.com, Abel Gordon/Haifa/
 IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL
 Date: 24/11/2013 12:22 PM
 Subject: Re: Elvis upstreaming plan
 
 On Sun, Nov 24, 2013 at 11:22:17AM +0200, Razya Ladelsky wrote:
  Hi all,
  
  I am Razya Ladelsky, I work at IBM Haifa virtualization team, which 
  developed Elvis, presented by Abel Gordon at the last KVM forum: 
  ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs 
  ELVIS slides: 
https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE 
  
  
  According to the discussions that took place at the forum, upstreaming 

  some of the Elvis approaches seems to be a good idea, which we would 
like 
  to pursue.
  
  Our plan for the first patches is the following: 
  
  1. Shared vhost thread between multiple devices 
  This patch creates a worker thread and worker queue shared across multiple 
  virtio devices. 
  We would like to modify the patch posted in
  https://github.com/abelg/virtual_io_acceleration/commit/
 3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766 
  to limit a vhost thread to serve multiple devices only if they belong to 
  the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.
  
  Another modification is related to the creation and removal of vhost 
  threads, which will be discussed next.
 
  2. Sysfs mechanism to add and remove vhost threads 
  This patch allows us to add and remove vhost threads dynamically.
  
  A simpler way to control the creation of vhost threads is statically 
  determining the maximum number of virtio devices per worker via a 
kernel 
  module parameter (which is the way the previously mentioned patch is 
  currently implemented)
  
  I'd like to ask for advice here about the more preferable way to go:
  Although having the sysfs mechanism provides more flexibility, it may 
be a 
  good idea to start with a simple static parameter, and have the first 
  patches as simple as possible. What do you think?
  
  3. Add virtqueue polling mode to vhost 
  Have the vhost thread poll the virtqueues with high I/O rate for new 
  buffers, and avoid asking the guest to kick us.
  https://github.com/abelg/virtual_io_acceleration/commit/
 26616133fafb7855cc80fac070b0572fd1aaf5d0
  
  4. vhost statistics
  This patch introduces a set of statistics to monitor different 
performance 
  metrics of vhost and our polling and I/O scheduling mechanisms. The 
  statistics are exposed using debugfs and can be easily displayed with 
a 
  Python script (vhost_stat, based on the old kvm_stats)
  https://github.com/abelg/virtual_io_acceleration/commit/
 ac14206ea56939ecc3608dc5f978b86fa322e7b0
  
  
  5. Add heuristics to improve I/O scheduling 
  This patch enhances the round-robin mechanism with a set of heuristics 
to 
  decide when to leave a virtqueue and proceed to the next.
  https://github.com/abelg/virtual_io_acceleration/commit/
 f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
  
  This patch improves the handling of the requests by the vhost thread, 
but 
  could perhaps be delayed to a 
  later time, and not submitted as one of the first Elvis patches.
  I'd love to hear some comments about whether this patch needs to be 
part 
  of the first submission.
  
  Any other feedback on this plan will be appreciated,
  Thank you,
  Razya
 
 
 How about we start with the stats patch?
 This will make it easier to evaluate the other patches.
 

Hi Michael,
Thank you for your quick reply.
Our plan was to send all these patches that contain the Elvis code.
We can start with the stats patch; however, many of the statistics there 
are related to the features that the other patches provide...
B.T.W., if you get a chance to look at the rest of the patches,
I'd really appreciate your comments.
Thank you very much,
Razya

