Re: [PATCH] KVM: ARM: enable Cortex A7 hosts

2013-09-24 Thread Simon Horman
On Thu, Sep 19, 2013 at 02:01:48PM +0200, Ulrich Hecht wrote:
 KVM runs fine on Cortex A7 cores, so they should be enabled. Tested on an
 APE6EVM board (r8a73a4 SoC).
 
 Signed-off-by: Ulrich Hecht ulrich.he...@gmail.com

Hi Ulrich,

I'm not entirely sure, but it seems to me that you should somehow expand
the Cc list of this patch, as it doesn't seem to have received any
attention in the week since you sent it.


# ./scripts/get_maintainer.pl -f arch/arm/kvm/guest.c
Christoffer Dall christoffer.d...@linaro.org (supporter:KERNEL VIRTUAL MA...)
Gleb Natapov g...@redhat.com (supporter:KERNEL VIRTUAL MA...)
Paolo Bonzini pbonz...@redhat.com (supporter:KERNEL VIRTUAL MA...)
Russell King li...@arm.linux.org.uk (maintainer:ARM PORT)
kvm...@lists.cs.columbia.edu (open list:KERNEL VIRTUAL MA...)
kvm@vger.kernel.org (open list:KERNEL VIRTUAL MA...)
linux-arm-ker...@lists.infradead.org (moderated list:ARM PORT)
linux-ker...@vger.kernel.org (open list)
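For what it's worth, a common way to pick that Cc list up automatically when
resending is to let git send-email call get_maintainer.pl itself. A sketch
(the patch filename is hypothetical, and the command is only printed here,
since running it needs a kernel checkout):

```shell
# Sketch: feed get_maintainer.pl output to git send-email as the Cc list.
# Both tools ship with the kernel tree; the patch filename is hypothetical.
PATCH=0001-KVM-ARM-enable-Cortex-A7-hosts.patch
CC_CMD="./scripts/get_maintainer.pl --norolestats"
# Print the command instead of running it, since it needs a kernel checkout:
echo "git send-email --cc-cmd='$CC_CMD' $PATCH"
```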



 ---
  arch/arm/kvm/guest.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
 index 152d036..05c62d5 100644
 --- a/arch/arm/kvm/guest.c
 +++ b/arch/arm/kvm/guest.c
 @@ -192,6 +192,8 @@ int __attribute_const__ kvm_target_cpu(void)
  	switch (part_number) {
  	case ARM_CPU_PART_CORTEX_A15:
  		return KVM_ARM_TARGET_CORTEX_A15;
 +	case ARM_CPU_PART_CORTEX_A7:
 +		return KVM_ARM_TARGET_CORTEX_A15;
  	default:
  		return -EINVAL;
  	}
 -- 
 1.8.3.1
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-sh in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


Re: KVM induced panic on 2.6.38[2367] 2.6.39

2011-06-09 Thread Simon Horman
On Thu, Jun 09, 2011 at 01:02:13AM +0800, Brad Campbell wrote:
 On 08/06/11 11:59, Eric Dumazet wrote:
 
 Well, a bisection definitely should help, but needs a lot of time in
 your case.
 
 Yes. compile, test, crash, walk out to the other building to press
 reset, lather, rinse, repeat.
 
 I need a reset button on the end of a 50M wire, or a hardware watchdog!

Not strictly on-topic, but in situations where I have machines
that either don't have lights-out facilities or have broken ones,
I find network-controlled power switches very useful.

At one point I would have needed an 8000 km long wire to the reset switch :-)


Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net

2011-02-23 Thread Simon Horman
On Wed, Feb 23, 2011 at 10:52:09AM +0530, Krishna Kumar2 wrote:
 Simon Horman ho...@verge.net.au wrote on 02/22/2011 01:17:09 PM:
 
 Hi Simon,
 
 
  I have a few questions about the results below:
 
  1. Are the (%) comparisons between non-mq and mq virtio?
 
 Yes - mainline kernel with transmit-only MQ patch.
 
  2. Was UDP or TCP used?
 
 TCP. I had done some initial testing on UDP, but I don't have
 those results any more as they are quite old. I will be running
 it again.
 
  3. What was the transmit size (-m option to netperf)?
 
 I didn't use the -m option, so it defaults to 16K. The
 script does:
 
 netperf -t TCP_STREAM -c -C -l 60 -H $SERVER
 
  Also, I'm interested to know what the status of these patches is.
  Are you planning a fresh series?
 
 Yes. Michael Tsirkin had wanted to see what the MQ RX patch
 would look like, so I was in the process of getting the two
 working together. The patch is ready and is being tested.
 Should I send an RFC patch at this time?
 
 The TX-only patch helped the guest TX path but didn't help
 host-guest much (as tested using TCP_MAERTS from the guest).
 But with the TX+RX patch, both directions are getting
 improvements. Remote testing is still to be done.

Hi Krishna,

thanks for clarifying the test results.
I'm looking forward to the forthcoming RFC patches.


Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net

2011-02-22 Thread Simon Horman
On Wed, Oct 20, 2010 at 02:24:52PM +0530, Krishna Kumar wrote:
 The following set of patches implements transmit MQ in virtio-net.  Also
 included are the userspace qemu changes.  MQ is disabled by default unless
 qemu specifies it.

Hi Krishna,

I have a few questions about the results below:

1. Are the (%) comparisons between non-mq and mq virtio?
2. Was UDP or TCP used?
3. What was the transmit size (-m option to netperf)?

Also, I'm interested to know what the status of these patches is.
Are you planning a fresh series?

 
   Changes from rev2:
   --
 1. Define (in virtio_net.h) the maximum send txqs; and use in
virtio-net and vhost-net.
 2. vi-sq[i] is allocated individually, resulting in cache line
aligned sq[0] to sq[n].  Another option was to define
'send_queue' as:
struct send_queue {
struct virtqueue *svq;
struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
} cacheline_aligned_in_smp;
and to statically allocate 'VIRTIO_MAX_SQ' of those.  I hope
the submitted method is preferable.
 3. Changed vhost model such that vhost[0] handles RX and vhost[1-MAX]
handles TX[0-n].
 4. Further change TX handling such that vhost[0] handles both RX/TX
for single stream case.
 
   Enabling MQ on virtio:
   ---
 When the following options are passed to qemu:
 - smp > 1
 - vhost=on
 - mq=on (new option, default: off)
 then #txqueues = #cpus.  The #txqueues can be changed by using an
 optional 'numtxqs' option, e.g. for a smp=4 guest:
 vhost=on                 ->  #txqueues = 1
 vhost=on,mq=on           ->  #txqueues = 4
 vhost=on,mq=on,numtxqs=2 ->  #txqueues = 2
 vhost=on,mq=on,numtxqs=8 ->  #txqueues = 8
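As a concrete sketch, a qemu invocation using the options proposed in this
series might look as follows. Note that 'mq' and 'numtxqs' exist only in the
patched qemu from this RFC, not upstream, and the tap/device details are
illustrative; the command is printed rather than executed:

```shell
# Hypothetical qemu command line for a 4-vcpu guest with the RFC's MQ options.
# 'mq=on' and 'numtxqs' come from the patchset under discussion, not upstream qemu.
netdev="tap,id=net0,ifname=tap0,script=no,vhost=on,mq=on,numtxqs=4"
echo "qemu-system-x86_64 -smp 4 -m 2048 -netdev $netdev -device virtio-net-pci,netdev=net0"
```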
 
 
Performance (guest -> local host):
---
 System configuration:
 Host:  8 Intel Xeon, 8 GB memory
 Guest: 4 cpus, 2 GB memory
 Test: Each test case runs for 60 secs, summed over three runs (except
 when the number of netperf sessions is 1, which has 10 runs of 12 secs
 each).  No tuning (default netperf) other than pinning the vhost threads
 to cpus 0-3 with taskset.  numtxqs=32 gave the best results though the
 guest had only 4 vcpus (I haven't tried beyond that).
 
 __ numtxqs=2, vhosts=3  
 #sessions   BW%      CPU%     RCPU%    SD%      RSD%
 
 1            4.46    -1.96      .19   -12.50    -6.06
 2            4.93    -1.16     2.10     0       -2.38
 4           46.17    64.77    33.72    19.51    -2.48
 8           47.89    70.00    36.23    41.46    13.35
 16          48.97    80.44    40.67    21.11    -5.46
 24          49.03    78.78    41.22    20.51    -4.78
 32          51.11    77.15    42.42    15.81    -6.87
 40          51.60    71.65    42.43     9.75    -8.94
 48          50.10    69.55    42.85    11.80    -5.81
 64          46.24    68.42    42.67    14.18    -3.28
 80          46.37    63.13    41.62     7.43    -6.73
 96          46.40    63.31    42.20     9.36    -4.78
 128         50.43    62.79    42.16    13.11    -1.23
 
 BW: 37.2%,  CPU/RCPU: 66.3%, 41.6%,  SD/RSD: 11.5%, -3.7%
 
 __ numtxqs=8, vhosts=5  
 #sessions   BW%      CPU%     RCPU%    SD%      RSD%
 
 1            -.76    -1.56     2.33     0        3.03
 2           17.41    11.11    11.41     0       -4.76
 4           42.12    55.11    30.20    19.51      .62
 8           54.69    80.00    39.22    24.39    -3.88
 16          54.77    81.62    40.89    20.34    -6.58
 24          54.66    79.68    41.57    15.49    -8.99
 32          54.92    76.82    41.79    17.59    -5.70
 40          51.79    68.56    40.53    15.31    -3.87
 48          51.72    66.40    40.84     9.72    -7.13
 64          51.11    63.94    41.10     5.93    -8.82
 80          46.51    59.50    39.80     9.33    -4.18
 96          47.72    57.75    39.84     4.20    -7.62
 128         54.35    58.95    40.66     3.24    -8.63
 
 BW: 38.9%,  CPU/RCPU: 63.0%, 40.1%,  SD/RSD: 6.0%, -7.4%
 
 __ numtxqs=16, vhosts=5  ___
 #sessions   BW%      CPU%     RCPU%    SD%      RSD%
 
 1           -1.43    -3.52     1.55     0        3.03
 2           33.09    21.63    20.12   -10.00    -9.52
 4           67.17    94.60    44.28    19.51   -11.80
 8           75.72   108.14    49.15    25.00   -10.71
 16          80.34   101.77    52.94    25.93    -4.49
 24          70.84    93.12    43.62    27.63    -5.03
 32          69.01    94.16    47.33    29.68    -1.51
 40          58.56    63.47    25.91    -3.92   -25.85
 48

Re: Flow Control and Port Mirroring Revisited

2011-01-23 Thread Simon Horman
On Sun, Jan 23, 2011 at 12:39:02PM +0200, Michael S. Tsirkin wrote:
 On Sun, Jan 23, 2011 at 05:38:49PM +1100, Simon Horman wrote:
  On Sat, Jan 22, 2011 at 11:57:42PM +0200, Michael S. Tsirkin wrote:
   On Sat, Jan 22, 2011 at 10:11:52AM +1100, Simon Horman wrote:
On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:

[snip]

 Hmm, what is this supposed to measure?  Basically each time you run an
 un-paced UDP_STREAM you get some random load on the network.
 You can't tell what it was exactly, only that it was between
 the send and receive throughput.

Rick mentioned in another email that I messed up my test parameters a bit,
so I will re-run the tests, incorporating his suggestions.

What I was attempting to measure was the effect of an unpaced UDP_STREAM
on the latency of more moderated traffic, because I am interested in
what effect an abusive guest has on other guests and how that may be
mitigated.

Could you suggest some tests that you feel are more appropriate?
   
    Yes. To rephrase my concern in these terms, besides the malicious guest
   you have another software in host (netperf) that interferes with
   the traffic, and it cooperates with the malicious guest.
   Right?
  
  Yes, that is the scenario in this test.
 
 Yes but I think that you want to put some controlled load on host.
 Let's assume that we improve the speed somehow and now you can push more
 bytes per second without loss.  Result might be a regression in your
 test because you let the guest push as much as it can and suddenly it
 can push more data through.  OTOH with packet loss the load on host is
 anywhere in between send and receive throughput: there's no easy way to
 measure it from netperf: the earlier some buffers overrun, the earlier
 the packets get dropped and the less the load on host.
 
 This is why I say that to get a specific
 load on host you want to limit the sender
 to a specific BW and then either
 - make sure packet loss % is close to 0.
 - make sure packet loss % is close to 100%.

Thanks, and sorry for being a bit slow.  I now see what you have
been getting at with regards to limiting the tests.
I will see about getting some numbers based on your suggestions.
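As a concrete starting point, netperf's -b/-w burst options (available when it
is compiled with --enable-burst) can approximate a fixed offered load. A rough
sketch of the arithmetic, with a hypothetical 100 Mbit/s target; the rate model
is approximate and the resulting command is only printed:

```shell
# Back-of-envelope pacing: with --enable-burst, netperf sends a burst of
# -b messages every -w milliseconds, so offered load ~= b * m * 8 * (1000/w) bit/s.
target_bps=100000000   # hypothetical target: ~100 Mbit/s
msg_bytes=1472         # UDP payload that fits a 1500-byte MTU
wait_ms=10
pps=$(( target_bps / (msg_bytes * 8) ))   # packets per second
burst=$(( pps * wait_ms / 1000 ))         # messages per burst
echo "netperf -t UDP_STREAM -H \$SERVER -l 60 -b $burst -w $wait_ms -- -m $msg_bytes"
```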



Re: Flow Control and Port Mirroring Revisited

2011-01-22 Thread Simon Horman
On Sat, Jan 22, 2011 at 11:57:42PM +0200, Michael S. Tsirkin wrote:
 On Sat, Jan 22, 2011 at 10:11:52AM +1100, Simon Horman wrote:
  On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:
   On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
[ Trimmed Eric from CC list as vger was complaining that it is too long ]

On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
 So it won't be all that simple to implement well, and before we try,
 I'd like to know whether there are applications that are helped
 by it. For example, we could try to measure latency at various
 pps and see whether the backpressure helps. netperf has -b, -w
 flags which might help these measurements.
 
 Those options are enabled when one adds --enable-burst to the
 pre-compilation ./configure  of netperf (one doesn't have to
 recompile netserver).  However, if one is also looking at latency
 statistics via the -j option in the top-of-trunk, or simply at the
 histogram with --enable-histogram on the ./configure and a verbosity
 level of 2 (global -v 2) then one wants the very top of trunk
 netperf from:

Hi,

I have constructed a test where I run an un-paced  UDP_STREAM test in
one guest and a paced omni rr test in another guest at the same time.
   
   Hmm, what is this supposed to measure?  Basically each time you run an
   un-paced UDP_STREAM you get some random load on the network.
   You can't tell what it was exactly, only that it was between
   the send and receive throughput.
  
  Rick mentioned in another email that I messed up my test parameters a bit,
  so I will re-run the tests, incorporating his suggestions.
  
  What I was attempting to measure was the effect of an unpaced UDP_STREAM
  on the latency of more moderated traffic. Because I am interested in
  what effect an abusive guest has on other guests and how that may be
  mitigated.
  
  Could you suggest some tests that you feel are more appropriate?
 
 Yes. To rephrase my concern in these terms, besides the malicious guest
 you have another software in host (netperf) that interferes with
 the traffic, and it cooperates with the malicious guest.
 Right?

Yes, that is the scenario in this test.

 IMO for a malicious guest you would send
 UDP packets that then get dropped by the host.
 
 For example block netperf in host so that
 it does not consume packets from the socket.

I'm more interested in rate-limiting netperf than blocking it.
But in any case, do you mean use iptables or tc based on
classification made by net_cls?



Re: Flow Control and Port Mirroring Revisited

2011-01-21 Thread Simon Horman
On Fri, Jan 21, 2011 at 11:59:30AM +0200, Michael S. Tsirkin wrote:
 On Thu, Jan 20, 2011 at 05:38:33PM +0900, Simon Horman wrote:
  [ Trimmed Eric from CC list as vger was complaining that it is too long ]
  
  On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
   So it won't be all that simple to implement well, and before we try,
   I'd like to know whether there are applications that are helped
   by it. For example, we could try to measure latency at various
   pps and see whether the backpressure helps. netperf has -b, -w
   flags which might help these measurements.
   
   Those options are enabled when one adds --enable-burst to the
   pre-compilation ./configure  of netperf (one doesn't have to
   recompile netserver).  However, if one is also looking at latency
   statistics via the -j option in the top-of-trunk, or simply at the
   histogram with --enable-histogram on the ./configure and a verbosity
   level of 2 (global -v 2) then one wants the very top of trunk
   netperf from:
  
  Hi,
  
  I have constructed a test where I run an un-paced  UDP_STREAM test in
  one guest and a paced omni rr test in another guest at the same time.
 
 Hmm, what is this supposed to measure?  Basically each time you run an
 un-paced UDP_STREAM you get some random load on the network.
 You can't tell what it was exactly, only that it was between
 the send and receive throughput.

Rick mentioned in another email that I messed up my test parameters a bit,
so I will re-run the tests, incorporating his suggestions.

What I was attempting to measure was the effect of an unpaced UDP_STREAM
on the latency of more moderated traffic. Because I am interested in
what effect an abusive guest has on other guests and how that may be
mitigated.

Could you suggest some tests that you feel are more appropriate?



Re: Flow Control and Port Mirroring Revisited

2011-01-20 Thread Simon Horman
[ Trimmed Eric from CC list as vger was complaining that it is too long ]

On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
 So it won't be all that simple to implement well, and before we try,
 I'd like to know whether there are applications that are helped
 by it. For example, we could try to measure latency at various
 pps and see whether the backpressure helps. netperf has -b, -w
 flags which might help these measurements.
 
 Those options are enabled when one adds --enable-burst to the
 pre-compilation ./configure  of netperf (one doesn't have to
 recompile netserver).  However, if one is also looking at latency
 statistics via the -j option in the top-of-trunk, or simply at the
 histogram with --enable-histogram on the ./configure and a verbosity
 level of 2 (global -v 2) then one wants the very top of trunk
 netperf from:

Hi,

I have constructed a test where I run an un-paced UDP_STREAM test in
one guest and a paced omni rr test in another guest at the same time.
Briefly, I get the following results from the omni test:

1. Omni test only:  MEAN_LATENCY=272.00
2. Omni and stream test:MEAN_LATENCY=3423.00
3. cpu and net_cls group:   MEAN_LATENCY=493.00
   As per 2 plus cgroups are created for each guest
   and guest tasks added to the groups
4. 100Mbit/s class: MEAN_LATENCY=273.00
   As per 3 plus the net_cls groups each have a 100MBit/s HTB class
5. cpu.shares=128:  MEAN_LATENCY=652.00
   As per 4 plus the cpu groups have cpu.shares set to 128
6. Busy CPUS:   MEAN_LATENCY=15126.00
   As per 5 but the CPUs are made busy using a simple shell while loop

There is a bit of noise in the results as the two netperf invocations
aren't started at exactly the same moment.

For reference, my netperf invocations are:
netperf -c -C -t UDP_STREAM -H 172.17.60.216 -l 12
netperf.omni -p 12866 -D -c -C -H 172.17.60.216 -t omni -j -v 2 -- -r 1 -d rr -k foo -b 1 -w 200 -m 200

foo contains
PROTOCOL
THROUGHPUT,THROUGHPUT_UNITS
LOCAL_SEND_THROUGHPUT
LOCAL_RECV_THROUGHPUT
REMOTE_SEND_THROUGHPUT
REMOTE_RECV_THROUGHPUT
RT_LATENCY,MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY
P50_LATENCY,P90_LATENCY,P99_LATENCY,STDDEV_LATENCY
LOCAL_CPU_UTIL,REMOTE_CPU_UTIL
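For completeness, the shape of the setup behind test case 4 (a 100Mbit/s HTB
class per guest, selected via the guest's net_cls group) might look roughly
like the sketch below. The device name, classids, and cgroup mount point are
illustrative, and the commands are only printed here since they need root on a
configured host:

```shell
# Illustrative setup for test case 4: HTB class capped at 100mbit, with
# traffic classified by net_cls cgroup classid.  Printed, not executed.
cat <<'EOF'
tc qdisc add dev eth1 root handle 1: htb
tc class add dev eth1 parent 1: classid 1:10 htb rate 100mbit
tc filter add dev eth1 parent 1: protocol ip prio 10 handle 1: cgroup
echo 0x00010010 > /cgroup/net_cls/guest0/net_cls.classid
EOF
```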



Re: Flow Control and Port Mirroring Revisited

2011-01-19 Thread Simon Horman
On Tue, Jan 18, 2011 at 10:13:33PM +0200, Michael S. Tsirkin wrote:
 On Tue, Jan 18, 2011 at 11:41:22AM -0800, Rick Jones wrote:
  So it won't be all that simple to implement well, and before we try,
  I'd like to know whether there are applications that are helped
  by it. For example, we could try to measure latency at various
  pps and see whether the backpressure helps. netperf has -b, -w
  flags which might help these measurements.
  
  Those options are enabled when one adds --enable-burst to the
  pre-compilation ./configure  of netperf (one doesn't have to
  recompile netserver).  However, if one is also looking at latency
  statistics via the -j option in the top-of-trunk, or simply at the
  histogram with --enable-histogram on the ./configure and a verbosity
  level of 2 (global -v 2) then one wants the very top of trunk
  netperf from:
  
  http://www.netperf.org/svn/netperf2/trunk
  
  to get the recently added support for accurate (netperf level) RTT
  measurements on burst-mode request/response tests.
  
  happy benchmarking,
  
  rick jones

Thanks Rick, that is really helpful.

  PS - the enhanced latency statistics from -j are only available in
  the omni version of the TCP_RR test.  To get that add a
  --enable-omni to the ./configure - and in this case both netperf and
  netserver have to be recompiled.
 
 
 Is this TCP only? I would love to get latency data from UDP as well.

At a glance, -- -T UDP is what you are after.
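Putting that together with the omni invocation used elsewhere in this thread, a
full UDP latency run might look like the sketch below (the host address and -k
selector file come from the earlier invocation; printed rather than run, since
it needs a netserver peer):

```shell
# Sketch: the omni rr invocation from this thread, with -T UDP added to get
# UDP latency statistics.  Printed rather than executed.
echo 'netperf.omni -p 12866 -D -c -C -H 172.17.60.216 -t omni -j -v 2 -- -T UDP -r 1 -d rr -k foo -b 1 -w 200 -m 200'
```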


Re: Flow Control and Port Mirroring Revisited

2011-01-16 Thread Simon Horman
On Fri, Jan 14, 2011 at 08:54:15AM +0200, Michael S. Tsirkin wrote:
 On Fri, Jan 14, 2011 at 03:35:28PM +0900, Simon Horman wrote:
  On Fri, Jan 14, 2011 at 06:58:18AM +0200, Michael S. Tsirkin wrote:
   On Fri, Jan 14, 2011 at 08:41:36AM +0900, Simon Horman wrote:
On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
 On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman ho...@verge.net.au wrote:
  On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
  On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
   On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
  
   [ snip ]
   
I know that everyone likes a nice netperf result but I agree with
Michael that this probably isn't the right question to be asking.  I
don't think that socket buffers are a real solution to the flow
control problem: they happen to provide that functionality but it's
more of a side effect than anything.  It's just that the amount of
memory consumed by packets in the queue(s) doesn't really have any
implicit meaning for flow control (think multiple physical adapters,
all with the same speed instead of a virtual device and a physical
device with wildly different speeds).  The analog in the physical
world that you're looking for would be Ethernet flow control.
Obviously, if the question is limiting CPU or memory consumption then
that's a different story.
  
    Point taken. I will see if I can control CPU (and thus memory) consumption
    using cgroups and/or tc.
 
  I have found that I can successfully control the throughput using
  the following techniques

  1) Place a tc egress filter on dummy0

  2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
     this is effectively the same as one of my hacks to the datapath
     that I mentioned in an earlier mail. The result is that eth1
     paces the connection.
   
   This is actually a bug. This means that one slow connection will affect
   fast ones. I intend to change the default for qemu to sndbuf=0 : this
   will fix it but break your pacing. So pls do not count on this
   behaviour.
  
  Do you have a patch I could test?
 
 You can (and users already can) just run qemu with sndbuf=0. But if you
 like, below.

Thanks

  Further to this, I wonder if there is any interest in providing
  a method to switch the action order - using ovs-ofctl is a hack imho -
  and/or switching the default action order for mirroring.
 
 I'm not sure that there is a way to do this that is correct in the
 generic case.  It's possible that the destination could be a VM while
 packets are being mirrored to a physical device or we could be
 multicasting or some other arbitrarily complex scenario.  Just think
 of what a physical switch would do if it has ports with two different
 speeds.

Yes, I have considered that case. And I agree that perhaps there
is no sensible default. But perhaps we could make it configurable somehow?
   
   The fix is at the application level. Run netperf with -b and -w flags to
   limit the speed to a sensible value.
  
  Perhaps I should have stated my goals more clearly.
  I'm interested in situations where I don't control the application.
 
 Well an application that streams UDP without any throttling
 at the application level will break on a physical network, right?
 So I am not sure why should one try to make it work on the virtual one.
 
 But let's assume that you do want to throttle the guest
 for reasons such as QOS. The proper approach seems
 to be to throttle the sender, not have a dummy throttled
 receiver pacing it. Place the qemu process in the
 correct net_cls cgroup, set the class id and apply a rate limit?

I would like to be able to use a class to rate limit egress packets.
That much works fine for me.
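For reference, the net_cls class id written to the cgroup is just the tc
"major:minor" pair packed into one 32-bit value (0xAAAABBBB). A quick sketch,
using a hypothetical class 1:20 for the qemu process's group:

```shell
# net_cls.classid packs the tc class "major:minor" as 0xAAAABBBB.
# For tc class 1:20 (hypothetical class for the qemu process's cgroup):
major=1; minor=0x20
printf 'net_cls.classid for 1:20 is 0x%04x%04x\n' "$major" "$minor"
```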

What I would also like is for there to be back-pressure such that the guest
doesn't consume lots of CPU, spinning, sending packets as fast as it can,
almost all of which are dropped. That does seem like a lot of wasted
CPU to me.

Unfortunately there are several problems with this and I am fast concluding
that I will need to use a CPU cgroup, which does make some sense, as what I
am really trying to limit here is CPU usage, not network packet rates - even
if the test consuming the CPU is netperf.  As long as the CPU usage can
(mostly) be attributed to the guest, using a cgroup should work fine.  And
indeed it seems to in my limited testing.

One scenario in which I don't think it is possible for there to be
back-pressure in a meaningful sense is if root in the guest sets
/proc/sys/net/core/wmem_default to a large value, say 200.


I do think that to some extent there is back-pressure

Re: Flow Control and Port Mirroring Revisited

2011-01-13 Thread Simon Horman
On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
 On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman ho...@verge.net.au wrote:
  On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
  On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
   On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
  
   [ snip ]
   
I know that everyone likes a nice netperf result but I agree with
Michael that this probably isn't the right question to be asking.  I
don't think that socket buffers are a real solution to the flow
control problem: they happen to provide that functionality but it's
more of a side effect than anything.  It's just that the amount of
memory consumed by packets in the queue(s) doesn't really have any
implicit meaning for flow control (think multiple physical adapters,
all with the same speed instead of a virtual device and a physical
device with wildly different speeds).  The analog in the physical
world that you're looking for would be Ethernet flow control.
Obviously, if the question is limiting CPU or memory consumption then
that's a different story.
  
    Point taken. I will see if I can control CPU (and thus memory) consumption
    using cgroups and/or tc.
 
  I have found that I can successfully control the throughput using
  the following techniques
 
  1) Place a tc egress filter on dummy0
 
  2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
     this is effectively the same as one of my hacks to the datapath
     that I mentioned in an earlier mail. The result is that eth1
     paces the connection.
 
  Further to this, I wonder if there is any interest in providing
  a method to switch the action order - using ovs-ofctl is a hack imho -
  and/or switching the default action order for mirroring.
 
 I'm not sure that there is a way to do this that is correct in the
 generic case.  It's possible that the destination could be a VM while
 packets are being mirrored to a physical device or we could be
 multicasting or some other arbitrarily complex scenario.  Just think
 of what a physical switch would do if it has ports with two different
 speeds.

Yes, I have considered that case. And I agree that perhaps there
is no sensible default. But perhaps we could make it configurable somehow?


Re: Flow Control and Port Mirroring Revisited

2011-01-13 Thread Simon Horman
On Fri, Jan 14, 2011 at 06:58:18AM +0200, Michael S. Tsirkin wrote:
 On Fri, Jan 14, 2011 at 08:41:36AM +0900, Simon Horman wrote:
  On Thu, Jan 13, 2011 at 10:45:38AM -0500, Jesse Gross wrote:
   On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman ho...@verge.net.au wrote:
On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
 On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:

 [ snip ]
 
  I know that everyone likes a nice netperf result but I agree with
  Michael that this probably isn't the right question to be asking.  I
  don't think that socket buffers are a real solution to the flow
  control problem: they happen to provide that functionality but it's
  more of a side effect than anything.  It's just that the amount of
  memory consumed by packets in the queue(s) doesn't really have any
  implicit meaning for flow control (think multiple physical adapters,
  all with the same speed instead of a virtual device and a physical
  device with wildly different speeds).  The analog in the physical
  world that you're looking for would be Ethernet flow control.
  Obviously, if the question is limiting CPU or memory consumption then
  that's a different story.

 Point taken. I will see if I can control CPU (and thus memory) consumption
 using cgroups and/or tc.
   
I have found that I can successfully control the throughput using
the following techniques
   
1) Place a tc egress filter on dummy0
   
2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
   this is effectively the same as one of my hacks to the datapath
   that I mentioned in an earlier mail. The result is that eth1
   paces the connection.
 
 This is actually a bug. This means that one slow connection will affect
 fast ones. I intend to change the default for qemu to sndbuf=0 : this
 will fix it but break your pacing. So pls do not count on this
 behaviour.

Do you have a patch I could test?

Further to this, I wonder if there is any interest in providing
a method to switch the action order - using ovs-ofctl is a hack imho -
and/or switching the default action order for mirroring.
   
   I'm not sure that there is a way to do this that is correct in the
   generic case.  It's possible that the destination could be a VM while
   packets are being mirrored to a physical device or we could be
   multicasting or some other arbitrarily complex scenario.  Just think
   of what a physical switch would do if it has ports with two different
   speeds.
  
  Yes, I have considered that case. And I agree that perhaps there
  is no sensible default. But perhaps we could make it configurable somehow?
 
 The fix is at the application level. Run netperf with -b and -w flags to
 limit the speed to a sensible value.

Perhaps I should have stated my goals more clearly.
I'm interested in situations where I don't control the application.



Re: Flow Control and Port Mirroring Revisited

2011-01-12 Thread Simon Horman
On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
 On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
  On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
  
  [ snip ]
   
   I know that everyone likes a nice netperf result but I agree with
   Michael that this probably isn't the right question to be asking.  I
   don't think that socket buffers are a real solution to the flow
   control problem: they happen to provide that functionality but it's
   more of a side effect than anything.  It's just that the amount of
   memory consumed by packets in the queue(s) doesn't really have any
   implicit meaning for flow control (think multiple physical adapters,
   all with the same speed instead of a virtual device and a physical
   device with wildly different speeds).  The analog in the physical
   world that you're looking for would be Ethernet flow control.
   Obviously, if the question is limiting CPU or memory consumption then
   that's a different story.
  
  Point taken. I will see if I can control CPU (and thus memory) consumption
  using cgroups and/or tc.
 
 I have found that I can successfully control the throughput using
 the following techniques
 
 1) Place a tc egress filter on dummy0
 
 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
this is effectively the same as one of my hacks to the datapath
that I mentioned in an earlier mail. The result is that eth1
paces the connection.

Further to this, I wonder if there is any interest in providing
a method to switch the action order - using ovs-ofctl is a hack imho -
and/or switching the default action order for mirroring.

 3) 2) + place a tc egress filter on eth1
 
 Which mostly makes sense to me although I am a little confused about
 why 1) needs a filter on dummy0 (a filter on eth1 has no effect)
 but 3) needs a filter on eth1 (a filter on dummy0 has no effect,
 even if the skb is sent to dummy0 last).
 
 I also had some limited success using CPU cgroups, though obviously
 that targets CPU usage and thus the effect on throughput is fairly coarse.
 In short, it's a useful technique but not one that bears further
 discussion here.
 


Re: Flow Control and Port Mirroring Revisited

2011-01-10 Thread Simon Horman
On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
 On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
 
 [ snip ]
  
  I know that everyone likes a nice netperf result but I agree with
  Michael that this probably isn't the right question to be asking.  I
  don't think that socket buffers are a real solution to the flow
  control problem: they happen to provide that functionality but it's
  more of a side effect than anything.  It's just that the amount of
  memory consumed by packets in the queue(s) doesn't really have any
  implicit meaning for flow control (think multiple physical adapters,
  all with the same speed instead of a virtual device and a physical
  device with wildly different speeds).  The analog in the physical
  world that you're looking for would be Ethernet flow control.
  Obviously, if the question is limiting CPU or memory consumption then
  that's a different story.
 
 Point taken. I will see if I can control CPU (and thus memory) consumption
 using cgroups and/or tc.

I have found that I can successfully control the throughput using
the following techniques

1) Place a tc egress filter on dummy0

2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
   this is effectively the same as one of my hacks to the datapath
   that I mentioned in an earlier mail. The result is that eth1
   paces the connection.

3) 2) + place a tc egress filter on eth1

Which mostly makes sense to me although I am a little confused about
why 1) needs a filter on dummy0 (a filter on eth1 has no effect)
but 3) needs a filter on eth1 (a filter on dummy0 has no effect,
even if the skb is sent to dummy0 last).

I also had some limited success using CPU cgroups, though obviously
that targets CPU usage and thus the effect on throughput is fairly coarse.
In short, it's a useful technique but not one that bears further
discussion here.



Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
Hi,

Back in October I reported that I noticed a problem whereby flow control
breaks down when openvswitch is configured to mirror a port[1].

I have (finally) looked into this further and the problem appears to relate
to cloning of skbs, as Jesse Gross originally suspected.

More specifically, in do_execute_actions[2] the first n-1 times that an skb
needs to be transmitted it is cloned first and the final time the original
skb is used.

In the case that there is only one action, which is the normal case, then
the original skb will be used. But in the case of mirroring the cloning
comes into effect. And in my case the cloned skb seems to go to the (slow)
eth1 interface while the original skb goes to the (fast) dummy0 interface
that I set up to be a mirror. The result is that dummy0 paces the flow,
and it's a cracking pace at that.

As an experiment I hacked do_execute_actions() to use the original skb
for the first action instead of the last one.  In my case the result was
that eth1 paces the flow, and things work reasonably nicely.

Well, sort of. Things work well for non-GSO skbs but extremely poorly for
GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
netserver. I'm unsure why, but I digress.

It seems to me that my hack illustrates the point that the flow ends up
being paced by one interface. However I think that what would be
desirable is that the flow is paced by the slowest link. Unfortunately
I'm unsure how to achieve that.

One idea that I had was to skb_get() the original skb each time it is
cloned - that is easy enough. But unfortunately it seems to me that
approach would require some sort of callback mechanism in kfree_skb() so
that the cloned skbs can kfree_skb() the original skb.

Ideas would be greatly appreciated.

[1] 
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html
[2] 
http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=datapath/actions.c;h=5e16143ca402f7da0ee8fc18ee5eb16c3b7598e6;hb=HEAD


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 12:27:55PM +0200, Michael S. Tsirkin wrote:
 On Thu, Jan 06, 2011 at 06:33:12PM +0900, Simon Horman wrote:
  Hi,
  
  Back in October I reported that I noticed a problem whereby flow control
  breaks down when openvswitch is configured to mirror a port[1].
 
 Apropos the UDP flow control.  See this
 http://www.spinics.net/lists/netdev/msg150806.html
 for some problems it introduces.
 Unfortunately UDP does not have built-in flow control.
 At some level it's just conceptually broken:
 it's not present in physical networks so why should
 we try and emulate it in a virtual network?
 
 
 Specifically, when you do:
 # netperf -c -4 -t UDP_STREAM -H 172.17.60.218 -l 30 -- -m 1472
 You are asking: what happens if I push data faster than it can be received?
 But why is this an interesting question?
 Ask 'what is the maximum rate at which I can send data with %X packet
 loss' or 'what is the packet loss at rate Y Gb/s'. netperf has
 -b and -w flags for this. It needs to be configured
 with --enable-intervals=yes for them to work.
 
 If you pose the questions this way the problem of pacing
 the execution just goes away.

I am aware that UDP inherently lacks flow control.

The aspect of flow control that I am interested in is situations where the
guest can create large amounts of work for the host. However, it seems that
in the case of virtio with vhostnet that the CPU utilisation seems to be
almost entirely attributable to the vhost and qemu-system processes.  And
in the case of virtio without vhost net the CPU is used by the qemu-system
process. In both cases I assume that I could use a cgroup or something
similar to limit the guests.

Assuming all of that is true then from a resource control problem point of
view, which is mostly what I am concerned about, the problem goes away.
However, I still think that it would be nice to resolve the situation I
described.


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 02:07:22PM +0200, Michael S. Tsirkin wrote:
 On Thu, Jan 06, 2011 at 08:30:52PM +0900, Simon Horman wrote:
  On Thu, Jan 06, 2011 at 12:27:55PM +0200, Michael S. Tsirkin wrote:
   On Thu, Jan 06, 2011 at 06:33:12PM +0900, Simon Horman wrote:
Hi,

Back in October I reported that I noticed a problem whereby flow control
breaks down when openvswitch is configured to mirror a port[1].
   
   Apropos the UDP flow control.  See this
   http://www.spinics.net/lists/netdev/msg150806.html
   for some problems it introduces.
   Unfortunately UDP does not have built-in flow control.
   At some level it's just conceptually broken:
   it's not present in physical networks so why should
   we try and emulate it in a virtual network?
   
   
   Specifically, when you do:
   # netperf -c -4 -t UDP_STREAM -H 172.17.60.218 -l 30 -- -m 1472
   You are asking: what happens if I push data faster than it can be 
   received?
   But why is this an interesting question?
   Ask 'what is the maximum rate at which I can send data with %X packet
   loss' or 'what is the packet loss at rate Y Gb/s'. netperf has
   -b and -w flags for this. It needs to be configured
   with --enable-intervals=yes for them to work.
   
   If you pose the questions this way the problem of pacing
   the execution just goes away.
  
  I am aware that UDP inherently lacks flow control.
 
 Everyone is aware of that, but this is always followed by a 'however'
 :).
 
  The aspect of flow control that I am interested in is situations where the
  guest can create large amounts of work for the host. However, it seems that
  in the case of virtio with vhostnet that the CPU utilisation seems to be
  almost entirely attributable to the vhost and qemu-system processes.  And
  in the case of virtio without vhost net the CPU is used by the qemu-system
  process. In both cases I assume that I could use a cgroup or something
  similar to limit the guests.
 
 cgroups, yes. the vhost process inherits the cgroups
 from the qemu process so you can limit them all.
 
 If you are after limiting the max throughput of the guest
 you can do this with cgroups as well.

Do you mean a CPU cgroup or something else?

  Assuming all of that is true then from a resource control problem point of
  view, which is mostly what I am concerned about, the problem goes away.
  However, I still think that it would be nice to resolve the situation I
  described.
 
 We need to articulate what's wrong here, otherwise we won't
 be able to resolve the situation. We are sending UDP packets
 as fast as we can and some receivers can't cope. Is this the problem?
 We have made attempts to add a pseudo flow control in the past
 in an attempt to make UDP on the same host work better.
 Maybe they help some but they also sure introduce problems.

In the case where port mirroring is not active, which is the
usual case, to some extent there is flow control in place due to
(as Eric Dumazet pointed out) the socket buffer.

When port mirroring is activated the flow control operates based
only on one port - which can't be controlled by the administrator
in an obvious way.

I think that it would be more intuitive if flow control was
based on sending a packet to all ports rather than just one.

Though now I think about it some more, perhaps this isn't the best either.
For instance the case where data was being sent to dummy0 and suddenly
adding a mirror on eth1 slowed everything down.

So perhaps there needs to be another knob to tune when setting
up port-mirroring. Or perhaps the current situation isn't so bad.


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 11:22:42AM +0100, Eric Dumazet wrote:
 On Thursday 6 January 2011 at 18:33 +0900, Simon Horman wrote:
  Hi,
  
  Back in October I reported that I noticed a problem whereby flow control
  breaks down when openvswitch is configured to mirror a port[1].
  
  I have (finally) looked into this further and the problem appears to relate
  to cloning of skbs, as Jesse Gross originally suspected.
  
  More specifically, in do_execute_actions[2] the first n-1 times that an skb
  needs to be transmitted it is cloned first and the final time the original
  skb is used.
  
  In the case that there is only one action, which is the normal case, then
  the original skb will be used. But in the case of mirroring the cloning
  comes into effect. And in my case the cloned skb seems to go to the (slow)
  eth1 interface while the original skb goes to the (fast) dummy0 interface
  that I set up to be a mirror. The result is that dummy0 paces the flow,
  and it's a cracking pace at that.
  
  As an experiment I hacked do_execute_actions() to use the original skb
  for the first action instead of the last one.  In my case the result was
  that eth1 paces the flow, and things work reasonably nicely.
  
  Well, sort of. Things work well for non-GSO skbs but extremely poorly for
  GSO skbs where only 3 (yes 3, not 3%) end up at the remote host running
  netserver. I'm unsure why, but I digress.
  
  It seems to me that my hack illustrates the point that the flow ends up
  being paced by one interface. However I think that what would be
  desirable is that the flow is paced by the slowest link. Unfortunately
  I'm unsure how to achieve that.
  
 
 Hi Simon !
 
 pacing is done because skb is attached to a socket, and a socket has a
 limited (but configurable) sndbuf. sk->sk_wmem_alloc is the current sum
 of all truesize skbs in flight.
 
 When you enter something that :
 
 1) Get a clone of the skb, queue the clone to device X
 2) queue the original skb to device Y
 
 Then : Socket sndbuf is not affected at all by device X queue.
   This is speed on device Y that matters.
 
 You want to get servo control on both X and Y
 
 You could try to
 
 1) Get a clone of skb
Attach it to socket too (so that socket get a feedback of final
 orphaning for the clone) with skb_set_owner_w()
queue the clone to device X
 
 Unfortunately, stacked skb->destructor() makes this possible only for
 known destructor (aka sock_wfree())

Hi Eric !

Thanks for the advice. I had thought about the socket buffer but at some
point it slipped my mind.

In any case the following patch seems to implement the change that I had in
mind. However my discussions with Michael Tsirkin elsewhere in this thread
are beginning to make me think that perhaps this change isn't the
best solution.

diff --git a/datapath/actions.c b/datapath/actions.c
index 5e16143..505f13f 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -384,7 +384,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
 
	for (a = actions, rem = actions_len; rem > 0; a = nla_next(a, rem)) {
		if (prev_port != -1) {
-			do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
+			struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+			if (nskb) {
+				if (skb->sk)
+					skb_set_owner_w(nskb, skb->sk);
+				do_output(dp, nskb, prev_port);
+			}
			prev_port = -1;
		}

I got a rather nasty panic without the if (skb->sk),
I guess some skbs don't have a socket.


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 02:28:18PM +0100, Eric Dumazet wrote:
 On Thursday 6 January 2011 at 21:44 +0900, Simon Horman wrote:
 
  Hi Eric !
  
  Thanks for the advice. I had thought about the socket buffer but at some
  point it slipped my mind.
  
  In any case the following patch seems to implement the change that I had in
  mind. However my discussions with Michael Tsirkin elsewhere in this thread
  are beginning to make me think that perhaps this change isn't the
  best solution.
  
  diff --git a/datapath/actions.c b/datapath/actions.c
  index 5e16143..505f13f 100644
  --- a/datapath/actions.c
  +++ b/datapath/actions.c
  @@ -384,7 +384,12 @@ static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
   
  	for (a = actions, rem = actions_len; rem > 0; a = nla_next(a, rem)) {
  		if (prev_port != -1) {
  -			do_output(dp, skb_clone(skb, GFP_ATOMIC), prev_port);
  +			struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
  +			if (nskb) {
  +				if (skb->sk)
  +					skb_set_owner_w(nskb, skb->sk);
  +				do_output(dp, nskb, prev_port);
  +			}
  			prev_port = -1;
  		}
  
  I got a rather nasty panic without the if (skb->sk),
  I guess some skbs don't have a socket.
 
 Indeed, some packets are not linked to a socket.
 
 (ARP packets for example)
 
 Sorry, I should have mentioned it :)

Not at all, the occasional panic during hacking is good for the soul.


Re: Flow Control and Port Mirroring Revisited

2011-01-06 Thread Simon Horman
On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:

[ snip ]
 
 I know that everyone likes a nice netperf result but I agree with
 Michael that this probably isn't the right question to be asking.  I
 don't think that socket buffers are a real solution to the flow
 control problem: they happen to provide that functionality but it's
 more of a side effect than anything.  It's just that the amount of
 memory consumed by packets in the queue(s) doesn't really have any
 implicit meaning for flow control (think multiple physical adapters,
 all with the same speed instead of a virtual device and a physical
 device with wildly different speeds).  The analog in the physical
 world that you're looking for would be Ethernet flow control.
 Obviously, if the question is limiting CPU or memory consumption then
 that's a different story.

Point taken. I will see if I can control CPU (and thus memory) consumption
using cgroups and/or tc.

 This patch also double counts memory, since the full size of the
 packet will be accounted for by each clone, even though they share the
 actual packet data.  Probably not too significant here but it might be
 when flooding/mirroring to many interfaces.  This is at least fixable
 (the Xen-style accounting through page tracking deals with it, though
 it has its own problems).

Agreed on all counts.




Re: [ovs-dev] Flow Control and Port Mirroring

2010-11-07 Thread Simon Horman
On Mon, Nov 08, 2010 at 01:41:23PM +1030, Rusty Russell wrote:
 On Sat, 30 Oct 2010 01:29:33 pm Simon Horman wrote:
  [ CCed VHOST contacts ]
  
  On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:
   On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman ho...@verge.net.au wrote:
My reasoning is that in the non-mirroring case the guest is
limited by the external interface through which the packets
eventually flow - that is 1Gbit/s. But in the mirrored case either
there is no flow control or the flow control is acting on the
rate of dummy0, which is essentially infinite.
   
Before investigating this any further I wanted to ask if
this behaviour is intentional.
   
   It's not intentional but I can take a guess at what is happening.
   
   When we send the packet to a mirror, the skb is cloned but only the
   original skb is charged to the sender.  If the original packet is
   delivered to localhost then it will be freed quickly and no longer
   accounted for, despite the fact that the real packet is still
   sitting in the transmit queue on the NIC.  The UDP stack will then
   send the next packet, limited only by the speed of the CPU.
  
  That would explain what I have observed.
 
 I can't find the thread (what is ovs-dev?),

Sorry, yes its on ovs-dev.
http://openvswitch.org/pipermail/dev_openvswitch.org/2010-October/003806.html

 but I think the tap device
 has this fundamental feature: you can blast as many packets as you want
 through it.
 
 If that's a bad thing, we have to look harder...

There does seem to be flow control in the non-mirrored case.
So I suspect it's occurring at the skb level but that breaks down when
a clone occurs. It would seem that fragment level flow control would
help this problem (which is basically what Xen's netback/netfront has),
but by this point I am speculating wildly.  I'll try and find out exactly
where the problem is occurring in order for us to have a more informed
discussion.


Re: [ovs-dev] Flow Control and Port Mirroring

2010-10-29 Thread Simon Horman
[ CCed VHOST contacts ]

On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:
 On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman ho...@verge.net.au wrote:
  My reasoning is that in the non-mirroring case the guest is
  limited by the external interface through which the packets
  eventually flow - that is 1Gbit/s. But in the mirrored case either
  there is no flow control or the flow control is acting on the
  rate of dummy0, which is essentially infinite.
 
  Before investigating this any further I wanted to ask if
  this behaviour is intentional.
 
 It's not intentional but I can take a guess at what is happening.
 
 When we send the packet to a mirror, the skb is cloned but only the
 original skb is charged to the sender.  If the original packet is
 delivered to localhost then it will be freed quickly and no longer
 accounted for, despite the fact that the real packet is still
 sitting in the transmit queue on the NIC.  The UDP stack will then
 send the next packet, limited only by the speed of the CPU.

That would explain what I have observed.

 Normally, this would be tracked by accounting for the memory charged
 to the socket.  However, I know that Xen tracks whether the actual
 pages of memory have been freed, which should avoid this problem since
 the memory won't be released util the last packet has been sent.  I
 don't know what KVM virtio does but I'm guessing that it similar to
 the former, since this problem is occurring.

I am also familiar with how Xen tracks pages but less sure of the
virtio side of things.

 While it would be easy to charge the socket for all clones, I also
 want to be careful about over accounting of the same data, leading to
 a very small effective socket buffer.

Agreed, we don't want to see over-charging.



Re: [PATCH qemu-kvm] device assignment: default requires IOMMU

2009-12-23 Thread Simon Horman
On Thu, Dec 24, 2009 at 01:45:34AM +0100, Alexander Graf wrote:
 
 Am 23.12.2009 um 23:40 schrieb Chris Wright chr...@sous-sol.org:
 
 [ resend, fixing email header, sorry for duplicate ]
 
 The default mode for device assignment is to rely on an IOMMU for
 proper translations and a functioning device in the guest.  The
 current
 logic makes this requirement advisory, and simply disables the request
 for IOMMU if one is not found on the host.  This makes for a confused
 user when the device assignment appears to work, but the device in the
 guest is not functioning  (I've seen about a half-dozen reports with
 this failure mode).
 
 Change the logic such that the default requires the IOMMU.  Period.
 If the host does not have an IOMMU, device assignment will fail.
 
 This is a user visible change, however I think the current
 situation is
 simply broken.
 
 And, of course, disabling the IOMMU requirement using the old:
 
   -pcidevice host=[addr],dma=none
 
 or the newer:
 
   -device pci-assign,host=[addr],iommu=0
 
 will do what it always did (not require an IOMMU, and fail to work
 properly).
 
 Yay!

Sounds good to me. Though I am curious to know the reasoning
behind the current logic.



Re: [PATCH qemu-kvm] device assignment: default requires IOMMU

2009-12-23 Thread Simon Horman
On Thu, Dec 24, 2009 at 02:56:00PM +0800, Sheng Yang wrote:
 On Thursday 24 December 2009 14:51:23 Simon Horman wrote:
  On Thu, Dec 24, 2009 at 01:45:34AM +0100, Alexander Graf wrote:
   Am 23.12.2009 um 23:40 schrieb Chris Wright chr...@sous-sol.org:
   [ resend, fixing email header, sorry for duplicate ]
   
   The default mode for device assignment is to rely on an IOMMU for
   proper translations and a functioning device in the guest.  The
   current
   logic makes this requirement advisory, and simply disables the request
   for IOMMU if one is not found on the host.  This makes for a confused
   user when the device assignment appears to work, but the device in the
   guest is not functioning  (I've seen about a half-dozen reports with
   this failure mode).
   
   Change the logic such that the default requires the IOMMU.  Period.
   If the host does not have an IOMMU, device assignment will fail.
   
   This is a user visible change, however I think the current
   situation is
   simply broken.
   
   And, of course, disabling the IOMMU requirement using the old:
   
 -pcidevice host=[addr],dma=none
   
   or the newer:
   
 -device pci-assign,host=[addr],iommu=0
   
   will do what it always did (not require an IOMMU, and fail to work
   properly).
  
   Yay!
  
  Sounds good to me. Though I am curious to know the reasoning
  behind the current logic.
  
 Sounds pretty good. :)
 
 I think maybe it's due to our interest in implementing PV DMA?

Ok, that would explain it.



Re: copyless virtio net thoughts?

2009-02-19 Thread Simon Horman
On Thu, Feb 19, 2009 at 10:06:17PM +1030, Rusty Russell wrote:
 On Thursday 19 February 2009 10:01:42 Simon Horman wrote:
  On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
   
   2) Direct NIC attachment This is particularly interesting with SR-IOV or
   other multiqueue nics, but for boutique cases or benchmarks, could be for
   normal NICs.  So far I have some very sketched-out patches: for the
   attached nic dev_alloc_skb() gets an skb from the guest (which supplies
   them via some kind of AIO interface), and a branch in netif_receive_skb()
   which returned it to the guest.  This bypasses all firewalling in the
   host though; we're basically having the guest process drive the NIC
   directly.
  
  Hi Rusty,
  
  Can I clarify that the idea with utilising SR-IOV would be to assign
  virtual functions to guests? That is, something conceptually similar to
  PCI pass-through in Xen (although I'm not sure that anyone has virtual
  function pass-through working yet).
 
 Not quite: I think PCI passthrough IMHO is the *wrong* way to do it: it
 makes migrate complicated (if not impossible), and requires emulation or
 the same NIC on the destination host.
 
 This would be the *host* seeing the virtual functions as multiple NICs,
 then the ability to attach a given NIC directly to a process.
 
 This isn't guest-visible: the kvm process is configured to connect
 directly to a NIC, rather than (say) bridging through the host.

Hi Rusty, Hi Chris,

Thanks for the clarification.

I think that the approach that Xen recommends for migration is to
use a bonding device that accesses the pass-through device if present
and a virtual nic.

The idea that you outline above does sound somewhat cleaner :-)

  If so, wouldn't this also be useful on machines that have multiple
  NICs?
 
 Yes, but mainly as a benchmark hack AFAICT :)

Ok, I was under the impression that at least in the Xen world it
was something people actually used. But I could easily be mistaken.

 Hope that clarifies, Rusty.

On Thu, Feb 19, 2009 at 03:37:52AM -0800, Chris Wright wrote:
 * Simon Horman (ho...@verge.net.au) wrote:
  On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
   2) Direct NIC attachment This is particularly interesting with SR-IOV or
   other multiqueue nics, but for boutique cases or benchmarks, could be for
   normal NICs.  So far I have some very sketched-out patches: for the
   attached nic dev_alloc_skb() gets an skb from the guest (which supplies
   them via some kind of AIO interface), and a branch in netif_receive_skb()
   which returned it to the guest.  This bypasses all firewalling in the
   host though; we're basically having the guest process drive the NIC
   directly.
  
  Can I clarify that the idea with utilising SR-IOV would be to assign
  virtual functions to guests? That is, something conceptually similar to
  PCI pass-through in Xen (although I'm not sure that anyone has virtual
  function pass-through working yet). If so, wouldn't this also be useful
  on machines that have multiple NICs?
 
 This would be the typical usecase for sr-iov.  But I think Rusty is
 referring to giving a nic directly to a guest but the guest is still
 seeing a virtio nic (not pass-through/device-assignment).  So there's
 no bridge, and zero copy so the dma buffers are supplied by guest,
 but host has the driver for the physical nic or the VF.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: copyless virtio net thoughts?

2009-02-18 Thread Simon Horman
On Wed, Feb 18, 2009 at 10:08:00PM +1030, Rusty Russell wrote:
 
 2) Direct NIC attachment This is particularly interesting with SR-IOV or
 other multiqueue nics, but for boutique cases or benchmarks, could be for
 normal NICs.  So far I have some very sketched-out patches: for the
 attached nic dev_alloc_skb() gets an skb from the guest (which supplies
 them via some kind of AIO interface), and a branch in netif_receive_skb()
 which returned it to the guest.  This bypasses all firewalling in the
 host though; we're basically having the guest process drive the NIC
 directly.

Hi Rusty,

Can I clarify that the idea with utilising SR-IOV would be to assign
virtual functions to guests? That is, something conceptually similar to
PCI pass-through in Xen (although I'm not sure that anyone has virtual
function pass-through working yet). If so, wouldn't this also be useful
on machines that have multiple NICs?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: [PATCH 2/16 v6] PCI: define PCI resource names in an 'enum'

2008-11-13 Thread Simon Horman
 - 1,
 +
 + /* total resources associated with a PCI device */
 + PCI_NUM_RESOURCES,
 +
 + /* preserve this for compatibility */
 + DEVICE_COUNT_RESOURCE
 +};
  
  typedef int __bitwise pci_power_t;
  
 @@ -262,18 +285,6 @@ static inline void pci_add_saved_cap(struct pci_dev *pci_dev,
 	hlist_add_head(&new_cap->next, &pci_dev->saved_cap_space);
  }
  
 -/*
 - *  For PCI devices, the region numbers are assigned this way:
 - *
 - *   0-5 standard PCI regions
 - *   6   expansion ROM
 - *   7-10bridges: address space assigned to buses behind the bridge
 - */
 -
 -#define PCI_ROM_RESOURCE 6
 -#define PCI_BRIDGE_RESOURCES 7
 -#define PCI_NUM_RESOURCES	11
 -
  #ifndef PCI_BUS_NUM_RESOURCES
  #define PCI_BUS_NUM_RESOURCES	16
  #endif

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: [PATCH 0/16 v6] PCI: Linux kernel SR-IOV support

2008-11-06 Thread Simon Horman
On Thu, Nov 06, 2008 at 09:53:08AM -0800, Greg KH wrote:
 On Thu, Nov 06, 2008 at 10:47:41AM -0700, Matthew Wilcox wrote:
  On Thu, Nov 06, 2008 at 08:49:19AM -0800, Greg KH wrote:
   On Thu, Nov 06, 2008 at 08:41:53AM -0800, H L wrote:
I have not modified any existing drivers, but instead I threw together
a bare-bones module enabling me to make a call to pci_iov_register()
and then poke at an SR-IOV adapter's /sys entries for which no driver
was loaded.

It appears from my perusal thus far that drivers using these new
SR-IOV patches will require modification; i.e. the driver associated
with the Physical Function (PF) will be required to make the
pci_iov_register() call along with the requisite notify() function.
Essentially this suggests to me a model for the PF driver to perform
any global actions or setup on behalf of VFs before enabling them
after which VF drivers could be associated.
   
   Where would the VF drivers have to be associated?  On the pci_dev
   level or on a higher one?
   
   Will all drivers that want to bind to a VF device need to be
   rewritten?
  
  The current model being implemented by my colleagues has separate
  drivers for the PF (aka native) and VF devices.  I don't personally
  believe this is the correct path, but I'm reserving judgement until I
  see some code.
 
 Hm, I would like to see that code before we can properly evaluate this
 interface.  Especially as they are all tightly tied together.
 
  I don't think we really know what the One True Usage model is for VF
  devices.  Chris Wright has some ideas, I have some ideas and Yu Zhao has
  some ideas.  I bet there's other people who have other ideas too.
 
 I'd love to hear those ideas.
 
 Rumor has it, there is some Xen code floating around to support this
 already, is that true?

Xen patches were posted to xen-devel by Yu Zhao on the 29th of September [1].
Unfortunately the only responses that I can find are a) that the patches
were mangled and b) they seem to include changes (by others) that have
been merged into Linux. I have confirmed that both of these concerns
are valid.

I have not yet examined the difference, if any, in the approach taken by Yu
to SR-IOV in Linux and Xen. Unfortunately comparison is less than trivial
due to the wide gap in kernel versions between Linux-Xen (2.6.18.8) and
Linux itself.

One approach that I was considering in order to familiarise myself with the
code was to backport the v6 Linux patches (this thread) to Linux-Xen. I made a
start on that, but again due to kernel version differences it is non-trivial.

[1] http://lists.xensource.com/archives/html/xen-devel/2008-09/msg00923.html

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: git repository for SR-IOV development?

2008-11-06 Thread Simon Horman
On Thu, Nov 06, 2008 at 11:58:25AM -0800, H L wrote:
 --- On Thu, 11/6/08, Greg KH [EMAIL PROTECTED] wrote:
 
  On Thu, Nov 06, 2008 at 08:51:09AM -0800, H L wrote:
   
   Has anyone initiated or given consideration to the
  creation of a git
   repository (say, on kernel.org) for SR-IOV
  development?
  
  Why?  It's only a few patches, right?  Why would it
  need a whole new git
  tree?
 
 
 So as to minimize the time and effort patching a kernel, especially if
 the tree (and/or hash level) against which the patches were created fails
 to be specified on a mailing-list.  Plus, there appears to be questions
 raised on how, precisely, the implementation should ultimately be modeled
 and especially given that, who knows at this point what number of patches
 will ultimately be submitted?  I know I've built the 7-patch one
 (painfully, by the way), and I'm aware there's another 15-patch set out
 there which I've not yet examined.

FWIW, the v6 patch series (this thread) applied to both 2.6.28-rc3
and the current Linus tree after a minor tweak to the first patch, as below.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en

From: Yu Zhao [EMAIL PROTECTED]

[PATCH 1/16 v6] PCI: remove unnecessary arg of pci_update_resource()

This cleanup removes unnecessary argument 'struct resource *res' in
pci_update_resource(), so it takes same arguments as other companion
functions (pci_assign_resource(), etc.).

Cc: Alex Chiang [EMAIL PROTECTED]
Cc: Grant Grundler [EMAIL PROTECTED]
Cc: Greg KH [EMAIL PROTECTED]
Cc: Ingo Molnar [EMAIL PROTECTED]
Cc: Jesse Barnes [EMAIL PROTECTED]
Cc: Matthew Wilcox [EMAIL PROTECTED]
Cc: Randy Dunlap [EMAIL PROTECTED]
Cc: Roland Dreier [EMAIL PROTECTED]
Signed-off-by: Yu Zhao [EMAIL PROTECTED]
Upported-by: Simon Horman [EMAIL PROTECTED]

---
 drivers/pci/pci.c       |    4 ++--
 drivers/pci/setup-res.c |    7 ++++---
 include/linux/pci.h     |    2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

* Fri, 07 Nov 2008 09:05:18 +1100, Simon Horman
  - Minor rediff of include/linux/pci.h section to apply to 2.6.28-rc3

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4db261e..ae62f01 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -376,8 +376,8 @@ pci_restore_bars(struct pci_dev *dev)
return;
}
 
-   for (i = 0; i < numres; i ++)
-   pci_update_resource(dev, &dev->resource[i], i);
+   for (i = 0; i < numres; i++)
+   pci_update_resource(dev, i);
 }
 
 static struct pci_platform_pm_ops *pci_platform_pm;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 2dbd96c..b7ca679 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -26,11 +26,12 @@
 #include "pci.h"
 
 
-void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno)
+void pci_update_resource(struct pci_dev *dev, int resno)
 {
struct pci_bus_region region;
u32 new, check, mask;
int reg;
+   struct resource *res = dev->resource + resno;
 
/*
 * Ignore resources for unimplemented BARs and unused resource slots
@@ -162,7 +163,7 @@ int pci_assign_resource(struct pci_dev *dev, int resno)
} else {
		res->flags &= ~IORESOURCE_STARTALIGN;
		if (resno < PCI_BRIDGE_RESOURCES)
-   pci_update_resource(dev, res, resno);
+   pci_update_resource(dev, resno);
}
 
return ret;
 @@ -197,7 +198,7 @@ int pci_assign_resource_fixed(struct pci_dev *dev, int resno)
		dev_err(&dev->dev, "BAR %d: can't allocate %s resource %pR\n",
			resno, res->flags & IORESOURCE_IO ? "I/O" : "mem", res);
	} else if (resno < PCI_BRIDGE_RESOURCES) {
-   pci_update_resource(dev, res, resno);
+   pci_update_resource(dev, resno);
}
 
return ret;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 085187b..43e1fc1 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -626,7 +626,7 @@ int pcix_get_mmrbc(struct pci_dev *dev);
 int pcie_set_readrq(struct pci_dev *dev, int rq);
 int pci_reset_function(struct pci_dev *dev);
 int pci_execute_reset_function(struct pci_dev *dev);
-void pci_update_resource(struct pci_dev *dev, struct resource *res, int resno);
+void pci_update_resource(struct pci_dev *dev, int resno);
 int __must_check pci_assign_resource(struct pci_dev *dev, int i);
 int pci_select_bars(struct pci_dev *dev, unsigned long flags);
 
-- 
1.5.6.4



Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-23 Thread Simon Horman
[ Added Andrew Morton, Eric Biederman, Vivek Goyal and Haren Myneni to CC ]

On Thu, Oct 23, 2008 at 05:41:29PM -0200, Eduardo Habkost wrote:
 On Thu, Oct 23, 2008 at 10:28:24AM +1100, Simon Horman wrote:
  On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote:
   The following two patches should make kdump work when the kvm-intel module
   is loaded. We need to disable vmx mode before booting the kdump kernel,
   so I've introduced a notifier interface where KVM can hook and disable
   virtualization on all CPUs just before they are halted.
   
   It has the same purpose of the KVM reboot notifier that gets executed
   at kexec-time. But on the kdump case, things are not as simple because
   the kernel has just crashed.
   
   The notifier interface being introduced is x86-specific. I don't know
   if an arch-independent interface would be more appropriate for this
   case.
   
   It was tested only using kvm-intel. Testing on different machines
   is welcome.
  
  These changes look fine to me from a kexec/kdump point of view.
  
  Reviewed-by: Simon Horman [EMAIL PROTECTED]
 
 Thanks.
 
 Considering they touch both KVM and kexec, which tree would be best way
 to get them in?

As I understand it, there is no kexec tree as such, rather
patches either get picked up by an arch tree or Andrew Morton.
I am happy to create and maintain a kexec tree if there is a need.
But in this case it seems that using the KVM tree would be best.

 (Avi: the patches were sent only to kexec and kvm mailing lists,
 initially. If it's better to submit them to your address also so it gets
 on your queue, please let me know)

I won't speak for Avi, but usually it's good to CC the maintainer.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en



Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-22 Thread Simon Horman
On Mon, Oct 20, 2008 at 01:01:32PM -0200, Eduardo Habkost wrote:
 The following two patches should make kdump work when the kvm-intel module
 is loaded. We need to disable vmx mode before booting the kdump kernel,
 so I've introduced a notifier interface where KVM can hook and disable
 virtualization on all CPUs just before they are halted.
 
 It has the same purpose of the KVM reboot notifier that gets executed
 at kexec-time. But on the kdump case, things are not as simple because
 the kernel has just crashed.
 
 The notifier interface being introduced is x86-specific. I don't know
 if an arch-independent interface would be more appropriate for this
 case.
 
 It was tested only using kvm-intel. Testing on different machines
 is welcome.

These changes look fine to me from a kexec/kdump point of view.

Reviewed-by: Simon Horman [EMAIL PROTECTED]

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
