Re: Time and KVM - best practices

2010-03-22 Thread Dor Laor

On 03/21/2010 01:29 PM, Thomas Løcke wrote:

Hey,

What is considered best practice when running a KVM host with a
mixture of Linux and Windows guests?

Currently I have ntpd running on the host, and I start my guests using
-rtc base=localtime,clock=host, with an extra -tdf added for
Windows guests, just to keep their clock from drifting madly during
load.

But with this setup, all my guests are constantly 1-2 seconds behind
the host. I can live with that for the Windows guests, as they are not


Is it just at boot time? If you run ntpdate after boot inside 
the guest, is the time 100% in sync with the host from that moment on?


Glauber once analyzed it and blamed the hwclock call in rc.sysinit.


running anything that depends heavily on the time being set perfect,
but for some of the Linux guests it's an issue.

Would I be better off using ntpd and -rtc base=localtime,clock=vm for
all the Linux guests, or is there some other magic way of ensuring
that the clock is perfectly in sync with the host? Perhaps there is
some kernel configuration I can do to optimize the host for KVM?


Jan is the expert here, but last I checked clock=vm is not appropriate 
since it is virtual time and not host time: if qemu is 
stopped or migrated you won't notice it with virtual time within the guest, 
but the drift will grow.
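
For reference, a rough sketch of the setup I would start from (untested;
image paths are placeholders, and driftfix=slew needs a qemu-kvm build that
supports it - otherwise fall back to -rtc-td-hack / -tdf):

# host: keep the host clock disciplined with ntp
ntpd -g

# Linux guest: host-driven RTC, UTC base
qemu-kvm -drive file=linux.img -rtc base=utc,clock=host

# Windows guest: localtime base plus RTC drift compensation
qemu-kvm -drive file=windows.img -rtc base=localtime,clock=host,driftfix=slew

# inside the Linux guest, after boot, check the residual offset:
ntpdate -q <host-ip>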




I'm currently using QEMU PC emulator version 0.12.50 (qemu-kvm-devel)
because version 0.12.30 did not work well at all with Windows guests,
and the kernel in both host and Linux guests is 2.6.33.1

:o)
/Thomas


Re: About KVM Forum 2010

2010-03-22 Thread Dor Laor

On 03/17/2010 07:37 AM, kazushi takahashi wrote:

Hi all

Does anybody know the exact important dates, such as the paper deadline,
for KVM Forum 2010?


It's not yet official and Chris Wright will publish the dates, but last 
we talked the plan was to ask for pretty simple abstracts (a paragraph 
or two, ~100-150 words), due by April 15, with notification by May 7th.


Again, not official, probably because of the admin work needed to set up a 
site for paper submission.
Chris will update us all officially; in the meantime, everyone can start 
working on their proposals.


hth,
Dor




I can find this
blog (http://www.linux-kvm.com/content/kvm-forum-2010-scheduled-august-9-10-2010)
  but it only mentions the dates of the conference.

Regards,
Kazushi Takahashi


Re: Timedrift in KVM guests after livemigration.

2010-04-18 Thread Dor Laor

On 04/18/2010 02:21 AM, Espen Berg wrote:

Den 17.04.2010 22:17, skrev Michael Tokarev:

We have three KVM hosts that support live migration between them, but
one of our problems is time drift. The three frontends have different
CPU frequencies, and the KVM guests adopt the frequency from the host
machine where they were first started.

What do you mean by adopts? Note that the CPU frequency
means nothing to modern operating systems; that stopped mattering
after the days of MS-DOS, which relied on CPU
frequency for its time functions. Everything interesting is
now done using timers instead, and timers (which again don't depend
on CPU frequency) usually work quite well.


The assumption that the tick frequency was calculated from the host's
MHz was based on the fact that greater clock frequency differences
caused higher time drift: a 60 MHz difference caused about 24 min of drift,
a 332 MHz difference about 2 h 25 min.



What complicates things is that the cheapest and sufficiently accurate
time source is the TSC (the time stamp counter register in
the CPU), but it will definitely be different on each
machine. For that, kvm 0.12.3 and kernel 2.6.32 (I think)
introduced a compensation. See for example the -tdf kvm option.


Ah, nice to know. :)


Those are two different things:
the issue that Espen is reporting is that the hosts have different 
frequencies, and guests that rely on the tsc as a source clock will notice 
that post migration. This is indeed a problem that -tdf does not solve; 
-tdf only adds compensation for the RTC clock emulation.


What's the guest type and what's the guest's source clock?
Using the tsc directly as a source clock is not recommended because of this 
migration issue (which is not solvable until we trap every rdtsc executed by 
the guest). Using the paravirtual kvmclock in Linux mitigates the issue, since 
it exposes both the tsc and the host clock so guests can adjust themselves.
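
As a quick sanity check (sysfs paths as in mainline kernels), you can see
which source clock the Linux guest actually picked:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
# switch to kvm-clock if the guest kernel was built with it
echo kvm-clock > /sys/devices/system/clocksource/clocksource0/current_clocksource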


Several months ago a pvclock migration fix was added to pass the pvclock 
MSR readings to the destination: 1a03675db146dfc760b3b48b3448075189f142cc






Since this is a cluster in production, I'm not able to try the latest
version either.

Well, that's a difficult one, no? It either works or not.
If you can't try anything else, why ask? :)


What I tried to say was that there are many important virtual servers
running on this cluster at the moment, so trial and error was not an
option. The last time we tried 0.12.x (during the initial tests of the
cluster) there were a lot of stability issues, crashes during migration,
etc.

Regards, Espen



Re: Timedrift in KVM guests after livemigration.

2010-04-19 Thread Dor Laor

On 04/19/2010 12:29 PM, Gleb Natapov wrote:

On Mon, Apr 19, 2010 at 11:21:47AM +0200, Espen Berg wrote:

Den 18.04.2010 11:56, skrev Gleb Natapov:


That's two different things here:
The issue that Espen is reporting is that the hosts have different
frequencies, and guests that rely on the tsc as a source clock will
notice that post migration. This is indeed a problem that -tdf does
not solve. -tdf only adds compensation for the RTC clock emulation.


It's -rtc-td-hack. -tdf does PIT compensation, but since the in-kernel
PIT is usually used, it does nothing.


So this hack will not solve our problem?


As I also stated, in the past the kvmclock MSRs were not synced upon live 
migration; it was fixed in 1a03675db146dfc760b3b48b3448075189f142cc,
but better check the code.




If your guest uses RTC for time keeping it may help. Otherwise it does
nothing.

--
Gleb.


Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-22 Thread Dor Laor

On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:

Hi all,

We have been implementing the prototype of Kemari for KVM, and we're sending
this message to share what we have now and TODO lists.  Hopefully, we would like
to get early feedback to keep us in the right direction.  Although advanced
approaches in the TODO lists are fascinating, we would like to run this project
step by step while absorbing comments from the community.  The current code is
based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.

For those who are new to Kemari for KVM, please take a look at the
following RFC which we posted last year.

http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html

The transmission/transaction protocol, and most of the control logic is
implemented in QEMU.  However, we needed a hack in KVM to prevent rip from
proceeding before synchronizing VMs.  It may also need some plumbing in the
kernel side to guarantee replayability of certain events and instructions,
integrate the RAS capabilities of newer x86 hardware with the HA stack, as well
as for optimization purposes, for example.


[ snap]



The rest of this message describes TODO lists grouped by each topic.

=== event tapping ===

Event tapping is the core component of Kemari, and it decides on which event the
primary should synchronize with the secondary.  The basic assumption here is
that outgoing I/O operations are idempotent, which is usually true for disk I/O
and reliable network protocols such as TCP.


IMO any type of network event should be stalled too. What if the VM runs 
a non-TCP protocol, a packet that the master node sent reaches some 
remote client, and the master fails before the sync to the slave?


[snap]



=== clock ===

Since synchronizing the virtual machines every time the TSC is accessed would be
prohibitive, the transmission of the TSC will be done lazily, which means
delaying it until a non-TSC synchronization point arrives.


Why do you specifically care about tsc sync? When you sync the whole 
IO model on a snapshot, it also synchronizes the tsc.


In general, can you please explain the 'algorithm' for continuous 
snapshots (is that what you'd like to do?):

A trivial one would be to:
 - do X online snapshots/sec
 - stall all IO (disk/block) from the guest to the outside world
   until the previous snapshot reaches the slave
 - snapshots are made of:
   - a diff of the dirty pages since the last snapshot
   - a diff of the qemu device model (+kvm's) since the last snapshot
You can do 'light' snapshots in between to send dirty pages and reduce 
snapshot time.


I wrote the above to serve as a reference for your comments so they will map 
into my mind. Thanks, Dor




TODO:
  - Synchronization of clock sources (need to intercept TSC reads, etc).

=== usability ===

These are items that define how users interact with Kemari.

TODO:
  - Kemarid daemon that takes care of the cluster management/monitoring
side of things.
  - Some device emulators might need minor modifications to work well
with Kemari.  Use white(black)-listing to take the burden of
choosing the right device model off the users.

=== optimizations ===

Although the big picture can be realized by completing the TODO list above, we
need some optimizations/enhancements to make Kemari useful in real world, and
these are items what needs to be done for that.

TODO:
  - SMP (for the sake of performance might need to implement a
synchronization protocol that can maintain two or more
synchronization points active at any given moment)
  - VGA (leverage VNC's subtilting mechanism to identify fb pages that
are really dirty).


Any comments/suggestions would be greatly appreciated.

Thanks,

Yoshi

--

Kemari starts synchronizing VMs when QEMU handles I/O requests.
Without this patch VCPU state is already proceeded before
synchronization, and after failover to the VM on the receiver, it
hangs because of this.

Signed-off-by: Yoshiaki Tamura <tamura.yoshi...@lab.ntt.co.jp>
---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/svm.c  |   11 ---
  arch/x86/kvm/vmx.c  |   11 ---
  arch/x86/kvm/x86.c  |4 
  4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 26c629a..7b8f514 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -227,6 +227,7 @@ struct kvm_pio_request {
int in;
int port;
int size;
+   bool lazy_skip;
  };

  /*
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d04c7ad..e373245 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1495,7 +1495,7 @@ static int io_interception(struct vcpu_svm *svm)
  {
struct kvm_vcpu *vcpu = &svm->vcpu;
u32 io_info = svm->vmcb->control.exit_info_1; /* address size bug? */
-   int size, in, string;
+   int size, in, string, ret;
unsigned port;

++svm->vcpu.stat.io_exits;
@@ 

Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-22 Thread Dor Laor

On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:

Dor Laor wrote:

On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:

Hi all,

We have been implementing the prototype of Kemari for KVM, and we're
sending
this message to share what we have now and TODO lists. Hopefully, we
would like
to get early feedback to keep us in the right direction. Although
advanced
approaches in the TODO lists are fascinating, we would like to run
this project
step by step while absorbing comments from the community. The current
code is
based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.

For those who are new to Kemari for KVM, please take a look at the
following RFC which we posted last year.

http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html

The transmission/transaction protocol, and most of the control logic is
implemented in QEMU. However, we needed a hack in KVM to prevent rip
from
proceeding before synchronizing VMs. It may also need some plumbing in
the
kernel side to guarantee replayability of certain events and
instructions,
integrate the RAS capabilities of newer x86 hardware with the HA
stack, as well
as for optimization purposes, for example.


[ snap]



The rest of this message describes TODO lists grouped by each topic.

=== event tapping ===

Event tapping is the core component of Kemari, and it decides on which
event the
primary should synchronize with the secondary. The basic assumption
here is
that outgoing I/O operations are idempotent, which is usually true for
disk I/O
and reliable network protocols such as TCP.


IMO any type of network event should be stalled too. What if the VM runs
a non-TCP protocol, a packet that the master node sent reaches some
remote client, and the master fails before the sync to the slave?


In current implementation, it is actually stalling any type of network
that goes through virtio-net.

However, if the application was using unreliable protocols, it should
have its own recovering mechanism, or it should be completely stateless.


Why do you treat tcp differently? You can damage the entire VM this way; 
think of a dhcp request that is dropped at the moment you switch 
between the master and the slave.






[snap]



=== clock ===

Since synchronizing the virtual machines every time the TSC is
accessed would be
prohibitive, the transmission of the TSC will be done lazily, which
means
delaying it until there is a non-TSC synchronization point arrives.


Why do you specifically care about the tsc sync? When you sync all the
IO model on snapshot it also synchronizes the tsc.


So, do you agree that an extra clock synchronization is not needed since 
it is done anyway as part of the live migration state sync?




In general, can you please explain the 'algorithm' for continuous
snapshots (is that what you like to do?):


Yes, of course.
Sorry for being less informative.


A trivial one would be to:
- do X online snapshots/sec


I currently don't have good numbers that I can share right now.
Snapshots/sec depends on what kind of workload is running, and if the
guest was almost idle, there will be no snapshots in 5sec. On the other
hand, if the guest was running I/O intensive workloads (netperf, iozone
for example), there will be about 50 snapshots/sec.


- Stall all IO (disk/block) from the guest to the outside world
until the previous snapshot reaches the slave.


Yes, it does.


- Snapshots are made of


Full device model + diff of dirty pages from the last snapshot.


- diff of dirty pages from last snapshot


This also depends on the workload.
In case of I/O intensive workloads, dirty pages are usually less than 100.


The hardest would be memory-intensive loads.
So 100 snapshots/sec means a latency of 10 msec, right?
(Not that that's not OK; with faster hardware and InfiniBand you'll be able 
to get much more.)





- Qemu device model (+kvm's) diff from last.


We're currently sending full copy because we're completely reusing this
part of existing live migration framework.

Last time we measured, it was about 13KB.
But it varies by which QEMU version is used.


You can do 'light' snapshots in between to send dirty pages to reduce
snapshot time.


I agree. That's one of the advanced topic we would like to try too.


I wrote the above to serve a reference for your comments so it will map
into my mind. Thanks, dor


Thank your for the guidance.
I hope this answers to your question.

At the same time, I would also be happy it we could discuss how to
implement too. In fact, we needed a hack to prevent rip from proceeding
in KVM, which turned out that it was not the best workaround.


There are brute-force solutions, like:
 - stop the guest until you send all of the snapshot to the remote (like
   standard live migration)
 - stop + fork + cont the parent

Or mark the recent dirty pages that were not yet sent to the remote as 
write-protected and copy them when touched.





Thanks,

Yoshi





TODO:
- Synchronization of clock sources (need to intercept TSC reads, etc).

=== usability

Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-22 Thread Dor Laor

On 04/22/2010 04:16 PM, Yoshiaki Tamura wrote:

2010/4/22 Dor Laor <dl...@redhat.com>:

On 04/22/2010 01:35 PM, Yoshiaki Tamura wrote:


Dor Laor wrote:


On 04/21/2010 08:57 AM, Yoshiaki Tamura wrote:


Hi all,

We have been implementing the prototype of Kemari for KVM, and we're
sending
this message to share what we have now and TODO lists. Hopefully, we
would like
to get early feedback to keep us in the right direction. Although
advanced
approaches in the TODO lists are fascinating, we would like to run
this project
step by step while absorbing comments from the community. The current
code is
based on qemu-kvm.git 2b644fd0e737407133c88054ba498e772ce01f27.

For those who are new to Kemari for KVM, please take a look at the
following RFC which we posted last year.

http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html

The transmission/transaction protocol, and most of the control logic is
implemented in QEMU. However, we needed a hack in KVM to prevent rip
from
proceeding before synchronizing VMs. It may also need some plumbing in
the
kernel side to guarantee replayability of certain events and
instructions,
integrate the RAS capabilities of newer x86 hardware with the HA
stack, as well
as for optimization purposes, for example.


[ snap]



The rest of this message describes TODO lists grouped by each topic.

=== event tapping ===

Event tapping is the core component of Kemari, and it decides on which
event the
primary should synchronize with the secondary. The basic assumption
here is
that outgoing I/O operations are idempotent, which is usually true for
disk I/O
and reliable network protocols such as TCP.


IMO any type of network event should be stalled too. What if the VM runs
a non-TCP protocol, a packet that the master node sent reaches some
remote client, and the master fails before the sync to the slave?


In current implementation, it is actually stalling any type of network
that goes through virtio-net.

However, if the application was using unreliable protocols, it should
have its own recovering mechanism, or it should be completely stateless.


Why do you treat tcp differently? You can damage the entire VM this way -
think of dhcp request that was dropped on the moment you switched between
the master and the slave?


I'm not trying to say that we should treat tcp differently, just that
it's severe.
In the case of a dhcp request, the client would have a chance to retry
after failover, correct?


But until it times out it won't have networking.


BTW, in the current implementation, it synchronizes before the dhcp ack is sent.
But in the case of tcp, once you send an ack to the client before the sync,
there is no way to recover.


What if the guest is running a dhcp server? If it provides an IP to a 
client and we then fail over to the secondary, the secondary will run 
without knowing the master allocated this IP.





[snap]



=== clock ===

Since synchronizing the virtual machines every time the TSC is
accessed would be
prohibitive, the transmission of the TSC will be done lazily, which
means
delaying it until there is a non-TSC synchronization point arrives.


Why do you specifically care about the tsc sync? When you sync all the
IO model on snapshot it also synchronizes the tsc.


So, do you agree that an extra clock synchronization is not needed since it
is done anyway as part of the live migration state sync?


I agree that it's sent as part of the live migration.
What I wanted to say here is that this is not something for real-time
applications.
I usually get questions like: can this guarantee fault tolerance for
real-time applications?


First, the huge cost of snapshots won't suit any real-time app.
Second, even if that weren't the case, the tsc delta and kvmclock are 
synchronized as part of the VM state, so there is no point in trapping it 
in the middle.





In general, can you please explain the 'algorithm' for continuous
snapshots (is that what you like to do?):


Yes, of course.
Sorry for being less informative.


A trivial one would be to:
- do X online snapshots/sec


I currently don't have good numbers that I can share right now.
Snapshots/sec depends on what kind of workload is running, and if the
guest was almost idle, there will be no snapshots in 5sec. On the other
hand, if the guest was running I/O intensive workloads (netperf, iozone
for example), there will be about 50 snapshots/sec.


- Stall all IO (disk/block) from the guest to the outside world
until the previous snapshot reaches the slave.


Yes, it does.


- Snapshots are made of


Full device model + diff of dirty pages from the last snapshot.


- diff of dirty pages from last snapshot


This also depends on the workload.
In case of I/O intensive workloads, dirty pages are usually less than 100.


The hardest would be memory intensive loads.
So 100 snap/sec means latency of 10msec right?
(not that it's not ok, with faster hw and IB you'll be able to get much
more)


Doesn't 100 snap/sec mean the interval of snap is 10msec?
IIUC, to get the latency

Re: [Qemu-devel] [RFC PATCH 00/20] Kemari for KVM v0.1

2010-04-25 Thread Dor Laor

On 04/23/2010 10:36 AM, Fernando Luis Vázquez Cao wrote:

On 04/23/2010 02:17 PM, Yoshiaki Tamura wrote:

Dor Laor wrote:

[...]

Second, even if it wasn't the case, the tsc delta and kvmclock are
synchronized as part of the VM state so there is no use of trapping it
in the middle.


I should study the clock in KVM, but won't tsc get updated by the HW
after migration?
I was wondering the following case for example:

1. The application on the guest calls rdtsc on host A.
2. The application uses rdtsc value for something.
3. Failover to host B.
4. The application on the guest replays the rdtsc call on host B.
5. If the rdtsc value is different between A and B, the application may
get into trouble because of it.


Regarding the TSC, we need to guarantee that the guest sees a monotonic
TSC after migration, which can be achieved by adjusting the TSC offset properly.
Besides, we also need a trapping TSC, so that we can tackle the case where the
primary node and the standby node have different TSC frequencies.


You're right, but this is already taken care of by the normal save/restore 
process. Check the kvm_load_tsc(CPUState *env) function.




Re: KVM call agenda for Apr 27

2010-04-27 Thread Dor Laor

On 04/27/2010 11:14 AM, Avi Kivity wrote:

On 04/27/2010 01:36 AM, Anthony Liguori wrote:


A few comments:

1) The problem was not block watermark itself but generating a
notification on the watermark threshold. It's a heuristic and should
be implemented based on polling block stats.


Polling for an event that never happens is bad engineering. At what
frequency do you poll? You're forcing the user to make a lose-lose
tradeoff.


Otherwise, we'll be adding tons of events to qemu that we'll struggle
to maintain.


That's not a valid reason to reject a user requirement. We may argue the
requirement is bogus, or that the suggested implementation is wrong and
point in a different direction, but saying that we may have to add more
code in the future due to other requirements is ... well I can't find a
word for it.



2) A block plugin doesn't solve the problem if it's just at the
BlockDriverState level because it can't interact with qcow2.


Why not? We have a layered model: guest -> qcow2 -> plugin (sends event)
-> raw-posix. Just need to insert the plugin at the appropriate layer.



3) For general block plugins, it's probably better to tackle userspace
block devices. We have CUSE and FUSE already, a BUSE is a logical
conclusion.


We also have an nbd client.

Here's another option: an nbd-like protocol that remotes all BlockDriver
operations except read and write over a unix domain socket. The open
operation returns an fd (SCM_RIGHTS strikes again) that is used for read
and write. This can be used to implement snapshots over LVM, for example.



Why without reads/writes? The watermark code needs them too (as info, not the 
actual buffer).


IMHO the whole thing is way over-engineered:
 a) Having another channel into qemu complicates management
software. Shouldn't the monitor be the channel? Otherwise we'll
need to create another QMP (or something nbd-like, as Avi suggests) for
these actions. It's extra work for management, and they will have a hard
time understanding the interleaving of events across the various channels.
 b) How are the plugins defined? Are they scripts? Binaries? Do they open
their own sockets?

So I suggest either sticking with QMP, or having a new block layer but 
letting QMP pass events from it - this is actually the nbd-like approach 
but with the QMP socket.


Thanks,
Dor


Re: KVM call agenda for Apr 27

2010-04-27 Thread Dor Laor

On 04/27/2010 11:56 AM, Avi Kivity wrote:

On 04/27/2010 11:48 AM, Dor Laor wrote:

Here's another option: an nbd-like protocol that remotes all BlockDriver
operations except read and write over a unix domain socket. The open
operation returns an fd (SCM_RIGHTS strikes again) that is used for read
and write. This can be used to implement snapshots over LVM, for
example.




Why w/o read/writes?


To avoid the copying.


Of course, just pass the offset+len on read/write too




the watermark code needs them too (as info, not the actual buffer).


Yeah. It works for lvm snapshots, not for watermarks.



IMHO the whole thing is way over engineered:
a) Having another channel into qemu is complicating management
software. Isn't the monitor should be the channel? Otherwise we'll
need to create another QMP (or nbd like Avi suggest) for these
actions. It's extra work for mgmt and they will have hard time to
understand events interleaving of the various channels


block layer plugins allow intercepting all interesting block layer
events, not just write-past-a-watermark, and allow actions based on
those events. It's a more general solution.


No problem there, as long as we try to use the single existing QMP 
connection with the plugins. Otherwise we'll be creating a QMP2 for the 
block events a year from now.





b) How the plugins are defined? Is it scripts? Binaries? Do they open
their own sockets?


Shared objects.






Re: KVM call agenda for Apr 27

2010-04-27 Thread Dor Laor

On 04/27/2010 12:22 PM, Avi Kivity wrote:

On 04/27/2010 12:08 PM, Dor Laor wrote:

On 04/27/2010 11:56 AM, Avi Kivity wrote:

On 04/27/2010 11:48 AM, Dor Laor wrote:

Here's another option: an nbd-like protocol that remotes all
BlockDriver
operations except read and write over a unix domain socket. The open
operation returns an fd (SCM_RIGHTS strikes again) that is used for
read
and write. This can be used to implement snapshots over LVM, for
example.




Why w/o read/writes?


To avoid the copying.


Of course, just pass the offset+len on read/write too


There will be a large performance impact.



IMHO the whole thing is way over engineered:
a) Having another channel into qemu is complicating management
software. Isn't the monitor should be the channel? Otherwise we'll
need to create another QMP (or nbd like Avi suggest) for these
actions. It's extra work for mgmt and they will have hard time to
understand events interleaving of the various channels


block layer plugins allow intercepting all interesting block layer
events, not just write-past-a-watermark, and allow actions based on
those events. It's a more general solution.


No problem there, as long as we do try to use the single existing QMP
with the plugins. Otherwise we'll create QMP2 for the block events in
a year from now.


I don't see how we can interleave messages from the plugin into the qmp
stream without causing confusion.


Those are QMP async events.

Since Kevin suggested adding even more events (or was that cynical?), maybe 
we can use optional, opaque QMP block events that the plugin issues; they 
would travel over the standard QMP connection as async events to the 
interested management app.

Once stabilized, each event can go into the official QMP protocol.




Re: [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-05 Thread Dor Laor

On 05/05/2010 11:58 PM, Michael S. Tsirkin wrote:

Generally, the Host end of the virtio ring doesn't need to see where
Guest is up to in consuming the ring.  However, to completely understand
what's going on from the outside, this information must be exposed.
For example, host can reduce the number of interrupts by detecting
that the guest is currently handling previous buffers.

Fortunately, we have room to expand: the ring is always a whole number
of pages and there's hundreds of bytes of padding after the avail ring
and the used ring, whatever the number of descriptors (which must be a
power of 2).

We add a feature bit so the guest can tell the host that it's writing
out the current value there, if it wants to use that.

This is based on a patch by Rusty Russell, with the main difference
being that we dedicate a feature bit to guest to tell the host it is
writing the used index.  This way we don't need to force host to publish
the last available index until we have a use for it.

Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---

Rusty,
this is a simplified form of a patch you posted in the past.
I have a vhost patch that, using this feature, shows external
to host bandwidth grow from 5 to 7 GB/s, by avoiding


You mean external-to-guest, I guess.

We have a similar issue with virtio-blk: when using very fast 
multi-spindle storage on the host side, there are too many irq injection 
events. This patch should probably reduce them a lot.

The principle exactly matches the Xen ring.


an interrupt in the window after previous interrupt
was sent and before interrupts were disabled for the vq.
With vhost under some external to host loads I see
this window being hit about 30% sometimes.

I'm finalizing the host bits and plan to send
the final version for inclusion when all's ready,
but I'd like to hear comments meanwhile.

  drivers/virtio/virtio_ring.c |   28 +---
  include/linux/virtio_ring.h  |   14 +-
  2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1ca8890..7729aba 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -89,9 +89,6 @@ struct vring_virtqueue
/* Number we've added since last sync. */
unsigned int num_added;

-   /* Last used index we've seen. */
-   u16 last_used_idx;
-
/* How to notify other side. FIXME: commonalize hcalls! */
void (*notify)(struct virtqueue *vq);

@@ -285,12 +282,13 @@ static void detach_buf(struct vring_virtqueue *vq, 
unsigned int head)

  static inline bool more_used(const struct vring_virtqueue *vq)
  {
-   return vq->last_used_idx != vq->vring.used->idx;
+   return *vq->vring.last_used_idx != vq->vring.used->idx;
  }

  void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
  {
struct vring_virtqueue *vq = to_vvq(_vq);
+   struct vring_used_elem *u;
void *ret;
unsigned int i;

@@ -307,12 +305,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
int *len)
return NULL;
}

-   /* Only get used array entries after they have been exposed by host. */
-   virtio_rmb();
-
-   i = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].id;
-   *len = vq->vring.used->ring[vq->last_used_idx%vq->vring.num].len;
+   /* Only get used array entries after they have been exposed by host.
+* Need mb(), not just rmb() because we write last_used_idx below. */
+   virtio_mb();

+   u = &vq->vring.used->ring[*vq->vring.last_used_idx % vq->vring.num];
+   i = u->id;
+   *len = u->len;
	if (unlikely(i >= vq->vring.num)) {
		BAD_RING(vq, "id %u out of range\n", i);
return NULL;
@@ -325,7 +324,8 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int 
*len)
/* detach_buf clears data, so grab it now. */
ret = vq->data[i];
detach_buf(vq, i);
-   vq->last_used_idx++;
+   (*vq->vring.last_used_idx)++;
+
END_USE(vq);
return ret;
  }
@@ -431,7 +431,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
vq->vq.name = name;
vq->notify = notify;
vq->broken = false;
-   vq->last_used_idx = 0;
+   *vq->vring.last_used_idx = 0;
vq->num_added = 0;
list_add_tail(&vq->vq.list, &vdev->vqs);
  #ifdef DEBUG
@@ -440,6 +440,10 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,

vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);

+   /* We publish used index whether Host offers it or not: if not, it's
+* junk space anyway.  But calling this acknowledges the feature. */
+   virtio_has_feature(vdev, VIRTIO_RING_F_PUBLISH_USED);
+
/* No callback?  Tell other side not to bother us. */
if (!callback)
vq->vring.avail->flags |= VRING_AVAIL_F_NO_INTERRUPT;
@@ -473,6 +477,8 @@ void 

Re: Copy and paste feature across guest and host

2010-05-27 Thread Dor Laor

On 05/27/2010 12:17 PM, Tomasz Chmielewski wrote:

Just installed Fedora 13 as a guest on KVM. However there is no
cross-platform copy and paste feature. I believe I have set up this
feature on another guest some time before. Unfortunately I can't find the
relevant document. Could you please shed some light? A pointer
would be appreciated. TIA


Did you try;

# modprobe virtio-copypaste

?


Seriously, qemu does not make it easy (well, its GUI does not make most
things easy) and you'll need a tool which synchronizes the clipboard
between two machines (google for qemu copy paste?).


There is no cut & paste at the moment. The plan is to enable it through 
virtio-serial and have both spice and vnc use it. I cannot guarantee a date, 
but it shouldn't be too long.





Re: qemu-kvm still unable to load option rom extboot.bin

2009-09-09 Thread Dor Laor

On 09/09/2009 04:47 PM, Lucas Meneghel Rodrigues wrote:

Hi folks, seems like we are still facing a build problem on qemu-kvm:
The option rom is failing to boot:

09/04 11:12:08 DEBUG|kvm_vm:0384| Running qemu command:
/usr/local/autotest/tests/kvm/qemu -name 'vm1' -monitor 
unix:/tmp/monitor-20090904-111208-9nyy,server,nowait -drive 
file=/usr/local/autotest/tests/kvm/images/fc9-32.qcow2,if=ide,boot=on -net 
nic,vlan=0 -net user,vlan=0 -m 512 -cdrom 
/usr/local/autotest/tests/kvm/isos/linux/Fedora-9-i386-DVD.iso -redir 
tcp:5000::22 -vnc :0
09/04 11:12:08 DEBUG| kvm_utils:0858| (qemu) Could not load option rom 
'extboot.bin'

So qemu is still not able to locate roms when it needs them. The test
could work around this as pointed out by Marcelo, by copying the roms to
the right repository, but that's not desirable, it should be fixed on
the build system appropriately.

I will do my best to always watch the results of the daily git
testing closely and report problems.


Avi just committed it:
[COMMIT master] qemu-kvm: Install built option roms




Thanks!

Lucas



Re: kvm network latency, higher with virtio ?

2009-09-16 Thread Dor Laor

On 09/16/2009 10:27 AM, Michael S. Tsirkin wrote:

On Tue, Sep 15, 2009 at 05:15:09PM +0200, Luca Bigliardi wrote:

Hi,
I'm running some tests between two linux instances bridged together.

If I try to ping 10 times I obtain the following results:

-net nic,model=virtio -net tap :
 rtt min/avg/max/mdev = 0.756/0.967/2.115/0.389 ms

-net nic,model=rtl8139 -net tap :
 rtt min/avg/max/mdev = 0.301/0.449/1.173/0.248 ms

So it seems with virtio the latency is higher. Is it normal?


Yes, the main reason is the TX timer it uses for interrupt/vm exit mitigation.


Originally we used the tx mitigation timer in order to provide better 
throughput at the expense of latency.
Measurements of older versions of virtio showed that we can cancel this 
timer and achieve better latency without hurting throughput.


Vhost won't use it. For the time being, until we get vhost, we should 
probably remove it from qemu.





The results I'm reporting were obtained with
- host
   qemu-kvm 0.11-rc2
   kvm-kmod-2.6.30.1
   kernel: 2.6.30.5 (HIGH_RES_TIMERS=y as suggested in
 http://www.linux-kvm.org/page/Virtio )
- guest
   kernel: 2.6.31

but I also tested older versions always obtaining latency values at least two
times higher than rtl8139/e1000 .

Thank you,
Luca



Re: [KVM-AUTOTEST PATCH 1/2] Add KSM test

2009-09-16 Thread Dor Laor

On 09/15/2009 09:58 PM, Jiri Zupka wrote:

After a quick review I have the following questions:
1. Why did you implement the guest tool in 'c' and not in python?
   Python is much simpler and you can share some code with the server.
   This 'test protocol' would also be easier to understand this way.


We need speed and precise control over allocating memory in pages.


2. IMHO there is no need to use select, you can do blocking read.


We replaced the socket communication with interactive program communication 
via ssh/telnet.


3. Also you can use plain malloc without the more complex ( a bit) mmap.


We need to address the memory pages exactly. We can't allow the data to 
shift in memory.


You can use the tmpfs+dd idea instead of the specific program as I 
detailed before. Maybe some other binary can be used. My intention is to 
simplify the test/environment as much as possible.






Re: [KVM-AUTOTEST PATCH 1/2] Add KSM test

2009-09-16 Thread Dor Laor

On 09/16/2009 04:09 PM, Jiri Zupka wrote:


- Dor Laordl...@redhat.com  wrote:


On 09/15/2009 09:58 PM, Jiri Zupka wrote:

After a quick review I have the following questions:
1. Why did you implement the guest tool in 'c' and not in python?
Python is much simpler and you can share some code with the

server.

This 'test protocol' would also be easier to understand this

way.


We need speed and the precise control of allocate memory in pages.


2. IMHO there is no need to use select, you can do blocking read.


We replace socket communication by interactive program communication

via ssh/telnet



3. Also you can use plain malloc without the more complex ( a bit)

mmap.


We need address exactly the memory pages. We can't allow shift of

the data in memory.

You can use the tmpfs+dd idea instead of the specific program as I
detailed before. Maybe some other binary can be used. My intention is
to
simplify the test/environment as much as possible.



We need compatibility with other systems, like Windows etc.
We want to add support for other systems in the next version.


KSM is a host feature and should be agnostic to the guest.
Also, I don't think your code will compile on Windows...








Re: Binary Windows guest drivers are released

2009-09-24 Thread Dor Laor

On 09/24/2009 11:59 PM, Javier Guerra wrote:

On Thu, Sep 24, 2009 at 3:38 PM, Kenni Lund <ke...@kelu.dk> wrote:

I've done some benchmarking with the drivers on Windows XP SP3 32bit,
but it seems like using the VirtIO drivers are slower than the IDE drivers in
(almost) all cases. Perhaps I've missed something or does the driver still
need optimization?


very interesting!

it seems that IDE wins on all the performance numbers, but VirtIO
always has lower CPU utilization. I guess this is guest CPU %, right?
it would also be interesting to compare the CPU usage from the host
point of view, since a lower 'off-guest' CPU usage is very important
for scaling to many guests doing I/O.



Can you re-try it with the host I/O scheduler set to deadline?
The virtio backend (thread pool) is sensitive to it.
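
Something like this on the host, per block device backing the guest images
(sdX is a placeholder):

cat /sys/block/sdX/queue/scheduler
echo deadline > /sys/block/sdX/queue/scheduler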

These drivers are mainly tweaked for win2k3 and win2k8. We once had 
queue-depth settings in the driver; I'm not sure we still have them. Vadim, 
can you add more info?


Also, virtio should provide I/O parallelism, as opposed to IDE. I don't 
think your test exercises that. Virtio can also provide more virtual drives 
than the maximum of 4 that IDE offers.


Dor


Re: [KVM-AUTOTEST PATCH 1/2] Add KSM test

2009-09-30 Thread Dor Laor

On 09/29/2009 05:50 PM, Lucas Meneghel Rodrigues wrote:

On Fri, 2009-09-25 at 05:22 -0400, Jiri Zupka wrote:

- Dor Laordl...@redhat.com  wrote:


On 09/16/2009 04:09 PM, Jiri Zupka wrote:


- Dor Laordl...@redhat.com   wrote:


On 09/15/2009 09:58 PM, Jiri Zupka wrote:

After a quick review I have the following questions:
1. Why did you implement the guest tool in 'c' and not in

python?

 Python is much simpler and you can share some code with the

server.

 This 'test protocol' would also be easier to understand this

way.


We need speed and the precise control of allocate memory in

pages.



2. IMHO there is no need to use select, you can do blocking

read.


We replace socket communication by interactive program

communication

via ssh/telnet



3. Also you can use plain malloc without the more complex ( a

bit)

mmap.


We need address exactly the memory pages. We can't allow shift of

the data in memory.

You can use the tmpfs+dd idea instead of the specific program as I
detailed before. Maybe some other binary can be used. My intention

is

to
simplify the test/environment as much as possible.



We need compatibility with others system, like Windows etc..
We want to add support for others system in next version


KSM is a host feature and should be agnostic to the guest.
Also I don't think your code will compile on windows...


Yes, I think you are right.


First of all, sorry, I am doing the best I can to review carefully all
the patch queue, and as KSM is a more involved feature that I am not
very familiar with, I need a bit more time to review it!


But we need to generate special data for the pages in memory,
and we need a script on the guest side of the test, because communication
over ssh is too slow to transfer many GB of special data to the guests.

We can use an optimized C program, which is 10x or more faster than a
Python script on a native system. Heavy load on a virtual guest can
cause performance problems.


About the code compiling under Windows: I guess making a native Windows C or
C++ program is an option. I generally agree with your reasoning; this
case seems to be better covered with a C program. Will get into it in
more detail ASAP...


We can use tmpfs, but with a Python script to generate the special data.
We can't use dd with random data because we need to test some special cases
(change only the last 96 B of a page, etc.).


What do you think about it?



I think it can be done with some simple scripting; it will be fast 
enough and, more importantly, easier to understand and to change in the 
future.


Here is a short example for creating lots of identical pages that 
contain '0' apart from the last two bytes. If you run it in a single 
guest you should expect to save lots of memory. Then you can change the 
last bytes to random values and see the memory consumption grow:

[Remember to disable guest swap to keep the file in guest RAM]

dd if=/dev/zero of=template count=1 bs=4094
echo '1' >> template
cp template large_file
for ((i=0;i<10;i++)); do dd if=large_file of=large_file conv=notrunc \
oflag=append > /dev/null 2>&1; done


It creates a 4k*2^10 file with identical pages (since it's on tmpfs with 
no swap)
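
A quick way to verify it, assuming the standard KSM sysfs interface on the
host and a tmpfs mount point of your choice inside the guest:

# inside the guest: no swap, keep the file in RAM
swapoff -a
mount -t tmpfs none /mnt/ksmtest && cd /mnt/ksmtest
# ... run the dd loop above ...

# on the host: make sure ksmd runs and watch the merge counters grow
echo 1 > /sys/kernel/mm/ksm/run
grep . /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing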


Can you try it? It should be far simpler than the original option.

Thanks,
Dor


Re: [Autotest] [PATCH] Test 802.1Q vlan of nic

2009-10-19 Thread Dor Laor

On 10/15/2009 11:48 AM, Amos Kong wrote:


Test 802.1Q vlan of nic, config it by vconfig command.
   1) Create two VMs
   2) Setup guests in different vlan by vconfig and test communication by ping
  using hard-coded ip address
   3) Setup guests in same vlan and test communication by ping
   4) Recover the vlan config

Signed-off-by: Amos Kongak...@redhat.com
---
  client/tests/kvm/kvm_tests.cfg.sample |6 +++
  client/tests/kvm/tests/vlan_tag.py|   73 +
  2 files changed, 79 insertions(+), 0 deletions(-)
  mode change 100644 =  100755 client/tests/kvm/scripts/qemu-ifup


In general the above should come as an independent patch.


  create mode 100644 client/tests/kvm/tests/vlan_tag.py

diff --git a/client/tests/kvm/kvm_tests.cfg.sample 
b/client/tests/kvm/kvm_tests.cfg.sample
index 9ccc9b5..4e47767 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -166,6 +166,12 @@ variants:
  used_cpus = 5
  used_mem = 2560

+- vlan_tag:  install setup
+type = vlan_tag
+subnet2 = 192.168.123
+vlans = 10 20


If we want to be fanatical and safe, we should dynamically choose subnet 
and vlan numbers that are not used on the host instead of hard-coding them.
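
A rough sketch of what I mean (requires the 8021q module on the host;
the parsing is simplistic):

# vlan IDs already configured on the host
cat /proc/net/vlan/config
# subnets already routed on the host
ip route show
# pick a vlan id / subnet that appears in neither list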



+nic_mode = tap
+nic_model = e1000


Why only e1000? Let's test virtio and rtl8139 as well. Can't you inherit 
the nic model from the config?




  - autoit:   install setup
  type = autoit
diff --git a/client/tests/kvm/scripts/qemu-ifup 
b/client/tests/kvm/scripts/qemu-ifup
old mode 100644
new mode 100755
diff --git a/client/tests/kvm/tests/vlan_tag.py 
b/client/tests/kvm/tests/vlan_tag.py
new file mode 100644
index 000..15e763f
--- /dev/null
+++ b/client/tests/kvm/tests/vlan_tag.py
@@ -0,0 +1,73 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+
+def run_vlan_tag(test, params, env):
+
+Test 802.1Q vlan of nic, config it by vconfig command.
+
+1) Create two VMs
+2) Setup guests in different vlan by vconfig and test communication by ping
+   using hard-coded ip address
+3) Setup guests in same vlan and test communication by ping
+4) Recover the vlan config
+
+@param test: Kvm test object
+@param params: Dictionary with the test parameters.
+@param env: Dictionary with test environment.
+
+
+vm = []
+session = []
+subnet2 = params.get("subnet2")
+vlans = params.get("vlans").split()
+
+vm.append(kvm_test_utils.get_living_vm(env, "%s" % params.get("main_vm")))
+
+params_vm2 = params.copy()
+params_vm2['image_snapshot'] = "yes"
+params_vm2['kill_vm_gracefully'] = "no"
+params_vm2["address_index"] = int(params.get("address_index", 0))+1
+vm.append(vm[0].clone("vm2", params_vm2))
+kvm_utils.env_register_vm(env, "vm2", vm[1])
+if not vm[1].create():
+raise error.TestError("VM 1 create failed")



The whole 7-8 lines above should be grouped into a function that clones an 
existing VM. It should be part of the kvm-autotest infrastructure.


Besides that, it looks good.


+
+for i in range(2):
+session.append(kvm_test_utils.wait_for_login(vm[i]))
+
+try:
+vconfig_cmd = "vconfig add eth0 %s;ifconfig eth0.%s %s.%s"
+# Attempt to configure IPs for the VMs and record the results in
+# boolean variables
+# Make vm1 and vm2 in the different vlan
+
+ip_config_vm1_ok = (session[0].get_command_status(vconfig_cmd
+   % (vlans[0], vlans[0], subnet2, 11)) == 0)
+ip_config_vm2_ok = (session[1].get_command_status(vconfig_cmd
+   % (vlans[1], vlans[1], subnet2, 12)) == 0)
+if not ip_config_vm1_ok or not ip_config_vm2_ok:
+raise error.TestError, "Fail to config VMs ip address"
+ping_diff_vlan_ok = (session[0].get_command_status(
+ "ping -c 2 %s.12" % subnet2) == 0)
+
+if ping_diff_vlan_ok:
+raise error.TestFail("VM 2 is unexpectedly pingable in different "
+ "vlan")
+# Make vm2 in the same vlan with vm1
+vlan_config_vm2_ok = (session[1].get_command_status(
+  "vconfig rem eth0.%s;vconfig add eth0 %s;"
+  "ifconfig eth0.%s %s.12" %
+  (vlans[1], vlans[0], vlans[0], subnet2)) == 0)
+if not vlan_config_vm2_ok:
+raise error.TestError, "Fail to config ip address of VM 2"
+
+ping_same_vlan_ok = (session[0].get_command_status(
+ "ping -c 2 %s.12" % subnet2) == 0)
+if not ping_same_vlan_ok:
+raise error.TestFail("Fail to ping the guest in same vlan")
+finally:
+# Clean the vlan config
+for i in range(2):
+session[i].sendline("vconfig rem eth0.%s" % vlans[0])
+

Re: Do I set up separate bridges for each guest?

2009-10-20 Thread Dor Laor

On 10/20/2009 04:37 AM, Neil Aggarwal wrote:

Hello:

I am installing KVM on top of CentOS 5.4 so I can
have two guests running on my host. I would like to
have the host and guests accessible from my
network.

Do I set up separate bridges for each guest or would
they somehow be shared?

If I set up separate bridges, I think I need to do
in /etc/sysconfig/network-scripts on the host machine:

1. Set up ifcfg-eth0 with the ip information of the
host (For example 192.168.2.200)
2. Set up ifcfg-eth0:1 for the first guest.  It will
have BRIDGE=br1
3. Create ifcfg-br1 with the IP info for the first
guest (For example 192.168.2.201)
4. Set up ifcfg-eth0:2 for the second guest.  It will
have BRIDGE=br2
5. Create ifcfg-br2 with the IP info for the second
guest (For example 192.168.2.202)

Is this correct or did I miss something?


The simplest thing is to use a single bridge for all of them:
the physical nic should be part of it and supply the outside-world 
connection. The physical nic doesn't need an IP; the bridge should 
own it. All VMs can use this bridge.


cat /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
GATEWAYDEV=''
BOOTPROTO=dhcp
DELAY=0
HWADDR=00:14:5E:17:D0:04
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=none
HWADDR=00:14:5E:17:D0:04
BRIDGE=br0
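
Then each guest just attaches a tap interface to br0, e.g. (the ifup script
and paths follow the usual qemu examples, adjust to your setup):

# /etc/qemu-ifup: add the tap device to the bridge
#!/bin/sh
/usr/sbin/brctl addif br0 $1
/sbin/ifconfig $1 up

qemu-kvm -drive file=guest1.img -m 1024 \
-net nic,macaddr=52:54:00:12:34:01 -net tap,script=/etc/qemu-ifup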




Thanks,
Neil


--
Neil Aggarwal, (281)846-8957, www.JAMMConsulting.com
Will your e-commerce site go offline if you have
a DB server failure, fiber cut, flood, fire, or other disaster?
If so, ask about our geographically redundant database system.



Re: [Autotest] [PATCH] Test 802.1Q vlan of nic

2009-10-21 Thread Dor Laor

On 10/21/2009 03:46 PM, Uri Lublin wrote:

On 10/21/2009 12:37 PM, Amos Kong wrote:

On Tue, Oct 20, 2009 at 09:19:50AM -0400, Michael Goldish wrote:

- Dor Laordl...@redhat.com  wrote:

On 10/15/2009 11:48 AM, Amos Kong wrote:

For the sake of safety maybe we should start both VMs with -snapshot.
Dor, what do you think?  Is it safe to start 2 VMs with the same disk
image
when only one of them uses -snapshot?


Setting up the second VM with -snapshot is enough. The image can only be
R/W by the first VM.



Actually, I agree with Michael. If both VMs use the same disk image, it
is safer to setup both VMs with -snapshot. When the first VM writes to
the disk-image the second VM may be affected.


That's a must. If only one VM uses -snapshot, its base image will get 
written to and the snapshot will become stale.




Re: KSM and HugePages

2009-10-24 Thread Dor Laor

On 10/23/2009 08:21 PM, David Martin wrote:

Does KSM support HugePages?  Reading the Fedora 12 feature list I notice this:
Using huge pages for guest memory does have a downside, however - you
can no longer swap nor balloon guest memory.
However it is unclear to me if that includes KSM.


ksm pages are only standard 4k pages.



If I use 1GB HugePages and KSM (assuming this is possible), does that
mean the entire 1GB page has to match another for them to merge?  Are
there any other downsides to using them other than swapping and
ballooning?


The huge page pool needs to be available at VM creation time.
Also, the TLB has fewer entries for huge pages, although they still bring 
better results than 4k pages.
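
For completeness, the usual way to back a guest with huge pages (2MB pages
here; the numbers are placeholders, and -mem-path needs a qemu-kvm that
supports it):

# reserve 1024 x 2MB pages and mount hugetlbfs
echo 1024 > /proc/sys/vm/nr_hugepages
mkdir -p /hugepages
mount -t hugetlbfs hugetlbfs /hugepages

qemu-kvm -m 2048 -mem-path /hugepages -drive file=guest.img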





Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-10-27 Thread Dor Laor

On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:

Hi Michael, I am reviewing your patchset and have just a minor remark
to make here:

On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldishmgold...@redhat.com  wrote:

This patch adds a new test that checks the timedrift introduced by migrations.
It uses the same parameters used by the timedrift test to get the guest time.
In addition, the number of migrations the test performs is controlled by the
parameter 'migration_iterations'.

Signed-off-by: Michael Goldishmgold...@redhat.com
---
  client/tests/kvm/kvm_tests.cfg.sample  |   33 ---
  client/tests/kvm/tests/timedrift_with_migration.py |   95 
  2 files changed, 115 insertions(+), 13 deletions(-)
  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py

diff --git a/client/tests/kvm/kvm_tests.cfg.sample 
b/client/tests/kvm/kvm_tests.cfg.sample
index 540d0a2..618c21e 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -100,19 +100,26 @@ variants:
 type = linux_s3

 - timedrift:install setup
-type = timedrift
 extra_params +=  -rtc-td-hack
-# Pin the VM and host load to CPU #0
-cpu_mask = 0x1
-# Set the load and rest durations
-load_duration = 20
-rest_duration = 20
-# Fail if the drift after load is higher than 50%
-drift_threshold = 50
-# Fail if the drift after the rest period is higher than 10%
-drift_threshold_after_rest = 10
-# For now, make sure this test is executed alone
-used_cpus = 100
+variants:
+- with_load:
+type = timedrift
+# Pin the VM and host load to CPU #0
+cpu_mask = 0x1



Let's use -smp 2 always.

btw: we should not run the load test in parallel with the standard tests.


+# Set the load and rest durations
+load_duration = 20
+rest_duration = 20


Even the default duration here seems way too brief; is there any
reason why 20s was chosen instead of, let's say, 1800s? I am under the
impression that 20s of load won't be enough to cause any noticeable
drift...


+# Fail if the drift after load is higher than 50%
+drift_threshold = 50
+# Fail if the drift after the rest period is higher than 10%
+drift_threshold_after_rest = 10


I am also curious about those tresholds and the reasoning behind them.
Is there any official agreement on what we consider to be an
unreasonable drift?

Another thing that struck me out is drift calculation: On the original
timedrift test, the guest drift is normalized against the host drift:

drift = 100.0 * (host_delta - guest_delta) / host_delta

While in the new drift tests, we consider only the guest drift. I
believe is better to normalize all tests based on one drift
calculation criteria, and those values should be reviewed, and at
least a certain level of agreement on our development community should
be reached.


I think we don't need to calculate drift ratio. We should define a 
threshold in seconds, let's say 2 seconds. Beyond that, there should not 
be any drift.
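
[For illustration, a small sketch, not from the patchset, of the two criteria
being discussed: the original percentage formula versus an absolute threshold
in seconds. The input values are placeholders.]

def drift_percent(host_delta, guest_delta):
    # original timedrift criterion: drift as a percentage of the host delta
    return 100.0 * (host_delta - guest_delta) / host_delta

def drift_seconds(host_delta, guest_delta):
    # suggested criterion: absolute drift in seconds vs. a fixed threshold
    return abs(host_delta - guest_delta)

host_delta, guest_delta = 60.0, 58.7     # e.g. seconds elapsed on host/guest
threshold = 2.0
print("drift: %.1f%% / %.1fs (fail: %s)"
      % (drift_percent(host_delta, guest_delta),
         drift_seconds(host_delta, guest_delta),
         drift_seconds(host_delta, guest_delta) > threshold))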


Do we support migration to a different host? We should, especially for 
this test. The time reading on the destination host should also be used.


Apart from that, good patchset, and it's a good thing you refactored some of the 
code into shared utils.




Other than this concern that came to my mind, the new tests look good
and work fine here. I had to do a slight rebase in one of the patches,
very minor stuff. The default values and the drift calculation can be
changed on a later time. Thanks!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: [RFC] KVM Fault Tolerance: Kemari for KVM

2009-11-12 Thread Dor Laor

On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:

Hi all,

It has been a while coming, but we have finally started work on
Kemari's port to KVM. For those not familiar with it, Kemari provides
the basic building block to create a virtualization-based fault
tolerant machine: a virtual machine synchronization mechanism.

Traditional high availability solutions can be classified in two
groups: fault tolerant servers, and software clustering.

Broadly speaking, fault tolerant servers protect us against hardware
failures and, generally, rely on redundant hardware (often
proprietary), and hardware failure detection to trigger fail-over.

On the other hand, software clustering, as its name indicates, takes
care of software failures and usually requires a standby server whose
software configuration for the part we are trying to make fault
tolerant must be identical to that of the active server.

Both solutions may be applied to virtualized environments. Indeed,
the current incarnation of Kemari (Xen-based) brings fault tolerant
server-like capabilities to virtual machines and integration with
existing HA stacks (Heartbeat, RHCS, etc) is under consideration.

After some time in the drawing board we completed the basic design of
Kemari for KVM, so we are sending an RFC at this point to get early
feedback and, hopefully, get things right from the start. Those
already familiar with Kemari and/or fault tolerance may want to skip
the Background and go directly to the design and implementation
bits.

This is a pretty long write-up, but please bear with me.

== Background ==

We started to play around with continuous virtual synchronization
technology about 3 years ago. As development progressed and, most
importantly, we got the first Xen-based working prototypes it became
clear that we needed a proper name for our toy: Kemari.

The goal of Kemari is to provide a fault tolerant platform for
virtualization environments, so that in the event of a hardware
failure the virtual machine fails over from compromised to properly
operating hardware (a physical machine) in a way that is completely
transparent to the guest operating system.

Although hardware based fault tolerant servers and HA servers
(software clustering) have been around for a (long) while, they
typically require specifically designed hardware and/or modifications
to applications. In contrast, by abstracting hardware using
virtualization, Kemari can be used on off-the-shelf hardware and no
application modifications are needed.

After a period of in-house development the first version of Kemari for
Xen was released in Nov 2008 as open source. However, by then it was
already pretty clear that a KVM port would have several
advantages. First, KVM is integrated into the Linux kernel, which
means one gets support for a wide variety of hardware for
free. Second, and in the same vein, KVM can also benefit from Linux'
low latency networking capabilities including RDMA, which is of
paramount importance for a extremely latency-sensitive functionality
like Kemari. Last, but not the least, KVM and its community is growing
rapidly, and there is increasing demand for Kemari-like functionality
for KVM.

Although the basic design principles will remain the same, our plan is
to write Kemari for KVM from scratch, since there does not seem to be
much opportunity for sharing between Xen and KVM.

== Design outline ==

The basic premise of fault tolerant servers is that when things go
awry with the hardware the running system should transparently
continue execution on an alternate physical host. For this to be
possible the state of the fallback host has to be identical to that of
the primary.

Kemari runs paired virtual machines in an active-passive configuration
and achieves whole-system replication by continuously copying the
state of the system (dirty pages and the state of the virtual devices)
from the active node to the passive node. An interesting implication
of this is that during normal operation only the active node is
actually executing code.

Another possible approach is to run a pair of systems in lock-step
(à la VMware FT). Since both the primary and fallback virtual machines
are active keeping them synchronized is a complex task, which usually
involves carefully injecting external events into both virtual
machines so that they result in identical states.

The latter approach is extremely architecture specific and not SMP
friendly. This spurred us to try the design that became Kemari, which
we believe lends itself to further optimizations.

== Implementation ==

The first step is to encapsulate the machine to be protected within a
virtual machine. Then the live migration functionality is leveraged to
keep the virtual machines synchronized.

Whereas during live migration dirty pages can be sent asynchronously
from the primary to the fallback server until the ratio of dirty pages
is low enough to guarantee very short downtimes, when it comes to
fault tolerance 

Re: virtio disk slower than IDE?

2009-11-15 Thread Dor Laor

On 11/14/2009 04:23 PM, Gordan Bobic wrote:

I just tried paravirtualized virtio block devices, and my tests show
that they are approximately 30% slower than emulated IDE devices. I'm
guessing this isn't normal. Is this a known issue or am I likely to have
mosconfigured something? I'm using 64-bit RHEL/CentOS 5 (both host and
guest).


Please try to change the io scheduler on the host to the deadline io 
scheduler; it should boost your performance back.
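
[Illustrative sketch, not from the thread: switch the I/O scheduler for one
host block device through sysfs. The device name is a placeholder and root is
required; the same can be done globally with the elevator= kernel parameter.]

dev = "sda"                                      # hypothetical host disk
path = "/sys/block/%s/queue/scheduler" % dev
print("current: " + open(path).read().strip())   # e.g. "noop anticipatory deadline [cfq]"
open(path, "w").write("deadline")                # select the deadline scheduler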




Thanks.

Gordan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html




Re: virtio disk slower than IDE?

2009-11-15 Thread Dor Laor

On 11/15/2009 02:00 PM, Gordan Bobic wrote:

Dor Laor wrote:

On 11/14/2009 04:23 PM, Gordan Bobic wrote:

I just tried paravirtualized virtio block devices, and my tests show
that they are approximately 30% slower than emulated IDE devices. I'm
guessing this isn't normal. Is this a known issue or am I likely to have
mosconfigured something? I'm using 64-bit RHEL/CentOS 5 (both host and
guest).


Please try to change the io scheduler on the host to io scheduler, it
should boost your performance back.


I presume you mean the deadline io scheduler. I tried that (kernel
parameter elevator=deadline) and it made no measurable difference
compared to the cfq scheduler.


What version of kvm do you use? Is it rhel5.4?
Can you post the qemu cmdline and the perf test in the guest?

Lastly, do you use cache=wb on qemu? it's just a fun mode, we use 
cache=off only.




Gordan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html




Re: [RFC] KVM Fault Tolerance: Kemari for KVM

2009-11-15 Thread Dor Laor

On 11/13/2009 01:48 PM, Yoshiaki Tamura wrote:

Hi,

Thanks for your comments!

Dor Laor wrote:

On 11/09/2009 05:53 AM, Fernando Luis Vázquez Cao wrote:

Hi all,

It has been a while coming, but we have finally started work on
Kemari's port to KVM. For those not familiar with it, Kemari provides
the basic building block to create a virtualization-based fault
tolerant machine: a virtual machine synchronization mechanism.

Traditional high availability solutions can be classified in two
groups: fault tolerant servers, and software clustering.

Broadly speaking, fault tolerant servers protect us against hardware
failures and, generally, rely on redundant hardware (often
proprietary), and hardware failure detection to trigger fail-over.

On the other hand, software clustering, as its name indicates, takes
care of software failures and usually requires a standby server whose
software configuration for the part we are trying to make fault
tolerant must be identical to that of the active server.

Both solutions may be applied to virtualized environments. Indeed,
the current incarnation of Kemari (Xen-based) brings fault tolerant
server-like capabilities to virtual machines and integration with
existing HA stacks (Heartbeat, RHCS, etc) is under consideration.

After some time in the drawing board we completed the basic design of
Kemari for KVM, so we are sending an RFC at this point to get early
feedback and, hopefully, get things right from the start. Those
already familiar with Kemari and/or fault tolerance may want to skip
the Background and go directly to the design and implementation
bits.

This is a pretty long write-up, but please bear with me.

== Background ==

We started to play around with continuous virtual synchronization
technology about 3 years ago. As development progressed and, most
importantly, we got the first Xen-based working prototypes it became
clear that we needed a proper name for our toy: Kemari.

The goal of Kemari is to provide a fault tolerant platform for
virtualization environments, so that in the event of a hardware
failure the virtual machine fails over from compromised to properly
operating hardware (a physical machine) in a way that is completely
transparent to the guest operating system.

Although hardware based fault tolerant servers and HA servers
(software clustering) have been around for a (long) while, they
typically require specifically designed hardware and/or modifications
to applications. In contrast, by abstracting hardware using
virtualization, Kemari can be used on off-the-shelf hardware and no
application modifications are needed.

After a period of in-house development the first version of Kemari for
Xen was released in Nov 2008 as open source. However, by then it was
already pretty clear that a KVM port would have several
advantages. First, KVM is integrated into the Linux kernel, which
means one gets support for a wide variety of hardware for
free. Second, and in the same vein, KVM can also benefit from Linux'
low latency networking capabilities including RDMA, which is of
paramount importance for a extremely latency-sensitive functionality
like Kemari. Last, but not the least, KVM and its community is growing
rapidly, and there is increasing demand for Kemari-like functionality
for KVM.

Although the basic design principles will remain the same, our plan is
to write Kemari for KVM from scratch, since there does not seem to be
much opportunity for sharing between Xen and KVM.

== Design outline ==

The basic premise of fault tolerant servers is that when things go
awry with the hardware the running system should transparently
continue execution on an alternate physical host. For this to be
possible the state of the fallback host has to be identical to that of
the primary.

Kemari runs paired virtual machines in an active-passive configuration
and achieves whole-system replication by continuously copying the
state of the system (dirty pages and the state of the virtual devices)
from the active node to the passive node. An interesting implication
of this is that during normal operation only the active node is
actually executing code.

Another possible approach is to run a pair of systems in lock-step
(à la VMware FT). Since both the primary and fallback virtual machines
are active keeping them synchronized is a complex task, which usually
involves carefully injecting external events into both virtual
machines so that they result in identical states.

The latter approach is extremely architecture specific and not SMP
friendly. This spurred us to try the design that became Kemari, which
we believe lends itself to further optimizations.

== Implementation ==

The first step is to encapsulate the machine to be protected within a
virtual machine. Then the live migration functionality is leveraged to
keep the virtual machines synchronized.

Whereas during live migration dirty pages can be sent asynchronously
from the primary to the fallback server until the ratio

Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-11-16 Thread Dor Laor

On 10/28/2009 08:54 AM, Michael Goldish wrote:


- Dor Laordl...@redhat.com  wrote:


On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:

Hi Michael, I am reviewing your patchset and have just a minor

remark

to make here:

On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldishmgold...@redhat.com

  wrote:

This patch adds a new test that checks the timedrift introduced by

migrations.

It uses the same parameters used by the timedrift test to get the

guest time.

In addition, the number of migrations the test performs is

controlled by the

parameter 'migration_iterations'.

Signed-off-by: Michael Goldishmgold...@redhat.com
---
   client/tests/kvm/kvm_tests.cfg.sample  |   33

---

   client/tests/kvm/tests/timedrift_with_migration.py |   95



   2 files changed, 115 insertions(+), 13 deletions(-)
   create mode 100644

client/tests/kvm/tests/timedrift_with_migration.py


diff --git a/client/tests/kvm/kvm_tests.cfg.sample

b/client/tests/kvm/kvm_tests.cfg.sample

index 540d0a2..618c21e 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -100,19 +100,26 @@ variants:
  type = linux_s3

  - timedrift:install setup
-type = timedrift
  extra_params +=  -rtc-td-hack
-# Pin the VM and host load to CPU #0
-cpu_mask = 0x1
-# Set the load and rest durations
-load_duration = 20
-rest_duration = 20
-# Fail if the drift after load is higher than 50%
-drift_threshold = 50
-# Fail if the drift after the rest period is higher than

10%

-drift_threshold_after_rest = 10
-# For now, make sure this test is executed alone
-used_cpus = 100
+variants:
+- with_load:
+type = timedrift
+# Pin the VM and host load to CPU #0
+cpu_mask = 0x1



Let's use -smp 2 always.


We can also just make -smp 2 the default for all tests. Does that sound
good?


Yes




btw: we need not to parallel the load test with standard tests.


We already don't, because the load test has used_cpus = 100 which
forces it to run alone.


Soon I'll have 100 on my laptop :), better change it to -1 or MAX_INT




+# Set the load and rest durations
+load_duration = 20
+rest_duration = 20


Even the default duration here seems way too brief here, is there

any

reason why 20s was chosen instead of, let's say, 1800s? I am under

the

impression that 20s of load won't be enough to cause any noticeable
drift...


+# Fail if the drift after load is higher than 50%
+drift_threshold = 50
+# Fail if the drift after the rest period is

higher than 10%

+drift_threshold_after_rest = 10


I am also curious about those tresholds and the reasoning behind

them.

Is there any official agreement on what we consider to be an
unreasonable drift?

Another thing that struck me out is drift calculation: On the

original

timedrift test, the guest drift is normalized against the host

drift:


drift = 100.0 * (host_delta - guest_delta) / host_delta

While in the new drift tests, we consider only the guest drift. I
believe is better to normalize all tests based on one drift
calculation criteria, and those values should be reviewed, and at
least a certain level of agreement on our development community

should

be reached.


I think we don't need to calculate drift ratio. We should define a
threshold in seconds, let's say 2 seconds. Beyond that, there should
not be any drift.


Are you talking about the timedrift with load or timedrift with
migration or reboot tests?  I was told that when running the load test
for e.g 60 secs, the drift should be given in % of that duration.
In the case of migration and reboot, absolute durations are used (in
seconds, no %).  Should we do that in the load test too?


Yes, but: during extreme load we do predict that a guest *without* pv 
clock will drift and won't be able to catch up until the load stops; 
only then will it catch up. So my recommendation is to do the following:
- pvclock guest - can be checked with 'cat 
/sys/devices/system/clocksource/clocksource0/current_clocksource'; don't 
allow any drift, even during huge loads.

  Exists (and is safe) for rhel5.4 guests and ~2.6.29 kernels (available from 2.6.27).
- non-pv clock - run the load, stop the load, wait 5 seconds, then measure the time.

For both, use absolute times.
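
[A sketch of that policy, not from the patchset; it would run inside the
guest, e.g. over the test's existing ssh session. The threshold values are
placeholders.]

CLOCKSOURCE = "/sys/devices/system/clocksource/clocksource0/current_clocksource"

def drift_check_plan():
    clocksource = open(CLOCKSOURCE).read().strip()
    if clocksource == "kvm-clock":
        # pvclock guest: no drift allowed, even while the load is running
        return {"check_under_load": True, "threshold_seconds": 0}
    # non-pv clock: stop the load, wait a few seconds, then check absolute drift
    return {"check_under_load": False, "wait_after_load": 5, "threshold_seconds": 2}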





Do we support migration to a different host? We should, especially in
this test too. The destination host reading should also be used.
Apart for that, good patchset, and good thing you refactored some of
the code to shared utils.


We don't, and it would be very messy to implement with the framework
right now.  We should probably do that as some sort of server side test,
but we don't have server side tests right now, so doing it may take a
little time and effort.  I got the 

Re: virtio disk slower than IDE?

2009-11-16 Thread Dor Laor

On 11/16/2009 08:11 PM, Charles Duffy wrote:

Gordan Bobic wrote:

Lastly, do you use cache=wb on qemu? it's just a fun mode, we use
cache=off only.


I don't see the option being set in the logs, so I'd guess it's
whatever qemu-kvm defaults to.


You can set this through libvirt by putting an element such as the
following within your <disk> element:

<driver name='qemu' type='qcow2' cache='none'/>


It's not needed on rhel5.4 qemu - we have cache=none as a default



(Setting the type is preferred to avoid security issues wherein a guest
writes an arbitrary qcow2 header to the beginning of a raw disk, reboots
and allows qemu's autodetection to decide that this formerly-raw disk
should now be treated as a delta against a file they otherwise might not
have access to read; as such, it's particularly important if you intend
that the type be raw).
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html




Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version)

2009-11-22 Thread Dor Laor

On 11/17/2009 04:49 PM, Jiri Zupka wrote:

Hi,
   We found a little mistake in the ending of allocator.py,
so I am resending the whole repaired patch again today.



It sure is a big improvement over the previous version.
There is still a lot of refactoring to be done to make it more readable.
Comments embedded.


- Original Message -
From: Jiri Zupkajzu...@redhat.com
To: autotestautot...@test.kernel.org, kvmkvm@vger.kernel.org
Cc:u...@redhat.com
Sent: Tuesday, November 17, 2009 12:52:28 AM GMT +01:00 Amsterdam / Berlin / 
Bern / Rome / Stockholm / Vienna
Subject: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version)

Hi,
   based on your requirements we have created new version
of KSM-overcommit patch (submitted in September).

Describe:
   It tests KSM (kernel shared memory) with overcommit of memory.

Changelog:
   1) Based only on python (remove C code)
   2) Add new test (check last 96B)
   3) Separate test to (serial,parallel,both)
   4) Improve log and documentation
   5) Add a perf constant to change the time limit for waiting (slow computer problem)

Functionality:
   The KSM test starts guests and connects to them over ssh.
   It copies allocator.py to the guests and runs it.
   The host can then run any python command through the allocator.py loop on the client side.

   Start run_ksm_overcommit.
   Define host and guest reserve variables (host_reserver,guest_reserver).
   Calculate amount of virtual machine and their memory based on variables
   host_mem and overcommit.
   Check KSM status.
   Create and start virtual guests.
   Test :
a] serial
 1) initialize, merge all mem to single page
 2) separate first guest mem
 3) separate rest of guest up to fill all mem
 4) kill all guests except for the last
 5) check if mem of last guest is ok
 6) kill guest
b] parallel
 1) initialize, merge all mem to single page
 2) separate mem of guest
 3) verification of guest mem
 4) merge mem to one block
 5) verification of guests mem
 6) separate mem of guests by 96B
 7) check if mem is all right
 8) kill guest
   allocator.py (client side script)
 After start it waits for commands, which it executes on the client side.
 The mem_fill class implements commands to fill and check memory and return
 errors to the host.

We need a client side script because we need to generate many GB of special
data.

Future plans:
   We want to add information to the log about the time spent in each task.
   We want to use that information to automatically compute the perf constant.
   And add new tests.










___
Autotest mailing list
autot...@test.kernel.org
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest


ksm_overcommit.patch


diff --git a/client/tests/kvm/kvm_tests.cfg.sample 
b/client/tests/kvm/kvm_tests.cfg.sample
index ac9ef66..90f62bb 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -118,6 +118,23 @@ variants:
  test_name = npb
  test_control_file = npb.control

+- ksm_overcommit:
+# Don't preprocess any vms as we need to change it's params
+vms = ''
+image_snapshot = yes
+kill_vm_gracefully = no
+type = ksm_overcommit
+ksm_swap = yes   # yes | no
+no hugepages
+# Overcommit of host memmory
+ksm_overcommit_ratio = 3
+# Max paralel runs machine
+ksm_paralel_ratio = 4
+variants:
+- serial
+ksm_test_size = serial
+- paralel
+ksm_test_size = paralel

  - linux_s3: install setup unattended_install
  type = linux_s3
diff --git a/client/tests/kvm/tests/ksm_overcommit.py 
b/client/tests/kvm/tests/ksm_overcommit.py
new file mode 100644
index 000..408e711
--- /dev/null
+++ b/client/tests/kvm/tests/ksm_overcommit.py
@@ -0,0 +1,605 @@
+import logging, time
+from autotest_lib.client.common_lib import error
+import kvm_subprocess, kvm_test_utils, kvm_utils
+import kvm_preprocessing
+import random, string, math, os
+
+def run_ksm_overcommit(test, params, env):
+
+Test how KSM (Kernel Shared Memory) act with more than physical memory is
+used. In second part is also tested, how KVM can handle the situation,
+when the host runs out of memory (expected is to pause the guest system,
+wait until some process returns the memory and bring the guest back to 
life)
+
+@param test: kvm test object.
+@param params: Dictionary with test parameters.
+@param env: Dictionary with the test wnvironment.
+
+
+def parse_meminfo(rowName):
+
+Function get date from file /proc/meminfo
+
+@param rowName: Name of line in meminfo
+
+for line in open('/proc/meminfo').readlines():
+if line.startswith(rowName+:):
+name, amt, unit = line.split()
+return name, amt, unit
+
+def parse_meminfo_value(rowName):
+   

Re: [Autotest] [KVM-AUTOTEST] KSM-overcommit test v.2 (python version)

2009-11-29 Thread Dor Laor

On 11/26/2009 12:11 PM, Lukáš Doktor wrote:

Hello Dor,

Thank you for your review. I have few questions about your comments:

--- snip ---

+ stat += Guests memsh = {
+ for vm in lvms:
+ if vm.is_dead():
+ logging.info(Trying to get informations of death VM: %s
+ % vm.name)
+ continue


You can fail the entire test here. Otherwise it will be hard to find the
issue afterwards.



Well if it's what the community wants, we can change it. We just didn't
want to lose information about the rest of the systems. Perhaps we can
set some DIE flag and after collecting all statistics raise an Error.


I don't think we need to continue testing if something as basic as a VM 
died on us.




--- snip ---

+ def get_true_pid(vm):
+ pid = vm.process.get_pid()
+ for i in range(1,10):
+ pid = pid + 1


What are you trying to do here? It seems like a nasty hack that might
fail under load.




qemu has a -pidfile option. It works fine.



Yes, and I'm really sorry for this ugly hack. The qemu command has
changed since the first patch was made. Nowadays vm.pid returns the
PID of the command itself, not of the actual qemu process.
We need the PID of the actual qemu process, which is executed by
the command with PID vm.pid. That's why I first try the PIDs following
vm.pid when looking for the qemu process. I haven't found another solution
yet (in case we don't want to change the qemu command back in the
framework).
We have tested this solution under heavy process load and either the first
or the second part always finds the right value.
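
[Illustrative sketch of the -pidfile suggestion, not the framework's actual
code: let qemu write its own PID and read /proc/<pid>/statm from it, instead
of guessing neighbouring PIDs. Paths and the other qemu options are
placeholders.]

import subprocess, time

pidfile = "/tmp/vm1.pid"
subprocess.Popen(["qemu-kvm", "-m", "512",
                  "-drive", "file=/images/guest.qcow2",
                  "-pidfile", pidfile])
time.sleep(1)                                  # give qemu a moment to write it
qemu_pid = int(open(pidfile).read().strip())
shared_pages = int(open("/proc/%d/statm" % qemu_pid).read().split()[2])
shared_mb = shared_pages * 4 / 1024            # 4k pages -> MB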

--- snip ---

+ if (params['ksm_test_size'] == paralel) :
+ vmsc = 1
+ overcommit = 1
+ mem = host_mem
+ # 32bit system adjustment
+ if not params['image_name'].endswith(64):
+ logging.debug(Probably i386 guest architecture, \
+ max allocator mem = 2G)


Better not to rely on the guest image name. You can test a percentage of the
guest mem instead.



What do you mean by a percentage of the guest mem? This adjustment is
made because the maximum memory for one process on a 32 bit OS is 2GB.
Testing the 'image_name' proved to be the most reliable method we found.



It's not that important but it should be a convention of kvm autotest.
If that's acceptable, fine, otherwise, each VM will define it in the 
config file




--- snip ---

+ # Guest can have more than 2G but kvm mem + 1MB (allocator itself)
+ # can't
+ if (host_mem 2048):
+ mem = 2047
+
+
+ if os.popen(uname -i).readline().startswith(i386):
+ logging.debug(Host is i386 architecture, max guest mem is 2G)


There are bigger 32 bit guests.


What do you mean by this note? We are testing whether the host machine is 32
bit. If so, the maximum process allocation is 2GB (similar to the 32
bit guest case), but this time the whole qemu process (2GB qemu machine + 64
MB qemu overhead) can't exceed 2GB.
Still, the maximum memory used in the test is the same (as we increase the VM
count - host_mem = quest_mem * vm_count; quest_mem is decreased,
vm_count is increased).


i386 guests with PAE mode (4 additional address bits) can have up to 16G of RAM in 
theory.




--- snip ---

+
+ # Copy the allocator.c into guests


.py


yes indeed.

--- snip ---

+ # Let kksmd works (until shared mem rich expected value)
+ shm = 0
+ i = 0
+ cmd = cat/proc/%d/statm % get_true_pid(vm)
+ while shm ksm_size:
+ if i 64:
+ logging.info(get_stat(lvms))
+ raise error.TestError(SHM didn't merged the memory until \
+ the DL on guest: %s% (vm.name))
+ logging.debug(Sleep(%d) % (ksm_size / 200 * perf_ratio))
+ time.sleep(ksm_size / 200 * perf_ratio)
+ try:
+ shm = int(os.popen(cmd).readline().split()[2])
+ shm = shm * 4 / 1024
+ i = i + 1


Either you have a nice statistics calculation function or you don't.
I vote for the first case.



Yes, we are using the statistics function for the output. But in this
case we just need to know the shm value, not to log anything.
If this is a big problem even for others, we can split the statistics
function into 2:
int = _get_stat(vm) - returns shm value
string = get_stat(vm) - Uses _get_stats and creates a nice log output

--- snip ---

+  Check if memory in max loading guest is allright
+ logging.info(Starting phase 3b)
+
+  Kill rest of machine


We should have a function for it for all kvm autotest



you think lsessions[i].close() instead of (status,data) =
lsessions[i].get_command_status_output(exit;,20)?
Yes, it would be better.


+ for i in range(last_vm+1, vmsc):
+ (status,data) = lsessions[i].get_command_status_output(exit;,20)
+ if i == (vmsc-1):
+ logging.info(get_stat([lvms[i]]))
+ lvms[i].destroy(gracefully = False)


--- snip ---

+ def phase_paralel():
+  Paralel page spliting 
+ logging.info(Phase 1: Paralel page spliting)
+ # We have to wait until allocator is finished (it waits 5 seconds to
+ # clean the socket
+


The whole function is very similar to phase_separate_first_guest; please
refactor them.


Yes, those functions are a bit similar. On the other hand there are 

Re: [Autotest] [KVM-autotest][RFC] 32/32 PAE bit guest system definition

2009-12-16 Thread Dor Laor

On 12/15/2009 09:04 PM, Lucas Meneghel Rodrigues wrote:

On Fri, Dec 11, 2009 at 2:34 PM, Jiri Zupkajzu...@redhat.com  wrote:

Hello,
  we are writing the KSM_overcommit test. When we calculate memory for a guest we need
to know which architecture the guest is: 32-bit, 32-bit with PAE, or 64-bit,
because with a 32-bit guest we can allocate only about 3100M.

Currently we use the name of the disk's image file; the image file name ends with 64 
or 32.
Is there a way we can detect whether the guest machine runs with PAE etc.?
Do you think kvm_autotest could define a parameter in kvm_tests.cfg which
determines whether the guest is 32-bit, 32-bit with PAE, or 64-bit?


Hi Jiri, sorry for taking long to answer you, I am reviewing the
overcommit test.

About your question, I'd combine your approach of picking the 32/64 bit
architecture from the image name with looking at /proc/cpuinfo for PAE
support.


We might keep it KISS for the time being, since 99% of host installations 
are 64 bit only, and many times only the guest can turn on PAE to 
practically use it.


So I'll go with naming only.



Let's go with this approach for the final version of the test, OK?

Thanks and congrats for the test, it's a great piece of work! More
comments soon,
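
[For illustration, a minimal sketch, not from the test, of the combined
detection discussed above: word size from the image-name suffix, PAE from the
guest's /proc/cpuinfo flags. The suffix convention and the helper name are
assumptions.]

import os

def guest_arch(image_name, cpuinfo_text):
    # image names ending in "64" are treated as 64-bit guests
    base = os.path.splitext(os.path.basename(image_name))[0]
    if base.endswith("64"):
        return "x86_64"
    # otherwise look for the pae flag in the guest's /proc/cpuinfo
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            if "pae" in line.split(":", 1)[1].split():
                return "i386-pae"
    return "i386"

# cpuinfo_text would be collected inside the guest, e.g. by running
# "cat /proc/cpuinfo" over the test's existing ssh session.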



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Benchmarking on CentOS 5

2008-06-02 Thread Dor Laor

On Mon, 2008-06-02 at 14:35 +0530, Amit Shah wrote:
 On Friday 30 May 2008 23:00:41 Farkas Levente wrote:
  this is out production server at the development department (10-15)
  people using it so actually if i tell them that i'll stop the host and
  all guests for max an hour it's acceptable, but more not really. it's
  run it type programs. from my experience in the last 6-12 months is that
  kvm is not production ready. as you can read from this list there are
  far too many change day-by-day which are very core. and this comes from
  the current state of kvm. which indicate that rh can't include in there
 
 You'll find the most stable version of kvm in the kernel that your 
 distribution ships. Linux-2.6.x (where x > 20) should also be stable. The 
 development on kvm will continue to proceed at a fast pace, so you'll see 
 several kvm releases and this, as a result, is bound to bring in a few new 
 bugs in each iteration.
 
  imho the biggest problem with the current development of kvm that there
  is not a stable releases which is somewhat related to the current
  release number. eg kvm-0.5.x kvm-0.6.x would be better. but currently
 
 So the short answer is: if you're looking for a stable version of kvm, look 
 at 
 a kernel.org kernel or the kvm version provided to you by your distribution.
 
  kvm development is so fast that keep 2-3 parallel branch where there is
  a development and stable release seems to too much work.
  so to answer to your question i don't know:-(
 
 The stable branch of kvm is the one in the most-recently available Linux 
 kernel from kernel.org. kvm.git is the development version.

In the near future we'll publish a stable branch. There are actually 2
repositories: kernel repo, based on the latest kernel - 2.6.26 and a
userspace repository that will be based on kvm-68.

The idea is to maintain the above repos together and only apply bug
fixes. New features will come with every next kernel release.

We're in the process of creating an automatic test framework for kvm.
It will be an open source framework based on autotest and similar to
Anthony's kvmtest. It will help stabilize both the 'stable' branch and
the master.

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 1/4] KVM: Report hardware virtualization features

2008-06-22 Thread Dor Laor
On Sun, 2008-06-22 at 09:49 +0300, Avi Kivity wrote:
 Yang, Sheng wrote:
  From f02d2ccf01e8671d2da517f14a908d1df1cc42ad Mon Sep 17 00:00:00 2001
  From: Sheng Yang [EMAIL PROTECTED]
  Date: Thu, 19 Jun 2008 18:41:26 +0800
  Subject: [PATCH] KVM: Report hardware virtualization features
 
  The hardware virtualization technology evolves very fast. But currently it's
  hard to tell if your CPU support certain kind of HW technology without dig 
  into the source code.
 
  The patch introduced a virtual file called kvm_hw_feature_report under
  /sys/devices/system/kvm/kvm0 to show the mainly important current hardware
  virtualization feature, then it's pretty easy to tell if your CPU support
  some advanced virtualization technology now.
 

 
 Yes, this is definitely helpful.  However, I think that users will 
 expect cpu flags under /proc/cpuinfo.
 
 Perhaps we should add a new line 'virt flags' to /proc/cpuinfo?  I think 
 all the features are reported using msrs, so it can be done from 
 arch/x86/kernel/cpu/proc.c without involving kvm at all.
 

While I agree with Avi, it would be nice, though, to see them on older
kernels. At least sprinkle a printk message.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] Fix time drift problem under high load when PIT is in use.

2008-06-29 Thread Dor Laor
On Sun, 2008-06-29 at 16:59 +0300, Gleb Natapov wrote:
 Count the number of interrupts that was lost due to interrupt coalescing
 and re-inject them back when possible. This fixes time drift problem when
 pit is used as a time source.
 
 Signed-off-by: Gleb Natapov [EMAIL PROTECTED]
 ---
 
  hw/i8254.c |   20 +++-
  1 files changed, 19 insertions(+), 1 deletions(-)
 
 diff --git a/hw/i8254.c b/hw/i8254.c
 index 4813b03..c4f0f46 100644
 --- a/hw/i8254.c
 +++ b/hw/i8254.c
 @@ -61,6 +61,8 @@ static PITState pit_state;
  
  static void pit_irq_timer_update(PITChannelState *s, int64_t current_time);
  
 +static uint32_t pit_irq_coalesced;

The pit has 3 channels, it should be a channel field.

Also every time the pit frequency changes the above field should be
compensated with * (new_freq/old_freq). 
For example, if the guest was running with 1000hz clock and the
pit_irq_coalesced value is 100 currently, a frequency change to 100hz
should reduce pit_irq_coalesced to 10.

Except for that, it's high time we stop drifting :)

 +
  static int pit_get_count(PITChannelState *s)
  {
  uint64_t d;
 @@ -369,12 +371,28 @@ static void pit_irq_timer_update(PITChannelState *s, 
 int64_t current_time)
  return;
  expire_time = pit_get_next_transition_time(s, current_time);
  irq_level = pit_get_out1(s, current_time);
 -qemu_set_irq(s-irq, irq_level);
 +if(irq_level) {
 +if(!qemu_irq_raise(s-irq))
 +pit_irq_coalesced++;
 +} else {
 +qemu_irq_lower(s-irq);
 +if(pit_irq_coalesced  0) {
 +if(qemu_irq_raise(s-irq))
 +pit_irq_coalesced--;
 +qemu_irq_lower(s-irq);
 +}
 +}
 +
  #ifdef DEBUG_PIT
  printf(irq_level=%d next_delay=%f\n,
 irq_level,
 (double)(expire_time - current_time) / ticks_per_sec);
  #endif
 +if(pit_irq_coalesced  expire_time != -1) {
 +uint32_t div = ((pit_irq_coalesced  10)  0x7f) + 2;
 +expire_time -= ((expire_time - current_time) / div);
 +}
 +
  s-next_transition_time = expire_time;
  if (expire_time != -1)
  qemu_mod_timer(s-irq_timer, expire_time);
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



[PATCH] Fix block mode during halt emulation

2008-06-30 Thread Dor Laor
From d85feaae019bc0abc98a2524369e04d521a78aa8 Mon Sep 17 00:00:00 2001
From: Dor Laor [EMAIL PROTECTED]
Date: Mon, 30 Jun 2008 18:22:44 -0400
Subject: [PATCH] Fix block mode during halt emulation

There is no need to check for a pending pit/apic timer, nor a
pending virq, since all of them check KVM_MP_STATE_RUNNABLE
and wake up the waitqueue.

It fixes 100% cpu consumption when a Windows guest is shut down (non-ACPI HAL).

Signed-off-by: Dor Laor [EMAIL PROTECTED]
---
 virt/kvm/kvm_main.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b90da0b..faa0778 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -816,10 +816,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
for (;;) {
prepare_to_wait(vcpu-wq, wait, TASK_INTERRUPTIBLE);
 
-   if (kvm_cpu_has_interrupt(vcpu))
-   break;
-   if (kvm_cpu_has_pending_timer(vcpu))
-   break;
if (kvm_arch_vcpu_runnable(vcpu))
break;
if (signal_pending(current))
-- 
1.5.4




Re: Sharing variables/memory between host and guest ?

2008-07-12 Thread Dor Laor

Arn wrote:

How can one share memory (a few variables not necessarily a page)
between host/hypervisor and guest VM ?
Since the guest is just a process within the host, there should be
existing ways to do this.
  
It's not that straightforward, since the host has its pfn (page frame 
number) while the guest has a gfn (guest frame number) and also uses 
virtual memory.



What about using something like debugfs or sysfs, is it possible to
share variables this way ? Note, I want a system that
is fast, i.e. changes to shared variable/memory should be visible instantly.

  
A paravirtualized driver can take care of that, with the driver in the guest 
and the device side in qemu or the host kernel.
You can use the 9p virtio solution in Linux, which implements a shared file 
system.

I search the kvm-devel archives and found emails referring to kshmem
 but a search on the kvm-70 code turns up nothing.
There are also some emails on sharing a page but no final outcome or
what exactly to do.

Thanks
Arn
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  




Re: rtl8139 stop working under high load

2008-07-12 Thread Dor Laor

Farkas Levente wrote:

hi,
i just switched to the rtl8139 network emulator in kvm-70 for the 
guests, but under high load it simply stops working. a reboot or even a

service network restart
solves the problem, but imho there has to be some bug in 
qemu's rtl8139 code, and there is no error of any kind in any log 
(neither the host's nor the guest's).
this did not happen with e1000, but with e1000 the network sometimes 
seems to breathe (sometimes it slows down and then speeds up again).

do currently which is the preferred network network device in qemu/kvm?
thanks.

I think rtl8139 is the most stable, and afterwards virtio and e1000 
(which both perform much better too).

Maybe it's an irq problem. Can you run the same test with -no-kvm-irqchip ?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] kvm-autotest

2008-07-15 Thread Dor Laor




It's definitely worth looking at the autotest server code/samples.
There exists code in-tree already to build an deploy kvm via autotest
server mode which a single machine can drive the building, installing,
creation of guests on N number of clients, directing each guest
image to run various autotest client tests, collecting all of the
results.

See autotest/server/samples/*kvm*

A proper server setup is a little involved[1] but much more streamlined
these days.



Let's think of a guest-installation test. Would you implement it on 
the server or on the client ?

What do you plan for non-linux guests ?

We'll try this little exercise of writing a kvm-test on the server 
side and on the client side and compare complexity.


Thanks,
Uri.

IMHO we need a mixture:
- kvm/environment setup
  autoserve tests/deploy
- Internal guest tests
  Implemented as client tests, executed from the server. Composed of 
benchmarks, standard functionality, applications, unit tests, etc.

- guest installation, guest boot
  Client tests that execute on the kvm host.

Regards,
Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: best practices for kvm setup?

2008-07-15 Thread Dor Laor

Rik Theys wrote:

Hi,

I'm looking into virtualizing some of our servers onto two (or more) 
physical nodes with either KVM or Xen. What are the 'best practices' 
for running virtual _servers_ with KVM? Any good/bad experiences with 
running KVM for virtual servers that have to run for months on end?


I've installed ubuntu 8.04 because it should have KVM as the default 
virtualization tool and is the only 'enterprise' distribution with kvm 
right now. I used one host to act as an iSCSI target and installed 
ubuntu with KVM on two other nodes. I can create a virtual server with 
virt-manager, but it seems live migration is not (yet) supported by 
libvirt/virsh? So how are other people running their KVM virtual 
servers? Do you create a script for each virtual server and invoke kvm 
directly? How do you do the live migration then? Launch the script 
with an 'incoming' parameter on the target host, and run the migrate 
command manually? 
If libvirt does not support migration then you'll need to automate it 
yourself; we use a daemon to exec/migrate VMs. AFAIK, except for 
libvirt* there is no other free tool for it.
Or is there an other (automated) way? I once tried the live migration 
on a test host and if I recall correctly, the kvm process kept on 
running on the source host even after the server was migrated to the 
target? Is that the expected behaviour?


This works as designed; the idea is that a 3rd party mgmt tool gets 
the result of the migration process and closes one of the 
source/destination. Without a 3rd party, the destination cannot continue 
unless the source got the end-of-migration message, and the opposite on failure.
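
[Illustrative sketch only, not an existing tool: the bare steps of a manual
live migration, assuming shared storage and that the source guest was started
with a TCP monitor, e.g. -monitor tcp:0:4001,server,nowait. Hosts, ports and
paths are placeholders; the two halves run on different machines and are shown
together only for illustration.]

import socket, subprocess

# On the destination host: start an identical qemu that waits for the stream.
subprocess.Popen(["qemu-kvm", "-m", "1024",
                  "-drive", "file=/shared/guest.img",
                  "-incoming", "tcp:0:4444"])

# On the source host: ask the running guest's monitor to start the migration.
mon = socket.create_connection(("source-host", 4001))
mon.sendall(b"migrate -d tcp:dest-host:4444\n")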


What type of shared storage is best used with KVM (or Xen for that 
matter)? Our physical servers will be connected to a SAN. Should I 
create volumes on my san and export them to my physical servers where 
I can then use them as /dev/by-id/xxx disk in my KVM configs? Of 
should I configure my two servers into a GFS cluster and use files as 
backend for my KVM virtual machines? What are you using as shared 
storage?


We use NFS and it works pretty well, your proposals are also valid 
options. Just make sure an image is not accessed in parallel by 2 hosts.

Regards,

Rik


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: kvm: Unknown error 524, Fail to handle apic access vmexit

2008-07-15 Thread Dor Laor

Martin Michlmayr wrote:

I installed a Windows XP SP2 guest on a Debian x86_64 host. The
installation itself went fine but kvm aborts when XP starts
during Windows XP Setup.  XP mentions something about intelppm.sys
(see the attached screenshot) and kvm says:

kvm_run: Unknown error 524
kvm_run returned -524

  
It's a FlexPriority bug; while it should be fixed properly, you can work 
around it by disabling FlexPriority via a kvm-intel module parameter.

In dmesg, I see:

[ 8891.352876] Fail to handle apic access vmexit! Offset is 0xf0

This happens with kvm 70, and kernel 2.6.25 and 2.6.26-rc9.

Someone else reported a similar problem before but there was no
response:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg12111.html

  







--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm: Unknown error 524, Fail to handle apic access vmexit

2008-07-16 Thread Dor Laor

Yang, Sheng wrote:

On Tuesday 15 July 2008 23:19:07 Dor Laor wrote:
  

Martin Michlmayr wrote:


I installed a Windows XP SP2 guest on a Debian x86_64 host The
installation itself went fine but kvm aborts when when XP starts
during Windows XP Setup.  XP mentions something with
intelppm.sys (see the attached screenshot) and kvm says:

kvm_run: Unknown error 524
kvm_run returned -524
  

It's a FlexPriority bug, while it should be solved, you can disable
it by using kvm-intel module parameter.




Dor, are you sure it's a FlexPriority bug? 

  
Well, I'm not sure it's FlexPriority's fault; it's just that when it is 
disabled this does not happen, and I saw the apic

access. It can be a mis-emulation too.
It happened to me on ~ kvm-69
If you look at where the complaint is, you will find it is the result 
of emulate_instruction().


And you will find a clear 'emulation failed (mmio) rip 7cb3d000 ff 
ff 8d 85' in the bug tracker entry Martin mentioned above, along with the 'Fail to 
handle apic access vmexit! Offset is 0xf0' (Spurious Interrupt Vector 
Register).


I don't think ff ff 8d 85 is a valid opcode for that case.

Maybe it's a regression? The last report is long ago...

Hi Martin, can you show more dmesg output here? And can it be reproduced 
reliably?


Thanks.

  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm guest loops_per_jiffy miscalibration under host load

2008-07-22 Thread Dor Laor

Marcelo Tosatti wrote:

On Tue, Jul 22, 2008 at 10:22:00AM +0200, Jan Kiszka wrote:
  

The in-kernel PIT rearms relative to host clock, so the frequency is
more reliable (next_expiration = prev_expiration + count).
  

The same happens under plain QEMU:

static void pit_irq_timer_update(PITChannelState *s, int64_t current_time);

static void pit_irq_timer(void *opaque)
{
PITChannelState *s = opaque;

pit_irq_timer_update(s, s-next_transition_time);
}



True. I misread current_time.

  

In my experience QEMU's PIT suffers from lost ticks under load
(when some delay gets larger than 2*period).



Yes, with clock=pit on RHEL4 it's quite noticeable. Even with -tdf. The
  
Note that -tdf works only when you use the userspace irqchip too; then it 
should work.

in-kernel timer seems immune to that under the load I was testing.

  
In the long run we should try to remove the in-kernel pit. Currently it 
does handle the pit

irq coalescing problem that leads to time drift.
The problem is that it's not yet at 100% production level, migration with it 
has some issues, and
basically we should not duplicate userspace code unless there is 
a good reason (like performance).


There are floating patches by Gleb Natapov for the pit and virtual rtc 
to prevent time drift.

Hope they'll get accepted by qemu.

I recently played a bit with QEMU new icount feature. Than one tracks
the guest progress based on a virtual instruction pointer, derives the
QEMU's virtual clock from it, but also tries to keep that clock in sync
with the host by periodically adjusting its scaling factor (kind of
virtual CPU frequency tuning to keep the TSC in sync with real time).
Works quite nicely, but my feeling is that the adjustment is not 100%
stable yet.

Maybe such pattern could be applied on kvm as well with tsc_vmexit -
tsc_vmentry serving as guest progress counter (instead of icount which
depends on QEMU's code translator).



I see. Do you have patches around?


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  




Re: [PATCH 2/2] Remove -tdf

2008-07-22 Thread Dor Laor

Anthony Liguori wrote:

The last time I posted the KVM patch series to qemu-devel, the -tdf patch met 
with
some opposition.  Since today we implement timer catch-up in the in-kernel PIT 
and
the in-kernel PIT is used by default, it doesn't seem all that valuable to have
timer catch-up in userspace too.

Removing it will reduce our divergence from QEMU.

  
IMHO the in-kernel PIT should go away; there is no reason to keep it 
except that the userspace PIT drifts.
Currently both the in-kernel PIT and even the in-kernel irqchips are not 
100% bullet proof.
Of course this code is a hack; Gleb Natapov has sent a better fix for 
the PIT/RTC to the qemu list.

Can you look into them:
http://www.mail-archive.com/kvm@vger.kernel.org/msg01181.html

Thanks, Dor

Signed-off-by: Anthony Liguori [EMAIL PROTECTED]

diff --git a/qemu/hw/i8254.c b/qemu/hw/i8254.c
index 69eb889..d0394c0 100644
--- a/qemu/hw/i8254.c
+++ b/qemu/hw/i8254.c
@@ -332,11 +332,6 @@ static uint32_t pit_ioport_read(void *opaque, uint32_t 
addr)
 return ret;
 }
 
-/* global counters for time-drift fix */

-int64_t timer_acks=0, timer_interrupts=0, timer_ints_to_push=0;
-
-extern int time_drift_fix;
-
 static void pit_irq_timer_update(PITChannelState *s, int64_t current_time)
 {
 int64_t expire_time;
@@ -347,24 +342,6 @@ static void pit_irq_timer_update(PITChannelState *s, 
int64_t current_time)
 expire_time = pit_get_next_transition_time(s, current_time);
 irq_level = pit_get_out1(s, current_time);
 qemu_set_irq(s-irq, irq_level);
-if (time_drift_fix  irq_level==1) {
-/* FIXME: fine tune timer_max_fix (max fix per tick). 
- *Should it be 1 (double time), 2 , 4, 10 ? 
- *Currently setting it to 5% of PIT-ticks-per-second (per PIT-tick)

- */
-const long pit_ticks_per_sec = (s-count0) ? (PIT_FREQ/s-count) : 0;
-const long timer_max_fix = pit_ticks_per_sec/20;
-const long delta = timer_interrupts - timer_acks;
-const long max_delta = pit_ticks_per_sec * 60; /* one minute */
-if ((delta   max_delta)  (pit_ticks_per_sec  0)) {
-printf(time drift is too long, %ld seconds were lost\n, 
delta/pit_ticks_per_sec);
-timer_acks = timer_interrupts;
-timer_ints_to_push = 0;
-} else if (delta  0) {
-timer_ints_to_push = MIN(delta, timer_max_fix);
-}
-timer_interrupts++;
-}
 #ifdef DEBUG_PIT
 printf(irq_level=%d next_delay=%f\n,
irq_level,
diff --git a/qemu/hw/i8259.c b/qemu/hw/i8259.c
index b266119..1707434 100644
--- a/qemu/hw/i8259.c
+++ b/qemu/hw/i8259.c
@@ -221,35 +221,18 @@ static inline void pic_intack(PicState *s, int irq)
 } else {
 s-isr |= (1  irq);
 }
-
 /* We don't clear a level sensitive interrupt here */
 if (!(s-elcr  (1  irq)))
 s-irr = ~(1  irq);
-
 }
 
-extern int time_drift_fix;

-
 int pic_read_irq(PicState2 *s)
 {
 int irq, irq2, intno;
 
 irq = pic_get_irq(s-pics[0]);

 if (irq = 0) {
-
 pic_intack(s-pics[0], irq);
-#ifndef TARGET_IA64
-   if (time_drift_fix  irq == 0) {
-   extern int64_t timer_acks, timer_ints_to_push;
-   timer_acks++;
-   if (timer_ints_to_push  0) {
-   timer_ints_to_push--;
-/* simulate an edge irq0, like the one generated by i8254 */
-pic_set_irq1(s-pics[0], 0, 0);
-pic_set_irq1(s-pics[0], 0, 1);
-   }
-   }
-#endif
 if (irq == 2) {
 irq2 = pic_get_irq(s-pics[1]);
 if (irq2 = 0) {
diff --git a/qemu/vl.c b/qemu/vl.c
index 19c8bbf..d6877cd 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -229,7 +229,6 @@ const char *option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
 int semihosting_enabled = 0;
 int autostart = 1;
-int time_drift_fix = 0;
 unsigned int kvm_shadow_memory = 0;
 const char *mem_path = NULL;
 int hpagesize = 0;
@@ -7968,7 +7967,6 @@ static void help(int exitcode)
 #ifndef _WIN32
   -daemonize  daemonize QEMU after initializing\n
 #endif
-   -tdfinject timer interrupts that got lost\n
-kvm-shadow-memory megs set the amount of shadow pages to be 
allocated\n
-mem-path   set the path to hugetlbfs/tmpfs mounted directory, also 
enables allocation of guest memory with huge pages\n
   -option-rom rom load a file, rom, into the option ROM space\n
@@ -8089,7 +8087,6 @@ enum {
 QEMU_OPTION_tb_size,
 QEMU_OPTION_icount,
 QEMU_OPTION_incoming,
-QEMU_OPTION_tdf,
 QEMU_OPTION_kvm_shadow_memory,
 QEMU_OPTION_mempath,
 };
@@ -8202,7 +8199,6 @@ const QEMUOption qemu_options[] = {
 #if defined(TARGET_ARM) || defined(TARGET_M68K)
 { semihosting, 0, QEMU_OPTION_semihosting },
 #endif
-{ tdf, 0, QEMU_OPTION_tdf }, /* enable time drift fix */
 { kvm-shadow-memory, HAS_ARG, QEMU_OPTION_kvm_shadow_memory },
 { name, HAS_ARG, QEMU_OPTION_name },
 #if 

Re: [PATCH 2/2] Remove -tdf

2008-07-24 Thread Dor Laor

Anthony Liguori wrote:

Gleb Natapov wrote:

On Tue, Jul 22, 2008 at 08:20:41PM -0500, Anthony Liguori wrote:
 
Currently both in-kernel PIT and even the in kernel irqchips are 
not  100% bullet proof.
Of course this code is a hack, Gleb Natapov has send better fix 
for  PIT/RTC to qemu list.

Can you look into them:
http://www.mail-archive.com/kvm@vger.kernel.org/msg01181.html
  
Paul Brook's initial feedback is still valid.  It causes quite a lot 
of  churn and may not jive well with a virtual time base.  An 
advantage to  the current -tdf patch is that it's more contained.  I 
don't think  either approach is going to get past Paul in it's 
current form.


Yes, my patch causes a lot of churn because it changes widely used API.
  


Indeed.


But the time drift fix itself is contained to PIT/RTC code only. The
last patch series I've sent disables time drift fix if virtual time base
is enabled as Paul requested. There was no further feedback from him.
  


I think there's a healthy amount of scepticism  about whether tdf 
really is worth it.  This is why I suggested that we need to better 
quantify exactly how much this patch set helps things.  For instance, 
a time drift test for kvm-autotest would be perfect.


tdf is ugly and deviates from how hardware works.  A compelling case 
is needed to justify it.


We'll add time drift tests to autotest the minute it starts to run 
enough interesting tests/loads.

In our private test platform we use a simple scenario to test it:
1. Use a Windows guest and play a movie (this raises the timer frequency to
1000 Hz: the RTC on an ACPI Windows HAL, the PIT with -no-acpi).

2. Pin the guest to a physical cpu + load the same cpu.
3. Measure a minute in real life vs in the guest.
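
(For illustration only, and not part of the test platform above: a trivial wall-clock sampler like the hypothetical one below can be run in the guest and on the host over the same minute, and the two outputs compared afterwards. The file name, sixty-sample duration and build flags are arbitrary choices.)

/* clocksample.c: print one wall-clock sample per second for a minute.
 * Build: gcc -o clocksample clocksample.c (add -lrt on older glibc).
 * Run one copy in the guest and one on the host, then compare the drift. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    struct timespec ts;
    int i;

    for (i = 0; i < 60; i++) {
        clock_gettime(CLOCK_REALTIME, &ts);
        printf("%2d %ld.%09ld\n", i, (long)ts.tv_sec, ts.tv_nsec);
        fflush(stdout);
        sleep(1);
    }
    return 0;
}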

Actually the movie seems smoother without the time drift fix: when the fix
injects the lost IRQs, the player sometimes has to cope with too-rapid clock
changes. Anyway, the main focus is time accuracy and not smoother movies.


The in-kernel PIT does a relatively good job for Windows guests; the problem
is that it is not yet 100% stable, the same can be done in userspace, and the
RTC needs a solution too.

As Jan Kiszka wrote in one of his mails may be Paul's virtual time base
can be adopted to work with KVM too. BTW how virtual time base handles
SMP guest?
  


I really don't know.  I haven't looked to deeply at the virtual time 
base.  Keep in mind though, that QEMU SMP is not true SMP.  All VCPUs 
run in lock-step.


Regards,

Anthony Liguori

Also, it's important that this is reproducible in upstream QEMU and 
not  just in KVM.  If we can make a compelling case for the 
importance of  this, we can possibly work out a compromise.




I developed and tested my patch with upstream QEMU.

--
Gleb.
  




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Live Migration, DRBD

2008-07-24 Thread Dor Laor

Kent Borg wrote:

I am very happy to discover that KVM does live migration.  Now I am
figuring out whether it will work for me. 


What I have in mind is to use DRBD for the file system image.  The
problem is that during the migration I want to shift the file system
access at the moment when the VM has quit running on the host it is
leaving but before it starts running on the host where it is arriving. 
Is there a hook to let me do stuff at this point?


This is what I want to do:

On the departing machine...

  - VM has stopped here
  - umount the volume with the VM file system image
  - mark volume in DRDB as secondary


On the arriving machine...

  - mark volume in DRBD as primary
  - mount the volume with the VM file system image
  - VM can now start here


Is there a way?

  
No, but one can add such a hook pretty easily. The whole migration code is in
one file, qemu/migration.c.
You can add a parameter to the qemu migration command to specify a script
that should be called on the migration-end event (similar to the tap script).
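
As a rough sketch of what such a hook could look like (none of these names exist in qemu; migration_end_script and the script path are assumptions, and a real patch would choose the exact call sites on the source and destination sides carefully):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: set from a new command-line/monitor option. */
static const char *migration_end_script; /* e.g. "/etc/kvm/drbd-handover" */

/* Run the user script with "source" or "destination" so one script can
 * demote the DRBD volume on the departing host and promote it on the
 * arriving one. */
static void migrate_run_end_hook(const char *side)
{
    char cmd[1024];

    if (!migration_end_script)
        return;
    snprintf(cmd, sizeof(cmd), "%s %s", migration_end_script, side);
    system(cmd); /* a real version should check the exit status */
}

int main(void)
{
    migration_end_script = "/etc/kvm/drbd-handover"; /* placeholder */
    migrate_run_end_hook("source");
    return 0;
}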

Thanks,

-kb
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: scsi broken > 4GB RAM

2008-07-24 Thread Dor Laor

Martin Maurer wrote:

Using IDE boot disk, no problem. Win2008 (64bit) works without any problems - 6 
gb ram in the guest.

After successful booting IDE, I added a second disk using SCSI: windows see the 
disk but cannot initialize the disk.

So SCSI looks quite unusable if you run windows guest (win2003 sp2 also stops 
during install), or should we load any SCSI driver during setup? Win2008 uses 
LSI Logic 8953U PCI SCSI Adapter, 53C895A Device (LSI Logic Driver 4.16.6.0, 
signed)

Any other expierences running SCSI on windows?

  

You're right, it's broken right now :(
At least IDE is stable.

Best Regards,

Martin

  

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Martin Maurer
Sent: Donnerstag, 24. Juli 2008 11:46
To: kvm@vger.kernel.org
Subject: RE: scsi broken > 4GB RAM

Sorry, just returned to the installer - also stopped with the same
error code, using just 2 gb ram.

Best Regards,

Martin Maurer

[EMAIL PROTECTED]
http://www.proxmox.com


Proxmox Server Solutions GmbH
Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f
Registration office: Handelsgericht Wien




-Original Message-
From: Martin Maurer
Sent: Donnerstag, 24. Juli 2008 11:44
To: kvm@vger.kernel.org
Subject: RE: scsi broken > 4GB RAM

Hi,

I tried windows server 2008 (64 bit) on Proxmox VE 0.9beta2 (KVM 71),
see http://pve.proxmox.com):

Some details:
--memory 6144 --cdrom
en_windows_server_2008_datacenter_enterprise_standard_x64_dvd_X14-
26714.iso --name win2008-6gb-scsi --smp 1 --bootdisk scsi0 --scsi0 80

The installer shows 80 GB harddisk but freezes after clicking next
  

for


a minute then:

Windows could not create a partition on disk 0. The error occurred
while preparing the computer's system volume. Error code:
0x8004245F.


I also got installer problems if I just use scsi as boot disk (no
  

high


memory) on several windows versions, including win2003 and xp. So I
decided to use IDE, works without any issue on windows.

But: I reduced the memory to 2048 and the installer continues to
  

work!


Best Regards,

Martin Maurer

[EMAIL PROTECTED]
http://www.proxmox.com


Proxmox Server Solutions GmbH
Kohlgasse 51/10, 1050 Vienna, Austria
Phone: +43 1 545 4497 11 Fax: +43 1 545 4497 22
Commercial register no.: FN 258879 f
Registration office: Handelsgericht Wien


  

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]


On


Behalf Of Henrik Holst
Sent: Mittwoch, 23. Juli 2008 23:09
To: kvm@vger.kernel.org
Subject: scsi broken > 4GB RAM

I do not know if this is a bug in qemu or the linux kernel sym53c8xx
module (I haven't had the opportunity to test with anything other
than Linux at the moment), but if one starts a qemu instance with -m
4096 or larger the scsi emulated disk fails in the Linux guest.

If booting any install cd the /dev/sda is seen as only 512B in size,
and if booting an ubuntu 8.04-amd64 with the secondary drive as scsi
it is seen with the correct size but one cannot read nor write the
partition table.

Is there anyone out there that could test, say, a Windows image on scsi
with 4GB or more of RAM and see if it works or not? If so it could be
the linux driver that is faulty.

/Henrik Holst
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 8/9] kvm: qemu: Drop the mutex while reading from tapfd

2008-07-24 Thread Dor Laor

Mark McLoughlin wrote:

The idea here is that with GSO, packets are much larger
and we can allow the vcpu threads to e.g. process irq
acks during the window where we're reading these
packets from the tapfd.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/vl.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/qemu/vl.c b/qemu/vl.c
index efdaafd..de92848 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -4281,7 +4281,9 @@ static void tap_send(void *opaque)
 sbuf.buf = s->buf;
 s->size = getmsg(s->fd, NULL, &sbuf, &f) >= 0 ? sbuf.len : -1;
 #else
  

Maybe do it only when GSO is actually used by the guest/tap;
otherwise it can cause some context thrashing, right?

+   kvm_sleep_begin();
 s->size = read(s->fd, s->buf, sizeof(s->buf));
+   kvm_sleep_end();
 #endif
 
 	if (s->size == -1 && errno == EINTR)
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/9] kvm: qemu: Remove virtio_net tx ring-full heuristic

2008-07-24 Thread Dor Laor

Mark McLoughlin wrote:

virtio_net tries to guess when it has received a tx
notification from the guest whether it indicates that the
guest has no more room in the tx ring and it should
immediately flush the queued buffers.

The heuristic is based on the fact that there are 128
buffer entries in the ring and each packet uses 2 buffers
(i.e. the virtio_net_hdr and the packet's linear data).

Using GSO or increasing the size of the rings will break
that heuristic, so let's remove it and assume that any
notification from the guest after we've disabled
notifications indicates that we should flush our buffers.

Signed-off-by: Mark McLoughlin [EMAIL PROTECTED]
---
 qemu/hw/virtio-net.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 31867f1..4adfa42 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -175,8 +175,7 @@ static void virtio_net_handle_tx(VirtIODevice *vdev, 
VirtQueue *vq)
 {
 VirtIONet *n = to_virtio_net(vdev);
 
-    if (n->tx_timer_active &&
-        (vq->vring.avail->idx - vq->last_avail_idx) == 64) {
+    if (n->tx_timer_active) {
         vq->vring.used->flags &= ~VRING_USED_F_NO_NOTIFY;
         qemu_del_timer(n->tx_timer);
         n->tx_timer_active = 0;
  
Actually we can improve latency a bit more by using this timer only for the
high-throughput scenario. For example, if no or only a few packets were
accumulated during the previous timer period, we can clear the flag and not
arm a new timer. This way we'll get notified immediately, without the timer
latency. When lots of packets are being transmitted, we'll go back to this
batch mode again. A self-contained sketch of the idea follows.
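
(Plain-C sketch rather than the actual qemu virtio-net code; the struct fields and BATCH_THRESHOLD are assumptions used only to show the decision.)

#include <stdbool.h>
#include <stdio.h>

#define BATCH_THRESHOLD 32  /* assumed tuning knob */

struct tx_state {
    bool timer_armed;            /* mitigation timer currently running */
    unsigned pkts_last_period;   /* packets seen since the last expiry */
};

/* Called when the mitigation timer fires; returns true if it should be
 * re-armed (heavy load), false to fall back to per-packet notification
 * (light load, lowest latency). */
static bool tx_timer_expired(struct tx_state *s)
{
    bool rearm = s->pkts_last_period >= BATCH_THRESHOLD;

    s->pkts_last_period = 0;
    s->timer_armed = rearm;
    return rearm;
}

int main(void)
{
    struct tx_state s = { true, 5 };

    printf("quiet period, re-arm: %d\n", tx_timer_expired(&s)); /* 0 */
    s.pkts_last_period = 100;
    printf("busy period, re-arm: %d\n", tx_timer_expired(&s));  /* 1 */
    return 0;
}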

Cheers, Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 0/3] fix PIT injection

2008-07-29 Thread Dor Laor

Marcelo Tosatti wrote:

The in-kernel PIT emulation can either inject too many or too few
interrupts.

  
While it's an improvement, the in-kernel PIT is still not perfect. For
example, on PIT frequency changes the pending count should be recalculated
and matched to the new frequency. I also stumbled on a live migration
problem, and there is your guest SMP fix.
IMHO we need to switch back to the userspace PIT. [Actually I did consider
an in-kernel PIT myself in the past.] The reasons:

1. There is no performance advantage to doing this in the kernel.
   It just potentially reduces host stability and duplicates code.
2. There are floating patches that fix PIT/RTC injection in the same way
   the acked irq is done here.
   So the first 2 patches are relevant.
3. Will we do the same for the RTC? Why duplicate userspace code in the
   kernel?
   We won't have SMP issues since we have qemu_mutex, and it will be
   simpler too.

If you agree, please help merge the qemu patches.
Otherwise argue against the above :)

Cheers, Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] reserved-ram for pci-passthrough without VT-d capable hardware

2008-07-30 Thread Dor Laor

Andrea Arcangeli wrote:

On Wed, Jul 30, 2008 at 11:50:43AM +0530, Amit Shah wrote:
  

* On Tuesday 29 July 2008 18:47:35 Andi Kleen wrote:


I'm not so interested to go there right now, because while this code
is useful right now because the majority of systems out there lacks
VT-d/iommu, I suspect this code could be nuked in the long
run when all systems will ship with that, which is why I kept it all


Actually at least on Intel platforms and if you exclude the lowest end
VT-d is shipping universally for quite some time now. If you
buy a Intel box today or bought it in the last year the chances are pretty
high that it has VT-d support.
  
I think you mean VT-x, which is virtualization extensions for the x86 
architecture. VT-d is virtualization extensions for devices (IOMMU).



I think Andi understood VT-d right but even if he was right that every
reader of this email that is buying a new VT-x system today is also
almost guaranteed to get a VT-d motherboard (which I disagree unless
you buy some really expensive toy), there are current large
installations of VT-x systems that lacks VT-d and that with recent
current dual/quadcore cpus are very fast and will be used for the next
couple of years and they will not upgrade just the motherboard to use
pci-passthrough.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


In addition, KVM is used in embedded systems too, and things move more slowly
there; we know of a specific production use case that demands
1:1 mapping and can't use VT-d.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Issues while Debugging Windows Kernel running on KVM

2008-08-21 Thread Dor Laor

Can you try http://kvm.qumranet.com/kvmwiki/WindowsGuestDebug
You can use windows host as a VM too.
Since (in the past) there was a problem with the virtual serial polling 
you can use -no-kvm and the

qemu patch, as described in the wiki.
Good luck, Dor.

Muppana, Bhaskar wrote:

Hi,

I am facing issues while trying to debug Windows XP kernel running on
top of Linux KVM. 


I have to debug Windows XP kernel running in a VM. I have dedicated
ttyS0 on the host to the guest. I am using the following command to
bring up Windows VM.


/usr/local/kvm/bin/qemu-system-x86_64 \
  -hda /opt/vdisk.img \
  -boot c \
  -m 512 \
  -net nic,model=rtl8139,macaddr=52:54:00:12:34:56 \
  -net tap,ifname=qtap0,script=no \
  -smp 1 \
  -usb \
  -usbdevice tablet \
  -localtime \
  -serial /dev/ttyS0 




I have another machine, running Windows XP, connected to the Linux host
through serial cable. 



 |Windows |
 |  VM| 
 |(target)|

 --- 
|  Windows Host | - | Linux with KVM |
 --- 

I am able to send messages between Windows host and target through
serial ports (tested using windows power shell). But, I am not able to
use Win DBG (Kernel Debugger) in host to connect to target. Target is
getting stuck while booting. 


Debug enabled Windows entry in boot.ini:

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS=Microsoft Windows XP
Professional /fastdetect /debugport=COM1 /baudrate=115200


Can someone please help me regarding this?



Thanks,
Bhaskar



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reserving CPU resources for a KVM guest

2008-08-23 Thread Dor Laor

Yuksel Gunal wrote:

Hi,

I have been playing with KVM and was wondering about the following 
question: is there a resource configuration setting that would enforce 
a fraction of CPU to be guaranteed for a KVM guest?  What I have on 
mind is something similar to the reservation setting on VMware (used 
to be called minimum CPU), which guarantees a number of CPU cycles to 
a VM.  Also, any configuration setting similar to CPU/Memory Shares 
setting in VMware, which will kick in under contention for resources?


A VM is like any other process in Linux; you can use the CPU controller,
cgroups, or any other scheduling option for your VMs. A rough sketch follows.
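
(Illustration only: the mount point, group name, share value and PID below are assumptions; cpu.shares gives a proportional weight under contention, which is the closest analogue to VMware's shares/reservation setting.)

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    fprintf(f, "%s\n", val);
    return fclose(f);
}

int main(void)
{
    /* Assumed mount point for the cpu controller; distributions differ. */
    mkdir("/cgroup/cpu/vm1", 0755);
    /* Give this VM's group 4x the default weight of 1024. */
    write_str("/cgroup/cpu/vm1/cpu.shares", "4096");
    /* Move the VM's qemu process into the group (placeholder PID). */
    write_str("/cgroup/cpu/vm1/tasks", "12345");
    return 0;
}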

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: paravirtualized windows net driver stop after some days on XP guest

2008-08-26 Thread Dor Laor

Can you please try an updated version of the Windows drivers?
I also added a dummy installer you can use:

http://kvm.qumranet.com/kvmwiki/VirtioWindowsDrivers

Regards,
Dor

Yann Dupont wrote:

Hello. I'm using kvm with great success for various OSes. Very good job.

In June I started using paravirtualized drivers.

Since that we encountered sporadic loss of connectivity on some Xp
guests after some days of uptime.
This was with KVM 70. I upgraded to KVM 73 4 days ago, and this morning
1 off my Xp guests had no connectivity.

Putting the interface down and then up via the control panel revives the
network instantly. Seems like a bug on the Windows driver side.
This is occurring on 2 Xp guests, they have moderate to low network load.

They have 1 CPU , the hal is the non acpi one (because they were
installed in KVM-23 timeframe)

I also have 2003 guests, and so far I haven't encountered the problem.

Also have linux guests, with net AND disk virtio , AND high load without
problem.
Best Regards,



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: paravirtualized windows net driver for vista does not work on windows 2008 (64-bit)

2008-09-10 Thread Dor Laor

Sorry for that, it seems some instructions were missing.
Since we have not signed the drivers yet (coming soon), you need to install a
certificate workaround manually:


There are 2 things to do on 64-bit before installation.
1. Install certificate using installcertificate.bat
2. If Test mode does not appear on the screen, run bcdedit /set 
testsigning on and reboot


The system diagnostic, related to installation, on 2008 is in 
%windir%\inf\setupapi.dev.log
Please compress the file and send it if both things were done but the install
still does not work.


Regards,
Dor

Martin Maurer wrote:

Hi all,

I tried to use the vista virtio driver on win2008 (64-bit) but the install 
failed, I got this in the windows event log:
I am working on a Debian Etch 64 bit Kernel 2.6.24 with KVM 74 (internal testing Kernel of http://pve.proxmox.com) 


I used the following driver: http://people.qumranet.com/dor/Drivers-0-3107.iso

___
Log Name:  Security
Source:Microsoft-Windows-Security-Auditing
Date:  09.09.2008 17:06:20
Event ID:  5038
Task Category: System Integrity
Level: Information
Keywords:  Audit Failure
User:  N/A
Computer:  WIN-0Z71CK0XVXP
Description:
Code integrity determined that the image hash of a file is not valid.  The file 
could be corrupt due to unauthorized modification or the invalid hash could 
indicate a potential disk device error.
File Name: \Device\HarddiskVolume1\Windows\System32\drivers\kvmnet6.sys 
Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing"
              Guid="{54849625-5478-4994-a5ba-3e3b0328c30d}" />
    <EventID>5038</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>12290</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8010</Keywords>
    <TimeCreated SystemTime="2008-09-09T15:06:20.562Z" />
    <EventRecordID>364</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="88" />
    <Channel>Security</Channel>
    <Computer>WIN-0Z71CK0XVXP</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="param1">\Device\HarddiskVolume1\Windows\System32\drivers\kvmnet6.sys</Data>
  </EventData>
</Event>

Best Regards,

Martin Maurer

[EMAIL PROTECTED]
http://pve.proxmox.com


  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: paravirtualized windows net driver for vista does not work on windows 2008 (64-bit)

2008-09-11 Thread Dor Laor


[Martin Maurer]

YES, working! 


Testing again (I already have now a KVM 75, but I assume this does not make any 
difference here).

I followed your instructions, the driver installed without any warning as 
expected after installing the certificate.

The only issue: the connection shows only 100mbit - after changing this via the 
windows device manager, the 1 GBIT is up. Default should be 1 Gbit, is this 
possible? I assume a lot of people forget about changing this and then they got 
bad performance due to 100mbit.
  

The 100mb figure is not the bandwidth limitation. Nevertheless, it should change.
The only worry is that in order to certify (Microsoft-sign) the drivers,
I was told that a 1Gb device needs to support 802.1q.
It might require some simple qemu virtio changes like vlan
filtering and tag on/off options.


Regards,
Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvmnet.sys BSOD w/ WinXP...

2008-09-17 Thread Dor Laor

Daniel J Blueman wrote:

When using Windows XP 32 installed with TCP/IP and microsoft client
networking, I can reproduce an intermittent BSOD [1] with kvmnet.sys
1.0.0 and 1.2.0, by aborting a large data transfer in an application.

Since this reproduces with 1.0.0 kvmnet.sys, it looks unrelated to the
locking changes that went into 1.2.0, but something relating to when
sockets are closed, flushed or data discarded.

Perhaps the offset into the driver at 0xF761A5A9 - 0xF7618000 may tell
us what is needed to reproduce and hint at what area the fix is needed
in?

Many thanks,
  Daniel

--- [1]

DRIVER_IRQL_NOT_LESS_OR_EQUAL

*** STOP: 0x00D1 (0x001C,0x0002,0x,0xF761A5A9)
***   kvmnet.sys - Address F761A5A9 base at F7618000, DateStamp 47dd531c
  


Can you try this: http://people.qumranet.com/dor/Drivers-0-3107.iso ?
Also please provide the specific way you produce the load.
Along with it, please note the kernel version, kvm version, and qemu command line.

Regards,
Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm 76 - open /dev/kvm: No such device or address

2008-10-07 Thread Dor Laor

Matias Aguirre wrote:

Hi all,

I'm using the 2.6.26.5 kernel and the slackware-current distribution. I
compiled the latest version 76 of kvm, and when I run kvm it returns this
error:


open /dev/kvm: No such device or address
Could not initialize KVM, will disable KVM support

The module is already loaded:

# lsmod
Module  Size  Used by
kvm_intel  33984  0
kvm   116156  1 kvm_intel
nvidia   6886800  26

And my CPU have VM support.

# cat /proc/cpuinfo | grep vmx
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx 
lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr lahf_lm
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx 
lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr lahf_lm


And the file permission:

# dir /dev/kvm
crw-rwxr-- 1 root kvm 250, 0 2008-10-07 18:22 /dev/kvm


Any help?

Thanks


chmod a+wx /dev/kvm will do the trick
Regards, Dor



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [RFC] Disk integrity in QEMU

2008-10-12 Thread Dor Laor

Avi Kivity wrote:

Chris Wright wrote:

I think it's safe to say the perf folks are concerned w/ data integrity
first, stable/reproducible results second, and raw performance third.

So seeing data cached in host was simply not what they expected.  I 
think

write through is sufficient.  However I think that uncached vs. wt will
show up on the radar under reproducible results (need to tune based on
cache size).  And in most overcommit scenarios memory is typically more
precious than cpu, it's unclear to me if the extra buffering is anything
other than memory overhead.  As long as it's configurable then it's
comparable and benchmarking and best practices can dictate best choice.
  


Getting good performance because we have a huge amount of free memory 
in the host is not a good benchmark.  Under most circumstances, the 
free memory will be used either for more guests, or will be given to 
the existing guests, which can utilize it more efficiently than the host.


I can see two cases where this is not true:

- using older, 32-bit guests which cannot utilize all of the cache.  I 
think Windows XP is limited to 512MB of cache, and usually doesn't 
utilize even that.  So if you have an application running on 32-bit 
Windows (or on 32-bit Linux with pae disabled), and a huge host, you 
will see a significant boost from cache=writethrough.  This is a case 
where performance can exceed native, simply because native cannot 
exploit all the resources of the host.


- if cache requirements vary in time across the different guests, and 
if some smart ballooning is not in place, having free memory on the 
host means we utilize it for whichever guest has the greatest need, so 
overall performance improves.




Another justification for O_DIRECT is that many production systems will
use base images for their VMs.
This is mainly true for desktop virtualization, but probably also for some
server virtualization deployments.
In these types of scenarios, we can have the whole base image chain
opened read-only with caching by default, while the
leaf images are opened with cache=off.
Since there is an ongoing effort (both by IT and by developers) to keep the
base images as big as possible, this guarantees that the data best suited
for caching in the host is cached, while the private leaf
images will be uncached.
This way we provide good performance and caching for the shared parent
images while also promising correctness.

Actually this is what happens on mainline qemu with cache=off.

Cheers,
Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How can I tell KVM is actually using AMD-V virtualization extensions?

2008-10-16 Thread Dor Laor

Veiko Kukk wrote:

Hi!

My desktop machine is HP dc5750 SFF, CPU is AMD Athlon(tm) 64 X2 Dual 
Core Processor 4600+, /proc/cpuinfo lists svm flag. I'm using 2.6.27 
kernel on FC9, qemu-system-x86_64 info version 0.9.1.


How can I be absolutely sure, that my kvm virtual machines are using 
AMD-V?



You can run /sbin/lsmod | grep kvm_amd and check for a ref count > 0.
You can also use dmesg to check the kvm messages.

Alternatively, check the kvm_stat tool or run
/usr/sbin/lsof -p `pgrep qemu` | grep /dev/kvm

Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm XP P2V required ACPI-Standard PC HAL change, keep or revert to ACPI?

2008-10-23 Thread Dor Laor

Jeff Kowalczyk wrote:

I'm running a physical-to-virtual Windows XP Dell OEM instance on Ubuntu
8.04.1 kvm-62 with kvm-intel and bridged networking.

After early BSOD difficulty with the output of VMWare Converter
3.0.3, I did manage to get the XP P2V instance ready to run under
kvm after changing from the Windows XP HAL ACPI to Standard PC in device
manager under VMWare Player.

After a complete redetection of system hardware and resources (perhaps
this was the true reason it started to work), the instance must now be
activated again. It works very well, but must be shut down at the "You may
now turn off the PC."

This is a headless kvm server for a few straggler Windows apps, and the kvm
instance will seldom be rebooted.

Should I activate as Standard PC, or attempt to convert the HAL back to
ACPI.

  

Basically it should work. Maybe a newer kvm will encounter fewer problems.

Is there still any performance penalty for ACPI with kvm-62?

  
Since we have the TPR optimization it should be fine. Nevertheless, we did
measure about a 10%-20% performance penalty on Windows with ACPI.

What is the kvm shutdown behavior with an ACPI HAL?
  

It should be fine and turn off the process completely.
Btw: you can install the APM module on the Standard PC HAL too, and it will
power down the VM so the process exits completely as well.

Thanks,
Jeff


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MTU on a virtio-net device?

2008-10-23 Thread Dor Laor

Michael Tokarev wrote:

Right now (2.6.27), there's no way to change MTU of a
virtio-net interface, since the mtu-changing method is
not provided.  Is there a simple way to add such a
beast?


It should be a nice easy patch for mtu < 4k.
You can just implement a 'change_mtu' handler like:

static int virtio_change_mtu(struct net_device *netdev, int new_mtu)
{
    if (new_mtu < ETH_ZLEN || new_mtu > PAGE_SIZE)
        return -EINVAL;
    netdev->mtu = new_mtu;
    return 0;
}


I'm asking because I'm not familiar with the internals,
and because, I think, increasing MTU (so that the
resulting skb still fits in a single page) will increase
performance significantly, at least on a internal/virtual
network -- currently there are just way too many context
switches and the like while copying data from one guest
to another or between guest and host.

Thanks!

/mjt
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: MTU on a virtio-net device?

2008-10-23 Thread Dor Laor

Michael Tokarev wrote:

Dor Laor wrote:
  

Michael Tokarev wrote:


Dor Laor wrote:
  
  

Michael Tokarev wrote:



Right now (2.6.27), there's no way to change MTU of a
virtio-net interface, since the mtu-changing method is
not provided.  Is there a simple way to add such a
beast?
  
  

It should be a nice easy patch for mtu < 4k.
You can just implement a 'change_mtu' handler like:


[]
  

Well, this isn't enough I think.  That is, new_mtu's upper cap should be
less than PAGE_SIZE due to various additional data structures.  But it
is enough to start playing.
  
  

The virtio header is in a separate ring entry so no prob.



virtio header is one thing.  Ethernet frame is another.  And
so on.  From the last experiment (sending 2000bytes-payload
pings resulting in 2008 bytes total, and 528 bytes missing
with original mtu=1500), it seems like the necessary upper
cap is PAGE_SIZE-28.  Or something similar.

Also see receive_skb() routine:

receive_skb(struct net_device *dev, struct sk_buff *skb, unsigned len)
{
  if (unlikely(len < sizeof(struct virtio_net_hdr) + ETH_HLEN)) {
/*drop*/
  }
  len -= sizeof(struct virtio_net_hdr);
  if (len = MAX_PACKET_LEN) {
  ...

So it seems that virtio_net_hdr is in here, just like
ethernet header.

[]
  

So something else has to be changed for this to work, it seems.
  

You're right, this needs to be changed to:
/* FIXME: MTU in config. */
#define MAX_PACKET_LEN (ETH_HLEN+ETH_DATA_LEN)

You can change it to PAGE_SIZE or have the current mtu.



so s/MAX_PACKET_LEN/dev->mtu/g for the whole driver, it
seems.  Plus/minus sizeof(virtio_net_hdr) - checking this now.
This constant is used in 3 places:

receive_skb(): if (len = MAX_PACKET_LEN) {
 (this one seems to be wrong, but again I don't know much
  internals of all this stuff)
 here, dev->mtu is what we want.

try_fill_recv(): skb = netdev_alloc_skb(vi->dev, MAX_PACKET_LEN);
 here, we don't have dev, but have vi->dev, should be ok too.
try_fill_recv(): skb_put(skb, MAX_PACKET_LEN);
 ditto

  

I was too lazy to write a complete patch.

And by the way, what is big_packets here?
  

It's a bit harder here; IIRC qemu also has a 4k limit.
It's not something that can be done in a short period.

Anyway, you can use GSO and achieve similar performance.

Ok, so I changed MAX_PACKET_LEN to be PAGE_SIZE (current MTU
seems to be more appropriate but PAGE_SIZE is enough for
testing anyway).  It seems to be working, and network
speed increased significantly with MTU=3500 compared with
former 1500 - it seems it's about 2 times faster (which is
quite expectable, since there's 2x less context switches,
transmissions and the like).

  

I'm asking because I'm not familiar with the internals,
  


Still... ;)

Thanks!

/mjt

  

You seem to be a fast learner :)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we hope a stable version in the near future?

2008-11-19 Thread Dor Laor

Farkas Levente wrote:

Avi Kivity wrote:
  

Farkas Levente wrote:


There is the maint/ series on git.kernel.org.  It doesn't have formal
releases though.



do you plan any formal release? and it'd be nice to see the relationship
between the current devel tree and the stable tree to eg. last stable
0.5 current devel 0.78.
  
  

The key to a formal release is a formal test suite.  We've been building
one (for a long while) but it isn't in production yet.

The plan is for it to be open so people can add their favorite guests,
to ensure they will not regress.



the question is not when but what happened with those bugs which cause
test fail? the problem currently not that we don't know problems, but
there are many known bugs just the reason and the solution not known. so
 test suite can't help too much here (may be find more bugs).

  
A test suite will help since its job is to run regression tests each
night or even on each commit.

Once a new regression is introduced, it will immediately be caught and reverted.

Now, when we have only a very poor, old regression suite, that does not
happen, so regressions are detected by users weeks after being committed.

We'll publish the test suite (based on autotest) next week. The more
users use it the better.

Anyway, our maintainer will run it each night.

on the other hand the real question are you plan to somehow stabilize
any of the following release in the near future? in the last 1.5 years
we wait for this. or you currently not recommend and not plan to use kvm
in production? it's also an option but would be useful to know. in this
case we (and probably many others) switch to xen, virtualbox, vmware or
anything else as a virtualization platform.
  
  

kvm is used in production on several products.  Just not the kvm-nn
releases I make.  The production versions of kvm are backed by testing,
which makes all the difference.  Slapping a 'stable' label over a
release doesn't make it so.



there are many open source project which has stable and devel
versions:-) actually almost all projects have a stable release along
with the devel version. but kvm has not any in the last few years,
that's why i think it's high time to stabilize 'a' version ie. frozen
feature list and fix all known bugs.
  
You're right about the need for a stable release; that's the idea of the
'maint' branches.
maint/2.6.26 for both kernel and userspace is stable (using the userspace
irqchip).

Now we'll stabilize another user/kernel pair based on 2.6.28.

Thanks, Dor
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: can we hope a stable version in the near future?

2008-11-19 Thread Dor Laor

Farkas Levente wrote:

Dor Laor wrote:
  

on the other hand the real question are you plan to somehow stabilize
any of the following release in the near future? in the last 1.5 years
we wait for this. or you currently not recommend and not plan to use
kvm
in production? it's also an option but would be useful to know. in this
case we (and probably many others) switch to xen, virtualbox, vmware or
anything else as a virtualization platform.

  

kvm is used in production on several products.  Just not the kvm-nn
releases I make.  The production versions of kvm are backed by testing,
which makes all the difference.  Slapping a 'stable' label over a
release doesn't make it so.



there are many open source project which has stable and devel
versions:-) actually almost all projects have a stable release along
with the devel version. but kvm has not any in the last few years,
that's why i think it's high time to stabilize 'a' version ie. frozen
feature list and fix all known bugs.
  
  

You're right about the need for stable release, that's the idea of the
'maint' branches.
maint/2.6.26 for both kernel and userspace is stable (using userspace
irqchip).
Now we'll stabilize another user/kernel pair based on 2.6.28



that's a good news:-)
but does this means there will be a new kvm-x.y.z release and i can
build the userspace from it _and_ build a kmod for eg. the latest
rhel-5's kernel-2.6.18-92.1.18.el5? ie. i'll be able to install it on
rhel-5 a kvm and kvm-kmod and it'll work? or it'll just run on the not
even released 2.6.28 kernel?
and what is the relationship between maint release and kvm-nn and the
next stable release?
is there a tarball for the current maint release? and the same question
here can i build a kmod and userspace from that for rhel-5?

  

As always you'll have the option of using kvm as a kernel module.
So even if the stable branch is based on 2.6.28, you can always take the
kvm bits through 'make -C kernel sync LINUX=PATH' in the userspace.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 1-1 mapping of devices without VT-d

2008-12-01 Thread Dor Laor

Passera, Pablo R wrote:

Hi everyone,
I want to assign a PCI device directly to a VM (PCI passthrough) in a 
machine that does not have VT-d. I found something related with this in a 
presentation done at the 2008 KVM Forum called 1-1 mapping and a patch for this 
at http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18722/focus=18753. I 
am wondering if this is included or are there plans to include it in the latest 
KVM version?

  
Although it had worked for us out of tree, there is no immediate need to
pursue it.

If anyone would like to nurture these patches, he is more than welcome.
PS: you also have the pv-dma option for Linux guests (same status though).
As time goes by, most hosts will have either VT-d or an AMD IOMMU.

Regards,
Dor


Thanks in advance,

Pablo Pássera

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: STOP error with virtio on KVM-79/2.6.18/Win2k3 x64 guest

2008-12-01 Thread Dor Laor

Adrian Schmitz wrote:

Sorry for the repost.. I forgot the subject line!
Hi, I'm having problems with STOP errors (0x00d1) under
KVM-79/2.6.18 whenever I try to use the virtio drivers. This post
(http://marc.info/?l=kvmm=121089259211638w=2) describes the issue
exactly, except that I'm using a Win2k3 x64 guest with the x64
paravirtual drivers instead of 32-bit guest/drivers. I am able to
reproduce the problem reliably using iperf, the same as in the above
post. When I disable virtio, the guest is very stable. Any suggestions
are greatly appreciated.

  

What driver version are you using? Version 2 is obsolete.
I posted version 3 a few months ago; Avi, can you please upload it to SourceForge?
My old public space was blocked, so I'll send you a private attachment to
test.


Dor.

-Adrian
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 1-1 mapping of devices without VT-d

2008-12-01 Thread Dor Laor

Michael Tokarev wrote:

Dor Laor wrote:
[]
  

Although it had worked for us out of tree, there is no immediate need to
pursue it.
If anyone would like to nurture these patches he is more than welcome.
ps: you also have pv-dma option for Linux guests (same status though).
As time goes by most host will have either vt-d or amd iommu.



Hmm.  Well, as time goes by, most hosts will be 64 bit or more.
But it does not mean that there's no need to maintain 32bits
arch anymore...  i hope anyway :)

  

But of course

Are you saying that PCI passthrough without hardware support will
not be available in (standard) kvm, even if patches exists for that?

  
No, it just might take some time to go to mainline. The patches need further
polishing, and we also need wider demand for it.
Actually pv-dma can help VT-d so we won't have to make all the guest
memory unswappable.
memory unswappable.

/mjt
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Virtio network performance problem

2008-12-04 Thread Dor Laor

Adrian Schmitz wrote:

On Wed, Dec 03, 2008 at 11:20:08AM -0800, Chris Wedgwood wrote:

  

TSC instability?  Is this an SMP guest?



Ok, I tried pinning the kvm process to two cores (0,2) on a single
socket, but that didn't seem to make any difference for my virtio
network performance. I also tried pinning the process to a single core,
which also didn't seem to have any effect.

  

I think it is an unsynchronized-TSC problem.
First, make sure you pin all of the process threads. There is a thread per
vcpu, plus an I/O thread, plus more that are not relevant here.

You can do it by prefixing the command line with taskset.
Second, you said that you use an SMP guest, so Windows also sees the
unsynchronized TSC.
So either test with a UP guest or learn how to pin the Windows receiving ISR,
DPC and the user app.


Well, testing on Intel or newer AMD is another option.
I tested it again now on Intel with a UP guest and there is no such problem.
I hope to test it next week on an AMD SMP guest.

Regards,
Dor

Someone on IRC suggested that it sounded like a clocking issue, since
some of my ping times are negative. He suggested trying a different
clock source. I tried it with dynticks, rtc, and unix. None of them seem
better, although all of them seem different in terms of patterns in
the ping times. Sorry if this makes it a long post, but I don't know how
to describe it other than to paste an example (below). Not sure if this
indicates that it is clock-related or if it is meaningless.

In any event, I'm not sure where to go from here. Another suggestion
from IRC was that it was due to the age of my host kernel (2.6.18) and
the fact that it doesn't support high-res timers. If I can avoid
replacing the distro kernel, I'd like to, but I'll do what I have to, I
suppose.

With dynticks (these are all with -net user, as I had some trouble with
my tap interface last night while testing this. The results are roughly
the same as when I was using tap before, though):

Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=143ms TTL=255
Reply from 10.0.2.2: bytes=32 time=143ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=143ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-139ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-141ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-133ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=143ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255

With rtc:

Reply from 10.0.2.2: bytes=32 time=-224ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-223ms TTL=255
Reply from 10.0.2.2: bytes=32 time=4ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=225ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-223ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-224ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=225ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=225ms TTL=255
Reply from 10.0.2.2: bytes=32 time=225ms TTL=255

With unix:

Reply from 10.0.2.2: bytes=32 time=-191ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-191ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time<1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-190ms TTL=255
Reply from 10.0.2.2: bytes=32 time=-191ms TTL=255
Reply from 10.0.2.2: bytes=32 time=1ms TTL=255
Reply from 10.0.2.2: bytes=32 time=192ms TTL=255
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Using signals to communicate two Qemu processes

2008-12-13 Thread Dor Laor

Passera, Pablo R wrote:

Hi all,
I am trying to communicate two VMs using a virtio driver. Once a data 
is moved to the driver I want to notify the other Qemu process that there is 
new data available in the buffer. I was thinking about using linux signals to 
synchronize both processes but when I register my SIGUSR1 handler in Qemu I am 
seeing an strange behavior. After starting the VM and Linux gets loaded, Qemu 
is receiving SIGUSR2 at a regular time period. Looking a little bit at the code 
I realize that signals are being used for other purposes in Qemu, however, 
SIGUSR1 is not used. Is it possible to use signals to synchronize these 
processes or should I think about using a different mechanism?

  
SIGUSR2 is used as the AIO completion signal. You can use SIGUSR1, but you
need to know what you're doing (some threads block signals).

A better fit would be a pipe, e.g. along the lines of the sketch below.
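
(Minimal, self-contained sketch of the pipe approach; both ends are shown in one process for brevity, and across two qemu processes you would use a named FIFO or a socketpair instead of pipe().)

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2]; /* fds[0]: read end, fds[1]: write end */
    struct pollfd p;
    char c;

    if (pipe(fds) != 0)
        return 1;

    /* Producer: one byte per "new data in the shared buffer" event. */
    write(fds[1], "x", 1);

    /* Consumer: wait for the kick alongside any other fds it polls. */
    p.fd = fds[0];
    p.events = POLLIN;
    if (poll(&p, 1, 1000) > 0 && read(fds[0], &c, 1) == 1)
        printf("notified\n");

    return 0;
}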

The vcpu

Thanks,

Pablo Pássera
Intel - Software Innovation Pathfinding Group
Cordoba - Argentina
Phone: +54 351 526 5611

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
  


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] AF_VMCHANNEL address family for guest-host communication.

2008-12-16 Thread Dor Laor

Evgeniy Polyakov wrote:

On Tue, Dec 16, 2008 at 08:57:27AM +0200, Gleb Natapov (g...@redhat.com) wrote:
  

Another approach is to implement that virtio backend with netlink based
userspace interface (like using connector or genetlink). This does not
differ too much from what you have with special socket family, but at
least it does not duplicate existing functionality of
userspace-kernelspace communications.

  

I implemented vmchannel using connector initially (the downside is that
message can be dropped). Is this more expectable for upstream? The
implementation was 300 lines of code.



Hard to tell, it depends on implementation. But if things are good, I
have no objections as connector maintainer :)

Messages in connector in particular and netlink in general are only
dropped, when receiving buffer is full (or when there is no memory), you
can tune buffer size to match virtual queue size or vice versa.

  
Gleb was aware of that, and it's not a problem since all of the anticipated
usages may drop messages (guest statistics, cut-and-paste, mouse movements,
single sign-on commands, etc).

A service that needs reliability could use basic acks.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gettimeofday slow in RHEL4 guests

2008-12-29 Thread Dor Laor

Avi Kivity wrote:

Marcelo Tosatti wrote:

The tsc clock on older Linux 2.6 kernels compensates for lost ticks.
The algorithm uses the PIT count (latched) to measure the delay between
interrupt generation and handling, and sums that value, on the next
interrupt, to the TSC delta.
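
(A toy model of that compensation, with a made-up CPU frequency and not the actual RHEL4 kernel code, shows why a stale latched count turns into a large bogus correction once ticks are reinjected:)

#include <stdio.h>

#define PIT_HZ 1193182ULL          /* PIT input clock */
#define HZ     1000ULL             /* guest timer tick rate */
#define LATCH  (PIT_HZ / HZ)       /* PIT reload value per tick */
#define CPU_HZ 2000000000ULL       /* assumed guest TSC frequency */

/* Correction the old tsc clocksource adds for one tick: the latched PIT
 * count says how far into the current period the interrupt was serviced,
 * and that delay is folded into the next TSC delta. */
static unsigned long long tick_correction(unsigned long long latched_count)
{
    unsigned long long delay_pit = LATCH - latched_count;

    return delay_pit * CPU_HZ / PIT_HZ;
}

int main(void)
{
    /* A tick serviced almost on time vs. a reinjected tick whose latched
     * count is stale: the stale count looks like a huge delay, so the
     * guest over-compensates and its clock runs fast. */
    printf("on-time tick:    %llu TSC cycles added\n", tick_correction(LATCH - 10));
    printf("reinjected tick: %llu TSC cycles added\n", tick_correction(5));
    return 0;
}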

Sheng investigated this problem in the discussions before in-kernel PIT
was merged:

http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13873.html 



The algorithm overcompensates for lost ticks and the guest time runs
faster than the hosts.

There are two issues:

1) A bug in the in-kernel PIT which miscalculates the count value.

2) For the case where more than one interrupt is lost, and later
reinjected, the value read from PIT count is meaningless for the purpose
of the tsc algorithm. The count is interpreted as the delay until the
next interrupt, which is not the case with reinjection.

As Sheng mentioned in the thread above, Xen pulls back the TSC value
when reinjecting interrupts. VMWare ESX has a notion of virtual TSC,
which I believe is similar in this context.

For KVM I believe the best immediate solution (for now) is to provide an
option to disable reinjection, behaving similarly to real hardware. The
advantage is simplicity compared to virtualizing the time sources.

The QEMU PIT emulation has a limit on the rate of interrupt reinjection,
perhaps something similar should be investigated in the future.

The following patch (which contains the bugfix for 1) and disabled
reinjection) fixes the severe time drift on RHEL4 with clock=tsc.
What I'm proposing is to condition reinjection with an option
(-kvm-pit-no-reinject or something).

Comments or better ideas?


diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index e665d1c..608af7b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -201,13 +201,16 @@ static int __pit_timer_fn(struct kvm_kpit_state 
*ps)

 if (!atomic_inc_and_test(&pt->pending))
     set_bit(KVM_REQ_PENDING_TIMER, &vcpu0->requests);
 
+    if (atomic_read(&pt->pending) > 1)

+        atomic_set(&pt->pending, 1);
+
  


Replace the atomic_inc() with atomic_set(, 1) instead? One less test, 
and more important, the logic is scattered less around the source.
But having only a pending bit instead of a counter will cause kvm to 
drop pit irqs on rare high load situations.

The disable reinjection option is better.



 if (vcpu0 && waitqueue_active(&vcpu0->wq))
     wake_up_interruptible(&vcpu0->wq);

 hrtimer_add_expires_ns(&pt->timer, pt->period);

 pt->scheduled = hrtimer_get_expires_ns(&pt->timer);
 if (pt->period)
-    ps->channels[0].count_load_time = hrtimer_get_expires(&pt->timer);
+    ps->channels[0].count_load_time = ktime_get();
 
 return (pt->period == 0 ? 0 : 1);

 }
  


I don't like the idea of punting to the user, but it looks like we don't
have a choice.  Hopefully vendors will port kvmclock to these kernels
and release them as updates -- time simply doesn't work well with
virtualization, especially for Linux guests.


Except for these 'tsc compensate' guests, what are the occasions where
a guest writes its TSC?

If this is the only case, we can disable reinjection once we trap TSC writes.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] KVM: Reset PIT irq injection logic when the PIT IRQ is unmasked

2009-01-06 Thread Dor Laor

Avi Kivity wrote:

Marcelo Tosatti wrote:


I'm worried about:

- boot guest using local apic timer
- reset
- boot with pit timer
- a zillion interrupts

So at the very least, we need a limiter.



Or have a new notifier on kvm_pic_reset, instead of simply acking one
pending irq? That seems the appropriate place to zero the counter.
  


Clearing the counter on reset is good, but it doesn't solve the 
underlying problem, which is that there are two separate cases that 
appear to the host as the same thing:


- guest masks irqs, does a lot of work, unmasks irqs
- host deschedules guest, does a lot of work, reschedules guest

Right now we assume any missed interrupts are due to host load.  In 
the reboot case, that's clearly wrong, but that is only an example.  
Maybe we can use preempt notifiers to detect whether the timer tick 
happened while the guest was scheduled or not.



It might get too complex. It can be done inside the vcpu_run function too:
an IRQ needs reinjection if the IRQ window was not open from the timer tick
until the next timer tick, minus the descheduled time. You also need to know
the right vcpu that the PIT IRQ is routed to.


Since scenarios like a guest masking its PIT and doing a lot of work are
rare, and bad guest behaviour anyway,

I don't think we should special-case them. So the PIT reset hook is enough.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM, Entropy and Windows

2011-02-17 Thread Dor Laor

On 02/17/2011 12:09 PM, Vadim Rozenfeld wrote:

On Thu, 2011-02-17 at 11:11 +0200, Avi Kivity wrote:

On 02/16/2011 09:54 PM, --[ UxBoD ]-- wrote:

Hello all,

I believe I am hitting a problem on one of our Windows 2003 KVM guests were I 
believe it is running out of Entropy and causing SSL issues.

I see that there is a module called virtio-rng which I believe passes the HW 
entropy source through to the guest but does this work on Windows as-well ?



AFAIK there is no Windows driver for virtio-rng.  Seems like a good
idea.  Vadim?

A virtio-rng driver for Windows is not a big deal. IMO, the real problem
will be to force Windows to use it for CryptoAPI.


What's the implication of that, good or bad?
Do you know what Hyper-V does for it?




If it doesn't any ideas on how I can increase the amount of entropy being 
generated on a headless system ? or even monitor entropy on a Windows system ?


No idea.  Maybe you could ask Windows to collect entropy from packet
timings.




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KSM For All Via LD_PRELOAD?

2010-06-09 Thread Dor Laor

On 06/08/2010 09:43 PM, Gordan Bobic wrote:

Is this plausible?

I'm trying to work out if it's even worth considering this approach to
enable all memory used by in a system to be open to KSM page merging,
rather than only memory used by specific programs aware of it (e.g.
kvm/qemu).

Something like this would address the fact that container based
virtualization (OpenVZ, VServer, LXC) cannot benefit from KSM.

What I'm thinking about is somehow intercepting malloc() and wrapping it
so that all malloc()-ed memory gets madvise()-d as well.

Has this been done?

Or is this too crazy an idea?


It should work. Note that the malloced memory should be page aligned in
order to get better sharing.
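To make it concrete, here is a minimal, untested sketch of such a shim
(the file name, the 128k size threshold and the choice to wrap only
malloc are my own assumptions, not an existing tool). It marks large
allocations MADV_MERGEABLE so ksmd will scan them:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

static void *(*real_malloc)(size_t);

void *malloc(size_t size)
{
    void *p;

    if (!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");

    p = real_malloc(size);

    /* Only bother with large blocks: glibc serves them via mmap(), so the
     * underlying mapping is page aligned, which is what KSM needs in order
     * to merge anything at all. */
    if (p && size >= 128 * 1024) {
        long page = sysconf(_SC_PAGESIZE);
        uintptr_t start = (uintptr_t)p & ~((uintptr_t)page - 1);
        uintptr_t end = ((uintptr_t)p + size + page - 1) & ~((uintptr_t)page - 1);

        /* Best effort: fails harmlessly if the kernel has no KSM support. */
        madvise((void *)start, end - start, MADV_MERGEABLE);
    }
    return p;
}

Build with 'gcc -shared -fPIC ksm_preload.c -o ksm_preload.so -ldl' and run
the target program with LD_PRELOAD=./ksm_preload.so. A real version would
also have to cover calloc/realloc/free and guard against dlsym recursing
into malloc.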




Gordan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KSM For All Via LD_PRELOAD?

2010-06-10 Thread Dor Laor

On 06/09/2010 01:31 PM, Gordan Bobic wrote:

On 06/09/2010 09:56 AM, Paolo Bonzini wrote:

Or is this too crazy an idea?


It should work. Note that the the malloced memory should be aligned in
order to get better sharing.


Within glibc malloc large blocks are mmaped, so they are automatically
aligned. Effective sharing of small blocks would take too much luck or
too much wasted memory, so probably madvising brk memory is not too
useful.

Of course there are exceptions. Bitmaps are very much sharable, but not
big. And some programs have their own allocator, using mmap in all
likelihood and slicing the resulting block. Typically these will be
virtual machines for garbage collected languages (but also GCC for
example does this). They will store a lot of pointers in there too, so
in this case KSM would likely work a lot for little benefit.

So if you really want to apply it to _all_ processes, it comes to mind
to wrap both mmap and malloc so that you can set a flag only for
mmap-within-malloc... It will take some experimentation and heuristics
to actually not degrade performance (and of course it will depend on the
workload), but it should work.


Arguably, the way QEMU KVM does it for the VM's entire memory block
doesn't seem to be distinguishing the types of memory allocation inside
the VM, so simply covering all mmap()/brk() calls would probably do no
worse in terms of performance. Or am I missing something?


There won't be a drastic effect for qemu-kvm since the non-guest-RAM areas
are minimal. I thought you were trying to trap mmap/brk/malloc for other,
general applications regardless of virt.




Gordan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM test: Disable HPET on windows timedrift tests

2010-07-04 Thread Dor Laor

On 07/01/2010 07:05 PM, Lucas Meneghel Rodrigues wrote:

On Thu, 2010-07-01 at 17:42 +0300, Avi Kivity wrote:

On 06/30/2010 06:39 PM, Lucas Meneghel Rodrigues wrote:

By default, HPET is enabled on qemu and no time drift
mitigation is being made for it. So, add -no-hpet
if qemu supports it, during windows timedrift tests.




Hm, you're compensating for a qemu bug by not testing it.

Can we have an XFAIL for this test instead?


Certainly we can. In actuality, that's what's being done on our internal
autotest server - this particular test is linked to the upstream bug
https://bugs.launchpad.net/qemu/+bug/599958

We've discussed about this issue this morning, it boils down to the way
people are more comfortable with handling this issue. My first thought
was to disable HPET until someone come up with a time drift mitigation
strategy for it.

But your approach makes more sense, unless someone has something else to
say about it, I'll drop the patch from autotest shortly.


Actually we should do both - XFAIL when hpet is used and in addition 
(and even more importantly) test other clock sources by disabling hpet.




Lucas




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM Processor cache size

2010-08-03 Thread Dor Laor

On 08/03/2010 02:36 AM, Anthony Liguori wrote:

On 08/02/2010 05:42 PM, Andre Przywara wrote:

Anthony Liguori wrote:

On 08/02/2010 08:49 AM, Ulrich Drepper wrote:

glibc uses the cache size information returned by cpuid to perform
optimizations. For instance, copy operations which would pollute too
much of the cache because they are large will use non-temporal
instructions. There are real performance benefits.


I imagine that there would be real performance problems from doing
live migration with -cpu host too if we don't guarantee these values
remain stable across migration...

Again, -cpu host is not meant to be migrated.


Then it needs to prevent migration from happening. Otherwise, it's a bug
waiting to happen.


There are other virtualization use cases than cloud-like server
virtualization. Sometimes users don't care about migration (or even
the live version), but want full CPU exposure for performance reasons
(think of virtualizing Windows on a Linux desktop).
I agree that -cpu host and migration should be addressed, but only to
a certain degree. And missing migration experience should not be a
road blocker for -cpu host.


When we can reasonably prevent it, we should prevent users from shooting
themselves in the foot. Honestly, I think -cpu host is exactly what you
would want to use in a cloud. A lot of private clouds and even public
clouds are largely based on homogenous hardware.


There are two good solutions for that:
a. Keep adding newer -cpu definitions like Penryn, Nehalem,
   Opteron_Gx, so newer models will be abstracted to match the
   physical properties.
b. Use a strict flag with -cpu host and pass the info with the live
   migration protocol.
   Our live migration protocol can do a better job of validating the
   cmdline and the current set of devices/hw on the src/dst and fail
   migration if there is a diff. Today we rely on libvirt for that;
   another mechanism will surely help, especially for -cpu host.
   The bonus is that there won't be a need to wait for the non-live
   migration part, and more cpu cycles will be saved.



I actually think the case where you want to migrate between heterogenous
hardware is grossly overstated.

Regards,

Anthony Liguori



Regards,
Andre.





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bad O_DIRECT read and write performance with small block sizes with virtio

2010-08-03 Thread Dor Laor

On 08/02/2010 11:50 PM, Stefan Hajnoczi wrote:

On Mon, Aug 2, 2010 at 6:46 PM, Anthony Liguori <anth...@codemonkey.ws> wrote:

On 08/02/2010 12:15 PM, John Leach wrote:


Hi,

I've come across a problem with read and write disk IO performance when
using O_DIRECT from within a kvm guest.  With O_DIRECT, reads and writes
are much slower with smaller block sizes.  Depending on the block size
used, I've seen 10 times slower.

For example, with an 8k block size, reading directly from /dev/vdb
without O_DIRECT I see 750 MB/s, but with O_DIRECT I see 79 MB/s.

As a comparison, reading in O_DIRECT mode in 8k blocks directly from the
backend device on the host gives 2.3 GB/s.  Reading in O_DIRECT mode
from a xen guest on the same hardware manages 263 MB/s.



Stefan has a few fixes for this behavior that help a lot.  One of them
(avoiding memset) is already upstream but not in 0.12.x.

The other two are not done yet but should be on the ML in the next couple
weeks.  They involve using ioeventfd for notification and unlocking the
block queue lock while doing a kick notification.


Thanks for mentioning those patches.  The ioeventfd patch will be sent
this week, I'm checking that migration works correctly and then need
to check that vhost-net still works.


Writing is affected in the same way, and exhibits the same behaviour
with O_SYNC too.

Watching with vmstat on the host, I see the same number of blocks being
read, but about 14 times the number of context switches in O_DIRECT mode
(4500 cs vs. 63000 cs) and a little more cpu usage.

The device I'm writing to is a device-mapper zero device that generates
zeros on read and throws away writes, you can set it up
at /dev/mapper/zero like this:

echo 0 21474836480 zero | dmsetup create zero

My libvirt config for the disk is:

<disk type='block' device='disk'>
   <driver cache='none'/>
   <source dev='/dev/mapper/zero'/>
   <target dev='vdb' bus='virtio'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>

which translates to the kvm arg:

-device
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
-drive file=/dev/mapper/zero,if=none,id=drive-virtio-disk1,cache=none


Using aio=native for the drive and changing the io scheduler on the host
to deadline should help as well.
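For example (sdX is just a placeholder for whichever physical disks
actually back the device):

-drive file=/dev/mapper/zero,if=none,id=drive-virtio-disk1,cache=none,aio=native
echo deadline > /sys/block/sdX/queue/scheduler    # on the host, per backing disk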




I'm testing with dd:

dd if=/dev/vdb of=/dev/null bs=8k iflag=direct

As a side note, as you increase the block size read performance in
O_DIRECT mode starts to overtake non O_DIRECT mode reads (from about
150k block size). By 550k block size I'm seeing 1 GB/s reads with
O_DIRECT and 770 MB/s without.


Can you take QEMU out of the picture and run the same test on the host:

dd if=/dev/vdb of=/dev/null bs=8k iflag=direct
vs
dd if=/dev/vdb of=/dev/null bs=8k

This isn't quite the same because QEMU will use a helper thread doing
preadv.  I'm not sure what syscall dd will use.

It should be close enough to determine whether QEMU and device
emulation are involved at all though, or whether these differences are
due to the host kernel code path down to the device mapper zero device
being different for normal vs O_DIRECT.

Stefan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RHEL 4.5 guest virtual network performace

2010-08-16 Thread Dor Laor

On 08/16/2010 10:00 PM, Alex Rixhardson wrote:

Hi guys,

I have the following configuration:

1. host is RHEL 5.5, 64bit with KVM (version that comes out of the box
with RHEL 5.5)
2. two guests:
2a: RHEL 5.5, 32bit,
2b: RHEL 4.5, 64bit

If I run iperf between host RHEL 5.5 and guest RHEL 5.5 inside the
virtual network subnet I get great results (  4Gbit/sec). But if I run
iperf between guest RHEL 4.5 and either of the two RHELs 5.5 I get bad
network performance (around 140Mbit/sec).


Please try netperf; iperf is known to be buggy and might consume cpu
without real justification.
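Something along these lines (the host IP is a placeholder):

netserver                        # on the host
netperf -H <host-ip> -l 60       # inside each guest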




The configuration was made thru virtual-manager utility, nothing
special. I just added virtual network device to both guests.

Could you guys give me some tips on what should I check?

Regards,
Alex


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RHEL 4.5 guest virtual network performace

2010-08-16 Thread Dor Laor

On 08/17/2010 12:22 AM, Alex Rixhardson wrote:

Thanks for the suggestion.

I tried with the netperf. I ran netserver on host and netperf on RHEL
5.5 and RHEL 4.5 guests. This are the results of 60 seconds long
tests:

RHEL 4.5 guest:
Throughput (10^6bits/sec) = 145.80


At least it bought you another 5Mb/s over iperf ...

It might be time related: 5.5 has kvmclock but rhel4 does not.
If it's a 64 bit guest, add 'notsc divider=10' to the 4.5 guest kernel
cmdline. If it's 32 bit, use 'clock=pmtmr divider=10'.

The divider option is probably new and is in rhel4.8 only; it's ok without it too.
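For the 64 bit case that boils down to a guest grub entry roughly like
this (kernel version and root device are only illustrative):

# /boot/grub/grub.conf inside the RHEL 4.5 guest
kernel /vmlinuz-2.6.9-55.EL ro root=/dev/VolGroup00/LogVol00 notsc
# append divider=10 as well once the guest kernel supports it (rhel4.8+)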

What's the host load for the 4.5 guest?



RHEL 5.5 guest:
Throughput (10^6bits/sec) = 3760.24

The results are really bad on RHEL 4.5 guest. What could be wrong?

Regards,
Alex

On Mon, Aug 16, 2010 at 9:49 PM, Dor Laor <dl...@redhat.com> wrote:

On 08/16/2010 10:00 PM, Alex Rixhardson wrote:


Hi guys,

I have the following configuration:

1. host is RHEL 5.5, 64bit with KVM (version that comes out of the box
with RHEL 5.5)
2. two guests:
2a: RHEL 5.5, 32bit,
2b: RHEL 4.5, 64bit

If I run iperf between host RHEL 5.5 and guest RHEL 5.5 inside the
virtual network subnet I get great results (4Gbit/sec). But if I run
iperf between guest RHEL 4.5 and either of the two RHELs 5.5 I get bad
network performance (around 140Mbit/sec).


Please try netperf, iperf known to be buggy and might consume cpu w/o real
justification



The configuration was made thru virtual-manager utility, nothing
special. I just added virtual network device to both guests.

Could you guys give me some tips on what should I check?

Regards,
Alex


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RHEL 4.5 guest virtual network performace

2010-08-16 Thread Dor Laor

On 08/17/2010 12:51 AM, Alex Rixhardson wrote:

I tried with 'notsc divider=10' (since it's 64 bit guest), but the
results are the still same :-(. The guest is idle at the time of
testing. It has 2 CPU and 1024 MB RAM available.


Hmm, are you using e1000 or virtio for the 4.5 guest?
e1000 is expected to be slow since it's less suitable for virtualization
(3 mmio exits per packet).
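If the guest XML currently has model type='e1000' (or no model element at
all), switching to virtio is a small change, roughly like this - assuming
the guest kernel actually ships virtio drivers, which older RHEL 4 kernels
may not:

<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
</interface>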





On Mon, Aug 16, 2010 at 11:35 PM, Dor Laor <dl...@redhat.com> wrote:

On 08/17/2010 12:22 AM, Alex Rixhardson wrote:


Thanks for the suggestion.

I tried with the netperf. I ran netserver on host and netperf on RHEL
5.5 and RHEL 4.5 guests. This are the results of 60 seconds long
tests:

RHEL 4.5 guest:
Throughput (10^6bits/sec) = 145.80


At least it bought you another 5Mb/s over iperf ...

It might be time related, 5.5 has kvmclock but rhel4 does not.
If it's 64 bit guest add this to the 4.5 guest cmdline  'notsc divider=10'.
If it's 32 use 'clock=pmtmr divider=10'.
The divider is probably new and is in rhel4.8 only, it's ok w/o it too.

What's the host load for the 4.5 guest?



RHEL 5.5 guest:
Throughput (10^6bits/sec) = 3760.24

The results are really bad on RHEL 4.5 guest. What could be wrong?

Regards,
Alex

On Mon, Aug 16, 2010 at 9:49 PM, Dor Laor <dl...@redhat.com> wrote:


On 08/16/2010 10:00 PM, Alex Rixhardson wrote:


Hi guys,

I have the following configuration:

1. host is RHEL 5.5, 64bit with KVM (version that comes out of the box
with RHEL 5.5)
2. two guests:
2a: RHEL 5.5, 32bit,
2b: RHEL 4.5, 64bit

If I run iperf between host RHEL 5.5 and guest RHEL 5.5 inside the
virtual network subnet I get great results (  4Gbit/sec). But if I
run
iperf between guest RHEL 4.5 and either of the two RHELs 5.5 I get bad
network performance (around 140Mbit/sec).


Please try netperf, iperf known to be buggy and might consume cpu w/o
real
justification



The configuration was made thru virtual-manager utility, nothing
special. I just added virtual network device to both guests.

Could you guys give me some tips on what should I check?

Regards,
Alex


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: The HPET issue on Linux

2010-01-06 Thread Dor Laor

On 01/06/2010 12:09 PM, Gleb Natapov wrote:

On Wed, Jan 06, 2010 at 05:48:52PM +0800, Sheng Yang wrote:

Hi Beth

I still found the emulated HPET would result in some boot failure. For
example, on my 2.6.30, with HPET enabled, the kernel would fail check_timer(),
especially in timer_irq_works().

The testing of timer_irq_works() is let 10 ticks pass(using mdelay()), and
want to confirm the clock source with at least 5 ticks advanced in jiffies.
I've checked that, on my machine, it would mostly get only 4 ticks when HPET
enabled, then fail the test. On the other hand, if I using PIT, it would get
more than 10 ticks(maybe understandable if some complementary ticks there). Of
course, extend the ticks count/mdelay() time can work.

I think it's a major issue of HPET. And it maybe just due to a too long
userspace path for interrupt injection... If it's true, I think it's not easy
to deal with it.


PIT tick are reinjected automatically, HPET should probably do the same
although it may just create another set of problems.


Older Linux guests do their own adjustment for lost ticks, so automatic
reinjection causes time to run too fast. This is why we added the
-no-kvm-pit-reinject flag...

It took a long time for the pit/rtc emulation to stabilize; before
seriously considering hpet emulation, lots of testing should be done.




--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/06/2010 05:16 PM, Anthony Liguori wrote:

On 01/06/2010 08:48 AM, Dor Laor wrote:

On 01/06/2010 04:32 PM, Avi Kivity wrote:

On 01/06/2010 04:22 PM, Michael S. Tsirkin wrote:

We can probably default -enable-kvm to -cpu host, as long as we
explain
very carefully that if users wish to preserve cpu features across
upgrades, they can't depend on the default.

Hardware upgrades or software upgrades?


Yes.



I just want to remind everyone that the main motivation for using -cpu
realModelThatWasOnceShipped is to provide correct cpu emulation for the
guest. Using a random qemu64|kvm64+flag1-flag2 might really cause
trouble for the guest OS or guest apps.

On top of -cpu Nehalem we can always add fancy features like x2apic, etc.
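Concretely that would look something like the line below (hypothetical: it
assumes a Nehalem model is defined and that the qemu build recognizes an
x2apic feature flag):

qemu-kvm -cpu Nehalem,+x2apic ...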


I think it boils down to, how are people going to use this.

For individuals, code names like Nehalem are too obscure. From my own
personal experience, even power users often have no clue whether there
processor is a Nehalem or not.

For management tools, Nehalem is a somewhat imprecise target because it
covers a wide range of potential processors. In general, I think what we
really need to do is simplify the process of going from, here's the
output of /proc/cpuinfo for a 100 nodes, what do I need to pass to qemu
so that migration always works for these systems.

I don't think -cpu nehalem really helps with that problem. -cpu none
helps a bit, but I hope we can find something nicer.


We can debate the exact name/model used to represent the Nehalem family -
I don't have an issue with that, and actually Intel and AMD should define it.


There are two main motivations behind the above approach:
1. Sound guest cpu definition.
   Using a predefined model should automatically set all the relevant
   vendor/stepping/cpuid flags/cache sizes/etc.
   We just can't let every management application deal with it; getting
   it wrong breaks guest OS/apps. For instance, MSI support in Windows
   guests relies on the stepping.

2. Simplifying the end user and mgmt tools.
   qemu/kvm have the best knowledge about these low-level details. If we
   push it up in the stack, it eventually reaches the user - the end
   user, not a 'qemu-devel user', who is actually far more capable than
   the average user.

   Otherwise such users will have to know what popcnt is and whether or
   not to limit migration on one host by adding sse4.2 or not.

This is exactly what vmware are doing:
 - Intel CPUs : 
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
 - AMD CPUs : 
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992


Why should we reinvent the wheel (qemu64...)? Let's learn from their
experience.


This is the test description of the original patch by John:


# Intel
# -

# Management layers remove pentium3 by default.
# It primarily remains here for testing of 32-bit migration.
#
[0:Pentium 3 Intel
:vmx
:pentium3;]

# Core 2, 65nm
# possible option sets: (+nx,+cx16), (+nx,+cx16,+ssse3)
#
1:Merom
:vmx,sse2
:qemu64,-nx,+sse2;

# Core2 45nm
#
2:Penryn
:vmx,sse2,nx,cx16,ssse3,sse4_1
:qemu64,+sse2,+cx16,+ssse3,+sse4_1;

# Core i7 45/32nm
#
3:Nehalem
:vmx,sse2,nx,cx16,ssse3,sse4_1,sse4_2,popcnt
:qemu64,+sse2,+cx16,+ssse3,+sse4_1,+sse4_2,+popcnt;


# AMD
# ---

# Management layers remove pentium3 by default.
# It primarily remains here for testing of 32-bit migration.
#
[0:Pentium 3 AMD
:svm
:pentium3;]

# Opteron 90nm stepping E1/E4/E6
# possible option sets: (-nx) for 130nm
#
1:Opteron G1
:svm,sse2,nx
:qemu64,+sse2;

# Opteron 90nm stepping F2/F3
#
2:Opteron G2
:svm,sse2,nx,cx16,rdtscp
:qemu64,+sse2,+cx16,+rdtscp;

# Opteron 65/45nm
#
3:Opteron G3
:svm,sse2,nx,cx16,sse4a,misalignsse,popcnt,abm
:qemu64,+sse2,+cx16,+sse4a,+misalignsse,+popcnt,+abm;





Regards,

Anthony Liguori




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 10:18 AM, Avi Kivity wrote:

On 01/07/2010 10:03 AM, Dor Laor wrote:


We can debate about the exact name/model to represent the Nehalem
family, I don't have an issue with that and actually Intel and Amd
should define it.


AMD and Intel already defined their names (in cat /proc/cpuinfo). They
don't define families, the whole idea is to segment the market.


The idea here is to minimize the number of models. We should have the
following range for Intel, for example:

  pentium3 - Merom - Penryn - Nehalem - host - kvm/qemu64

So we're supplying a wide range of cpus: p3 for maximum flexibility and
migration, Nehalem for performance and migration, host for maximum
performance, and qemu/kvm64 for custom-made configurations.






There are two main motivations behind the above approach:
1. Sound guest cpu definition.
Using a predefined model should automatically set all the relevant
vendor/stepping/cpuid flags/cache sizes/etc.
We just can let every management application deal with it. It breaks
guest OS/apps. For instance there are MSI support in windows guest
relay on the stepping.

2. Simplifying end user and mgmt tools.
qemu/kvm have the best knowledge about these low levels. If we push
it up in the stack, eventually it reaches the user. The end user,
not a 'qemu-devel user' which is actually far better from the
average user.

This means that such users will have to know what is popcount and
whether or not to limit migration on one host by adding sse4.2 or
not.

This is exactly what vmware are doing:
- Intel CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1991

- AMD CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1992



They don't have to deal with different qemu and kvm versions.



Both are our customers - the end users - and it's not their problem.
IMO what's missing today is safe and sound cpu emulation that is simple
and friendly to present. qemu64,+popcnt is not simple for the end user.
There is no reason to throw it onto higher-level mgmt.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 11:24 AM, Avi Kivity wrote:

On 01/07/2010 11:11 AM, Dor Laor wrote:

On 01/07/2010 10:18 AM, Avi Kivity wrote:

On 01/07/2010 10:03 AM, Dor Laor wrote:


We can debate about the exact name/model to represent the Nehalem
family, I don't have an issue with that and actually Intel and Amd
should define it.


AMD and Intel already defined their names (in cat /proc/cpuinfo). They
don't define families, the whole idea is to segment the market.


The idea here is to minimize the number of models we should have the
following range for Intel for example:
pentium3 - merom - penry - Nehalem - host - kvm/qemu64
So we're supplying wide range of cpus, p3 for maximum flexibility and
migration, nehalem for performance and migration, host for maximum
performance and qemu/kvm64 for custom maid.


There's no such thing as Nehalem.


Intel were ok with it. Again, you can name it corei7 or xeon34234234234,
I don't care; the principle remains the same.






This is exactly what vmware are doing:
- Intel CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1991


- AMD CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1992




They don't have to deal with different qemu and kvm versions.



Both our customers - the end users. It's not their problem.
IMO what's missing today is a safe and sound cpu emulation that is
simply and friendly to represent. qemu64,+popcount is not simple for
the end user. There is no reason to through it on higher level mgmt.


There's no simple solution except to restrict features to what was
available on the first processors.


What's not simple about the above 4 options?
What's a better alternative (one that ensures users understand it and use
it, and that guest MSI and even a Skype application are happy with it)?



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 01:39 PM, Anthony Liguori wrote:

On 01/07/2010 03:40 AM, Dor Laor wrote:

There's no simple solution except to restrict features to what was
available on the first processors.


What's not simple about the above 4 options?
What's a better alternative (that insures users understand it and use
it and guest msi and even skype application is happy about it)?


Even if you have -cpu Nehalem, different versions of the KVM kernel
module may additionally filter cpuid flags.

So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary
to say:

(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem


Or let qemu do it automatically for you.



In order to be compatible.

Regards,

Anthony Liguori



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 02:00 PM, Avi Kivity wrote:

On 01/07/2010 01:44 PM, Dor Laor wrote:

So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary
to say:

(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem



Or let qemu do it automatically for you.


qemu on 2.6.33 doesn't know that you're running qemu on 2.6.18 on
another node.



We can live with it: either have qemu deduce the kernel version from
another existing feature, or query uname.

Alternatively, the matching libvirt package can be the one adding or
removing the flag in the right distribution.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 03:14 PM, Anthony Liguori wrote:

On 01/07/2010 06:40 AM, Avi Kivity wrote:

On 01/07/2010 02:33 PM, Anthony Liguori wrote:


There's another option.

Make cpuid information part of live migration protocol, and then
support something like -cpu Xeon-3550. We would remember the exact
cpuid mask we present to the guest and then we could validate that we
can obtain the same mask on the destination.


It solves controlling the destination qemu execution all right, but it
does not help the initial spawning of the original guest - knowing whether
,-syscall is needed or not.


Anyway, I'm in favor of it too.



Currently, our policy is to only migrate dynamic (from the guest's
point of view) state, and specify static state on the command line [1].

I think your suggestion makes a lot of sense, but I'd like to expand
it to move all guest state, whether dynamic or static. So '-m 1G'
would be migrated as well (but not -mem-path). Similarly, in -drive
file=...,if=ide,index=1, everything but file=... would be migrated.


Yes, I agree with this and it should be in the form of an fdt. This
means we need full qdev conversion.

But I think cpuid is somewhere in the middle with respect to static vs.
dynamic. For instance, -cpu host is very dynamic in that you get very
difficult results on different systems. Likewise, because of kvm
filtering, even -cpu qemu64 can be dynamic.

So if we didn't have filtering and -cpu host, I'd agree that it's
totally static but I think in the current state, it's dynamic.


This has an advantage wrt hotplug: since qemu is responsible for
migrating all guest visible information, the migrator is no longer
responsible for replaying hotplug events in the exact sequence they
happened.


Yup, 100% in agreement as a long term goal.


In short, I think we should apply your suggestion as broadly as possible.

[1] cpuid state is actually dynamic; repeated cpuid instruction
execution with the same operands can return different results. kvm
supports querying and setting this state.


Yes, and we save some cpuid state in cpu. We just don't save all of it.

Regards,

Anthony Liguori



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-25 Thread Dor Laor

On 01/21/2010 05:05 PM, Anthony Liguori wrote:

On 01/20/2010 07:18 PM, john cooper wrote:

Chris Wright wrote:

* Daniel P. Berrange (berra...@redhat.com) wrote:

To be honest all possible naming schemes for '-cpuname' are just as
unfriendly as each other. The only user friendly option is '-cpu host'.

IMHO, we should just pick a concise naming scheme document it. Given
they are all equally unfriendly, the one that has consistency with
vmware
naming seems like a mild winner.

Heh, I completely agree, and was just saying the same thing to John
earlier today. May as well be -cpu {foo,bar,baz} since the meaning for
those command line options must be well-documented in the man page.

I can appreciate the concern of wanting to get this
as correct as possible.


This is the root of the trouble. At the qemu layer, we try to focus on
being correct.

Management tools are typically the layer that deals with being correct.

A good compromise is making things user tunable which means that a
downstream can make correctness decisions without forcing those
decisions on upstream.

In this case, the idea would be to introduce a new option, say something
like -cpu-def. The syntax would be:

-cpu-def
name=coreduo,level=10,family=6,model=14,stepping=8,features=+vme+mtrr+clflush+mca+sse3+monitor,xlevel=0x8008,model_id=Genuine
Intel(R) CPU T2600 @ 2.16GHz

Which is not that exciting since it just lets you do -cpu coreduo in a
much more complex way. However, if we take advantage of the current
config support, you can have:

[cpu-def]
name=coreduo
level=10
family=6
model=14
stepping=8
features=+vme+mtrr+clflush+mca+sse3..
model_id=Genuine Intel...

And that can be stored in a config file. We should then parse
/etc/qemu/target-targetname.conf by default. We'll move the current
x86_defs table into this config file and then downstreams/users can
define whatever compatibility classes they want.

With this feature, I'd be inclined to take correct compatibility
classes like Nehalem as part of the default qemurc that we install
because it's easily overridden by a user. It then becomes just a
suggestion on our part verses a guarantee.

It should just be a matter of adding qemu_cpudefs_opts to
qemu-config.[ch], taking a new command line that parses the argument via
QemuOpts, then passing the parsed options to a target-specific function
that then builds the table of supported cpus.


Isn't the outcome of John's patches and these configs exactly the same?
Since these cpu models won't ever change, there is no reason not to hard
code them. Adding configs or command lines is a good idea, but it is
friendlier to have basic support for the common cpus built in.

This is why qemu today offers: -cpu ?
x86   qemu64
x86   phenom
x86 core2duo
x86kvm64
x86   qemu32
x86  coreduo
x86  486
x86  pentium
x86 pentium2
x86 pentium3
x86   athlon
x86 n270

So bottom line, my point is to have John's base + your configs. We also
need to keep the check verb and the migration support for sending those.

btw: IMO we should deal with this complexity ourselves and save 99.9% of
the users the need to define such models; don't ask this of a Java
programmer, he is running on a JVM :-)





Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..

2010-01-25 Thread Dor Laor

On 01/25/2010 04:21 PM, Anthony Liguori wrote:

On 01/25/2010 03:08 AM, Dor Laor wrote:

qemu-config.[ch], taking a new command line that parses the argument via
QemuOpts, then passing the parsed options to a target-specific function
that then builds the table of supported cpus.

It should just be a matter of adding qemu_cpudefs_opts to

Isn't the outcome of John's patches and these configs will be exactly
the same? Since these cpu models won't ever change, there is no reason
why not to hard code them. Adding configs or command lines is a good
idea but it is more friendlier to have basic support to the common cpus.
This is why qemu today offers: -cpu ?
x86 qemu64
x86 phenom
x86 core2duo
x86 kvm64
x86 qemu32
x86 coreduo
x86 486
x86 pentium
x86 pentium2
x86 pentium3
x86 athlon
x86 n270

So bottom line, my point is to have John's base + your configs. We
need to keep also the check verb and the migration support for sending
those.

btw: IMO we should deal with this complexity ourselves and save 99.9%
of the users the need to define such models, don't ask this from a
java programmer, he is running on a JVM :-)


I'm suggesting John's base should be implemented as a default config
that gets installed by default in QEMU. The point is that a smart user
(or a downstream) can modify this to suite their needs more appropriately.

Another way to look at this is that implementing a somewhat arbitrary
policy within QEMU's .c files is something we should try to avoid.
Implementing arbitrary policy in our default config file is a fine thing
to do. Default configs are suggested configurations that are modifiable
by a user. Something baked into QEMU is something that ought to work for


If we get the models right, users and mgmt stacks won't need to define
them. It seems like an almost impossible task for us, and mgmt
stacks/users won't do a better job - the opposite, I'd guess. The configs
are great, I have no argument against them; my point is that if we can pin
down some definitions, they had better live in the code, like the above
models. It might even help to get the same cpus across the various
vendors; otherwise we might end up with IBM's core2duo, RH's core2duo,
Suse's, ...



everyone in all circumstances.

Regards,

Anthony Liguori




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] KVM test: Control files automatic generation to save memory

2010-02-14 Thread Dor Laor

On 02/14/2010 07:07 PM, Michael Goldish wrote:


- Lucas Meneghel Rodrigues <l...@redhat.com> wrote:


As our configuration system generates a list of dicts
with test parameters, and that list might be potentially
*very* large, keeping all this information in memory might
be a problem for smaller virtualization hosts due to
the memory pressure created. Tests made on my 4GB laptop
show that most of the memory is being used during a
typical kvm autotest session.

So, instead of keeping all this information in memory,
let's take a different approach and unfold all the
tests generated by the config system and generate a
control file:

job.run_test('kvm', params={param1, param2, ...}, tag='foo', ...)
job.run_test('kvm', params={param1, param2, ...}, tag='bar', ...)

By dumping all the dicts that were before in the memory to
a control file, the memory usage of a typical kvm autotest
session is drastically reduced making it easier to run in smaller
virt hosts.

The advantages of taking this new approach are:
  * You can see what tests are going to run and the dependencies
between them by looking at the generated control file
  * The control file is all ready to use, you can for example
paste it on the web interface and profit
  * As mentioned, a lot less memory consumption, avoiding
memory pressure on virtualization hosts.

This is a crude 1st pass at implementing this approach, so please
provide comments.

Signed-off-by: Lucas Meneghel Rodrigues <l...@redhat.com>
---


Interesting idea!

- Personally I don't like the renaming of kvm_config.py to
generate_control.py, and prefer to keep them separate, so that
generate_control.py has the create_control() function and
kvm_config.py has everything else.  It's just a matter of naming;
kvm_config.py deals mostly with config files, not with control files,
and it can be used for other purposes than generating control files.

- I wonder why so much memory is used by the test list.  Our daily
test sets aren't very big, so although the parser should use a huge
amount of memory while parsing, nearly all of that memory should be
freed by the time the parser is done, because the final 'only'
statement reduces the number of tests to a small fraction of the total
number in a full set.  What test set did you try with that 4 GB
machine, and how much memory was used by the test list?  If a
ridiculous amount of memory was used, this might indicate a bug in
kvm_config.py (maybe it keeps references to deleted tests, forcing
them to stay in memory).


I agree, it's worth getting to the bottom of it - I wonder how many
objects are created for the kvm unstable set; it should be a huge number.
Besides that, one can always call the python garbage collection
interface in order to free unreferenced memory immediately.




- I don't think this approach will work for control.parallel, because
the tests have to be assigned dynamically to available queues, and
AFAIK this can't be done by a simple static control file.

- Whether or not this is a good idea probably depends on the users.
On one hand, users will be required to run generate_control.py before
autotest.py, and the generated control files will be very big and
ugly; on the other hand, maybe they won't care.

I probably haven't given this enough thought so I might have missed a
few things.



  client/tests/kvm/control |   64 
  client/tests/kvm/generate_control.py |  586
++
  client/tests/kvm/kvm_config.py   |  524
--
  3 files changed, 586 insertions(+), 588 deletions(-)
  delete mode 100644 client/tests/kvm/control
  create mode 100755 client/tests/kvm/generate_control.py
  delete mode 100755 client/tests/kvm/kvm_config.py

diff --git a/client/tests/kvm/control b/client/tests/kvm/control
deleted file mode 100644
index 163286e..000
--- a/client/tests/kvm/control
+++ /dev/null
@@ -1,64 +0,0 @@
-AUTHOR = 
-u...@redhat.com (Uri Lublin)
-dru...@redhat.com (Dror Russo)
-mgold...@redhat.com (Michael Goldish)
-dh...@redhat.com (David Huff)
-aerom...@redhat.com (Alexey Eromenko)
-mbu...@redhat.com (Mike Burns)
-
-TIME = 'MEDIUM'
-NAME = 'KVM test'
-TEST_TYPE = 'client'
-TEST_CLASS = 'Virtualization'
-TEST_CATEGORY = 'Functional'
-
-DOC = 
-Executes the KVM test framework on a given host. This module is
separated in
-minor functions, that execute different tests for doing Quality
Assurance on
-KVM (both kernelspace and userspace) code.
-
-For online docs, please refer to
http://www.linux-kvm.org/page/KVM-Autotest
-
-
-import sys, os, logging
-# Add the KVM tests dir to the python path
-kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests/kvm')
-sys.path.append(kvm_test_dir)
-# Now we can import modules inside the KVM tests dir
-import kvm_utils, kvm_config
-
-# set English environment (command output might be localized, need to
be safe)
-os.environ['LANG'] = 'en_US.UTF-8'
-
-build_cfg_path = os.path.join(kvm_test_dir, build.cfg)
-build_cfg = 

Re: Recommended network driver for a windows KVM guest

2010-02-18 Thread Dor Laor

On 02/17/2010 12:51 PM, carlopmart wrote:

Hi all,

I need to install several windows KVM (rhel5.4 host fully updated)
guests for iSCSI boot. iSCSI servers are Solaris/OpenSolaris storage
servers and I need to boot windows guests (2008R2 and Win7) using gpxe.
Can i use virtio net dirver during windows install or e1000 driver??


rhel5.4 does not have gpxe so it won't work there. rhel5.5 will have it,
but I don't recall anyone testing iSCSI with kvm+gpxe upstream either, so
it's worth testing.

Anyway, virtio performs better than e1000 and is potentially more stable.




Many thanks.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Make QEmu HPET disabled by default for KVM?

2010-03-14 Thread Dor Laor

On 03/14/2010 09:10 AM, Gleb Natapov wrote:

On Sun, Mar 14, 2010 at 09:05:50AM +0200, Avi Kivity wrote:

On 03/11/2010 09:08 PM, Marcelo Tosatti wrote:





I have kept --no-hpet in my setup for
months...

Any details about the problems?  HPET is important to some guests.

As Gleb mentioned in the other thread, reinjection will introduce
another set of problems.

Ideally all this timer related problems should be fixed by correlating
timer interrupts and time source reads.


This still needs reinjection (or slewing of the timer frequency).
Correlation doesn't fix drift.


But only when all time sources are synchronised and correlated with
interrupts we can slew time frequency without guest noticing (and only
if guest disables NTP)


In the meantime we should definitely disable hpet by default.
Besides this we need to fully virtualize the tsc, fix the win7 64bit rtc
time drift and some potential pvclock issues. Before we add a new timer,
better to fix the existing ones.

What about creating a pv timekeeping device that is aware of lost ticks
and the host wall clock time? It's similar to the hyper-v enlightenment
virtual timers.





Since one already has to use special timer parameters (-rtc-td-hack,
-no-kvm-pit-reinjection), using -no-hpet for problematic Linux
guests seems fine?


Depends on how common the problematic ones are.  If they're common,
better to have a generic fix.

--
error compiling committee.c: too many arguments to function


--
Gleb.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Make QEmu HPET disabled by default for KVM?

2010-03-14 Thread Dor Laor

On 03/14/2010 12:27 PM, Avi Kivity wrote:

On 03/14/2010 12:23 PM, Dor Laor wrote:

On 03/14/2010 09:10 AM, Gleb Natapov wrote:

On Sun, Mar 14, 2010 at 09:05:50AM +0200, Avi Kivity wrote:

On 03/11/2010 09:08 PM, Marcelo Tosatti wrote:





I have kept --no-hpet in my setup for
months...

Any details about the problems? HPET is important to some guests.

As Gleb mentioned in the other thread, reinjection will introduce
another set of problems.

Ideally all this timer related problems should be fixed by correlating
timer interrupts and time source reads.


This still needs reinjection (or slewing of the timer frequency).
Correlation doesn't fix drift.


But only when all time sources are synchronised and correlated with
interrupts we can slew time frequency without guest noticing (and only
if guest disables NTP)


In the mean time we should definitely disable hpet by default.


Definitely not. Windows needs it. Some pre-kvmclock Linux may also work
with it.

Without hpet, there is no fast high resolution timer in the system.


It all depends on how hard it would be to re-inject into a windows guest.
We still need to fix win2k3 64 bit and win2k8 64 bit (and not win7 as I
said initially), since the irq is broadcast to all the vcpus and we do not
track who acknowledged the irq.





Besides this we need to fully virtualize the tsc, fix win7 64bit rtc
time drift and some pvclock potential issues. Before we add new timer,
better fix existing ones.

What about creating a pv time keeping device that will be aware of
lost ticks and host wall clock time? It's similar to hyper-v
enlightenment virt timers.


That's kvmclock.



I meant a device that can be used to generate timeouts. Today we use the
pit/rtc along with the kvmclock time source, but it's not perfect, and the
same probably goes for hpet. This is why I thought a pv device would be
beneficial.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Fwd: [PATCH]: An implementation of HyperV KVP functionality

2010-11-14 Thread Dor Laor

FYI. Long ago we discussed a key/value approach on top of virtio-serial.

 Original Message 
Subject: [PATCH]: An implementation of HyperV  KVP  functionality
Date: Thu, 11 Nov 2010 13:03:10 -0700
From: Ky Srinivasan ksriniva...@novell.com
To: de...@driverdev.osuosl.org, virtualizat...@lists.osdl.org
CC: Haiyang Zhang haiya...@microsoft.com, Greg KH gre...@suse.de

I am enclosing a patch that implements the KVP (Key Value Pair) 
functionality for Linux guests on HyperV. This functionality allows 
Microsoft Management stack to query information from the guest. This 
functionality is implemented in two parts: (a) A kernel component that 
communicates with the host and (b) A user level daemon that implements 
data gathering. The attached patch (kvp.patch) implements the kernel 
component. I am also attaching the code for the user-level daemon 
(kvp_daemon.c)  for reference.


Regards,

K. Y


From: K. Y. Srinivasan ksriniva...@novell.com

Subject: An implementation of key/value pair feature (KVP) for Linux 
on HyperV.

Signed-off-by: K. Y. Srinivasan ksriniva...@novell.com 

Index: linux.trees.git/drivers/staging/hv/kvp.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux.trees.git/drivers/staging/hv/kvp.c2010-11-11 13:45:17.0 
-0500
@@ -0,0 +1,404 @@
+/*
+ * An implementation of key value pair (KVP) functionality for Linux.
+ *
+ *
+ * Copyright (C) 2010, Novell, Inc.
+ * Author : K. Y. Srinivasan ksriniva...@novell.com
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT.  See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+
+#include <linux/net.h>
+#include <linux/nls.h>
+#include <linux/connector.h>
+
+#include "logging.h"
+#include "osd.h"
+#include "vmbus.h"
+#include "vmbus_packet_format.h"
+#include "vmbus_channel_interface.h"
+#include "version_info.h"
+#include "channel.h"
+#include "vmbus_private.h"
+#include "vmbus_api.h"
+#include "utils.h"
+#include "kvp.h"
+
+
+/*
+ *
+ * The following definitions are shared with the user-mode component; do not
+ * change any of this without making the corresponding changes in
+ * the KVP user-mode component.
+ */
+
+#define CN_KVP_VAL 0x1 /* This supports queries from the kernel */
+#define CN_KVP_USER_VAL   0x2 /* This supports queries from the user */
+
+
+/*
+ * KVP protocol: The user mode component first registers with the
+ * the kernel component. Subsequently, the kernel component requests, data
+ * for the specified keys. In response to this message the user mode component
+ * fills in the value corresponding to the specified key. We overload the
+ * sequence field in the cn_msg header to define our KVP message types.
+ *
+ * XXXKYS: Have a shared header file between the user and kernel (TODO)
+ */
+
+enum kvp_op {
+   KVP_REGISTER = 0, /* Register the user mode component */
+   KVP_KERNEL_GET,/*Kernel is requesting the value for the specified key*/
+   KVP_KERNEL_SET, /*Kernel is providing the value for the specified key*/
+   KVP_USER_GET, /*User is requesting the value for the specified key*/
+   KVP_USER_SET /*User is providing the value for the specified key*/
+};
+
+
+
+#define KVP_KEY_SIZE512
+#define KVP_VALUE_SIZE  2048
+
+
+typedef struct kvp_msg {
+   __u32 kvp_key; /* Key */
+   __u8  kvp_value[0]; /* Corresponding value */
+} kvp_msg_t;
+
+/*
+ * End of shared definitions.
+ */
+
+/*
+ * Registry value types.
+ */
+
+#define REG_SZ 1
+
+/*
+ * Array of keys we support in Linux.
+ *
+ */
+#define KVP_MAX_KEY10
+#define KVP_LIC_VERSION 1
+
+
+static char *kvp_keys[KVP_MAX_KEY] = {FullyQualifiedDomainName,
+   IntegrationServicesVersion,
+   NetworkAddressIPv4,
+   NetworkAddressIPv6,
+   OSBuildNumber,
+   OSName,
+   OSMajorVersion,
+   OSMinorVersion,
+   OSVersion,
+   ProcessorArchitecture,
+   };
+
+/*
+ * Global state maintained for transaction that is being processed.
+ * Note that only one transaction can be active at any point in time.
+ *
+ * This state is set when we receive a request from the host; we
+ * cleanup this state when the 

Re: [Qemu-devel] [PATCH] qemu-kvm: introduce cpu_start/cpu_stop commands

2010-11-23 Thread Dor Laor

On 11/23/2010 08:41 AM, Avi Kivity wrote:

On 11/23/2010 01:00 AM, Anthony Liguori wrote:

qemu-kvm vcpu threads don't response to SIGSTOP/SIGCONT. Instead of
teaching
them to respond to these signals, introduce monitor commands that stop
and start
individual vcpus.

The purpose of these commands are to implement CPU hard limits using
an external
tool that watches the CPU consumption and stops the CPU as appropriate.


Why not use cgroup for that?



The monitor commands provide a more elegant solution that signals
because it
ensures that a stopped vcpu isn't holding the qemu_mutex.



 From signal(7):

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Perhaps this is a bug in kvm?

If we could catch SIGSTOP, then it would be easy to unblock it only
while running in guest context. It would then stop on exit to userspace.

Using monitor commands is fairly heavyweight for something as high
frequency as this. What control period do you see people using? Maybe we
should define USR1 for vcpu start/stop.

What happens if one vcpu is stopped while another is running? Spin
loops, synchronous IPIs will take forever. Maybe we need to stop the
entire process.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2

2010-11-29 Thread Dor Laor

On 11/29/2010 06:23 PM, Stefan Hajnoczi wrote:

On Mon, Nov 29, 2010 at 3:00 PM, Yoshiaki Tamura
tamura.yoshi...@lab.ntt.co.jp  wrote:

2010/11/29 Paul Brook <p...@codesourcery.com>:

If devices incorrectly claim support for live migration, then that should
also be fixed, either by removing the broken code or by making it work.


I totally agree with you.


AFAICT your current proposal is just feeding back the results of some
fairly specific QA testing.  I'd rather not get into that game.  The
correct response in the context of upstream development is to file a bug
and/or fix the code. We already have config files that allow third party
packagers to remove devices they don't want to support.


Sorry, I didn't get what you're trying to tell me.  My plan would
be to initially start from a subset of devices, and gradually
grow the number of devices that Kemari works with.  While this
process, it'll include what you said above, file a but and/or fix
the code.  Am I missing what you're saying?


My point is that the whitelist shouldn't exist at all.  Devices either support
migration or they don't.  Having some sort of separate whitelist is the wrong
way to determine which devices support migration.


Alright!

Then if a user encounters a problem with Kemari, we'll fix Kemari
or the devices or both. Correct?


Is this a fair summary: any device that supports live migration works
under Kemari?


It might be a fair summary, but in practice we barely have live migration
working even without Kemari. In addition, last I checked Kemari needs
additional hooks, and it will be too hard to keep those out of tree until
all devices get them.




(If such a device does not work under Kemari then this is a bug that
needs to be fixed in live migration, Kemari, or the device.)

Stefan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Freezing Windows 2008 x64bit guest

2010-12-13 Thread Dor Laor

On 12/13/2010 09:42 PM, Manfred Heubach wrote:



Gleb Natapov <gleb at redhat.com> writes:



On Wed, Jul 28, 2010 at 12:53:02AM +0300, Harri Olin wrote:

Gleb Natapov wrote:

On Wed, Jul 21, 2010 at 09:25:31AM +0300, Harri Olin wrote:

Gleb Natapov kirjoitti:

On Mon, Jul 19, 2010 at 10:17:02AM +0300, Harri Olin wrote:

Gleb Natapov kirjoitti:

On Thu, Jul 15, 2010 at 03:19:44PM +0200, Christoph Adomeit wrote:

But one Windows 2008 64 Bit Server Standard is freezing regularly.
This happens sometimes 3 times a day, sometimes it takes 2 days
until freeze. The Windows Machine is a clean fresh install.

I think I have seen same problem occur on my Windows 2008 SBS SP2
64bit system, but a bit less often, only like once a week.
Now I haven't seen crashes but only freezes with qemu on 100% and
virtual system unresponsive.

Does sendkey from the monitor work? qemu-kvm-0.11.1 is very old, and this is
not a total freeze, which is even harder to debug. I don't see anything
extraordinary in your logs. 4643 interrupts per second for 4 cpus is
normal if Windows runs multimedia or another app that needs hi-res timers.
Is your host swapping? Is there any chance that you can try upstream
qemu-kvm?


I tried running qemu-kvm from git, but it exhibited the same problem
as the 12.x versions I tried before, BSODing once in a while, running kernel
2.6.34.1.


That should be a pretty stable config, although it would be nice if you
could try running qemu-kvm.git head.


sample BSOD failure details:
These two with the Realtek NIC and the qemu cpu model:
0x0019 (0x0020, 0xf88007e65970, 0xf88007e65990, 0x0502040f)
0x0019 (0x0020, 0xf88007a414c0, 0xf88007a414e0, 0x0502044c)

These are with e1000 and -cpu host:
0x003b (0xc005, 0xf80001c5d842, 0xfa60093ddb70, 0x)
0x003b (0xc005, 0xf80001cb8842, 0xfa600c94ab70, 0x)
0x000a (0x0080, 0x000c, 0x0001, 0xf80001cadefd)
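
For reference, the stop codes quoted above correspond to standard Windows 
bugcheck classes (the truncated parameters are left exactly as reported). 
A tiny lookup sketch, with the code/name mapping taken from Microsoft's 
public bugcheck documentation:

    #include <stdio.h>

    /* Map the bugcheck codes seen in this report to their documented names. */
    static const char *bugcheck_name(unsigned code)
    {
        switch (code) {
        case 0x0A: return "IRQL_NOT_LESS_OR_EQUAL";
        case 0x19: return "BAD_POOL_HEADER";
        case 0x3B: return "SYSTEM_SERVICE_EXCEPTION";
        default:   return "unknown (run '!analyze -v' on the minidump in WinDbg)";
        }
    }

    int main(void)
    {
        const unsigned seen[] = { 0x19, 0x3B, 0x0A };

        for (unsigned i = 0; i < sizeof(seen) / sizeof(seen[0]); i++)
            printf("0x%08X = %s\n", seen[i], bugcheck_name(seen[i]));
        return 0;
    }

Anything beyond that classification really needs the minidumps mentioned 
further down.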


Can you attach screenshots of BSODs? Have you reinstalled your guests or
are you running the same images you ran in 11.x?


I'll see if I can analyze minidumps later.

In addition to these, there have been just as many reboots that were
only logged as a 'disruptive shutdown'.

Right now I'm running the problematic guest under Xen
3.2.1-something from Debian to see if it works better.

--
Harri.



  Hello,

is there a solution for that problem? I'm experiencing the same problems ever
since I installed SBS 2008 on KVM.

I was running the host with Ubuntu 10.04 but upgraded to 10.10 - mainly because
of performance problems which were solved by the upgrade.

After the upgrade the system became extremely unstable. It was crashing as soon
as disk I/O and network I/O load grew, 100% reproducible with Windows Server
Backup writing to an iSCSI volume.

I had the virtio drivers for storage and network installed (the Red Hat/Fedora drivers, version 1.1.11).


Which Fedora/RHEL release is that?
What's the Windows virtio driver version?

Have you tried using virt-manager/virsh instead of the raw command line?
About e1000: some Windows versions ship with a buggy driver, and an updated
e1000 driver from Intel fixes some issues.
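
If the guest is managed through libvirt (virt-manager/virsh), the NIC model 
is a single attribute in the domain XML, so switching between virtio and 
e1000 for a test is a small edit via 'virsh edit <domain>'. A minimal 
interface stanza as a sketch; the bridge name and MAC address here are 
made-up examples:

    <interface type='bridge'>
      <source bridge='br0'/>
      <mac address='52:54:00:12:34:56'/>
      <model type='e1000'/>    <!-- or type='virtio' -->
    </interface>

The change only takes effect after the guest is shut down and started again.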




At each BSOD I had the following line in the log of the guest:

  virtio_ioport_write: unexpected address 0x13 value 0x1

I changed the network interface back to e1000. What I experience now (and I had
that at the very beginning, before I switched to the virtio network) are freezes.
The guest doesn't respond anymore (it doesn't answer pings and doesn't react to
mouse/keyboard anymore). Host CPU usage of the kvm process is 100% on as many
cores as there are virtual cpus (in this case 4).

I'm a bit frustrated about this. I have two Windows 2003 32-bit guests, one
Windows XP guest and three Linux guests (2x 32-bit, 1x 64-bit). They are all
running without any problems (except that the Windows XP guest cannot boot
without an NTLDR CD image). Only the SBS 2008 guest regularly freezes.

The host system has two Intel Xeon 5504 CPUs, an Intel 5500 chipset, an
Adaptec RAID 5805 controller and 24 GB of DDR3 RAM.

I know there is a lack of detailed information right now. I first need to know
if anybody is working on this or has similar problems. I can deliver minidumps,
and any debugging information you need.

I don't want to give up now. We will switch to Hyper-V if we cannot solve this,
because we need a stable virtualization platform for Windows guests. I would
like to keep using KVM; it is so much more flexible.

Best regards
Manfred





