Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Avi Kivity

On 02/24/2011 07:58 PM, Anthony Liguori wrote:
If you move the cdrom to a different IDE channel, you have to update 
the stateful non-config file.


Whereas if you do

   $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
   $ qemu -cdrom ~/foo-media-tray.img

the cd-rom tray state will be tracked in the image file.



Yeah, but how do you move it? 


There is no need to move the file at all.  Simply point the new drive at 
the media tray.


If you do a remove/add through QMP, then the config file will reflect 
things just fine.


If all access to the state file is through QMP then it becomes more 
palatable.  A bit on that later.




If you want to do it outside of QEMU, then you can just ignore the 
config file and manage all of the state yourself.  But it's never 
going to work as well (it will be racy) and you're pushing a 
tremendous amount of knowledge that ultimately belongs in QEMU (what 
state needs to persist) to something that isn't QEMU which means it's 
probably not going to be done correctly.


I know you're a big fan of the omnipotent management tool but my 
experience has been that we need to help the management tooling folks 
more by expecting less from them.


I thought that's what I'm doing by separating the state out.  It's easy 
for management to assemble configuration from their database and convert 
it into a centralized representation (like a qemu command line).  It's a 
lot harder to disassemble a central state representation and move it 
back to the database.


Using QMP is better than directly accessing the state file since qemu 
does the disassembly for you (provided the command references the device 
using its normal path, not some random key).  The file just becomes a 
way to survive a crash, and all management needs to know about is to 
make it available and back it up.  But it means that everything must be 
done via QMP, including assembly of the machine, otherwise the state 
file can become stale.


Separating the state out to the device is even easier, since management 
is already expected to take care of disk images.  All that's needed is 
to create the media tray image once, then you can forget about it 
completely.




Again the question is who is the authoritative source of the 
configuration.  Is it the management tool or is it qemu?


QEMU.   No question about it.  At any point in time, we are the 
authoritative source of what the guest's configuration is.  There's no 
doubt about it.  A management tool can try to keep up with us, but 
ultimately we are the only ones that know for sure.


We have all of this information internally.  Just persisting it is not 
a major architectural change.  It's something we should have been 
doing (arguably) from the very beginning.


That's a huge divergence from how management tools are written.  
Currently they contain the required guest configuration, a 
representation of what's the current live configuration, and they issue 
monitor commands to move the live configuration towards the required 
configuration (or just generate a qemu command line).  What you're 
describing is completely different, I'm not even sure what it is.




The management tool already has to keep track of (the optional parts 
of) the guest device tree.  It cannot start reading the stateful 
non-config file at random points in time.  So all that is left is the 
guest controlled portions of the device tree, which are pretty rare, 
and random events like live-copy migration.  I think that introducing 
a new authoritative source of information will create a lot of problems.


QEMU has always been the authoritative source.  Nothing new has been 
introduced.  We never persisted the machine's configuration which 
meant management tools had to try to aggressively keep up with us 
which is intrinsically error prone.  Fixing this will only improve 
existing management tools.


If you look at management tools, they believe they are the authoritative 
source of configuration information (not guest state, which is more or 
less ignored).




Right, but we should make it easy, not hard.


Yeah, I fail to see how this makes it hard.  We conveniently are 
saying, hey, this is all the state that needs to be persisted.  We'll 
persist it for you if you want, otherwise, we'll expose it in a 
central location.


The state-in-a-file is just a blob.  Don't expect the tool to parse it 
and reassociate the various bits to its own representation.  Exposing it 
via QMP commands is a lot better though.


If the tool wants to ignore it and guess based on various combinations 
of other commands, more power to it.

I don't see why it doesn't work.  Please explain.


1) guest eject
2) qemu posts eject event
3) qemu acknowledges eject to the guest
4) management tool sees eject event and updates guest config

There's a race between 3  4.  It can only be addressed by 
interposing 4 between 2 and 3 OR making qemu persist this state 
between 2 and 3 such that the management tool 

Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Avi Kivity

On 02/24/2011 07:45 PM, Anthony Liguori wrote:


Yeah, it feels like we're introducing QEMU level RAID. 


We already did, with sheepdog (well the RAID code is really outside qemu 
itself).


At what point are we going to add RAID5 support and not just RAID1.  
And it certainly begs the question of whether this use-case can be 
satisfied by just using Linux's RAID stack leaving one drive degraded 
and enabling the other drive when the time comes to fail over.


The magic word is btrfs, we already have atomic live copy outside qemu 
('cp --reflink').  If you also want an atomic switch you have to move it 
inside qemu so you can switch to the new file.


Using one of the Linux RAID-1 implementations (md, dm, or drbd) is also 
possible but these are really designed for block devices, not files.


--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [PATCH v2 upstream 07/22] add assertions on the owner of a QemuMutex

2011-02-27 Thread Jan Kiszka
On 2011-02-26 16:40, Paolo Bonzini wrote:
 These are already present in the Win32 implementation, add them to
 the pthread wrappers as well.  Use PTHREAD_MUTEX_ERRORCHECK for mutex
 operations, and track the owner separately for cond_signal/broadcast.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  qemu-thread-posix.c |   23 +--
  qemu-thread-posix.h |1 +
  2 files changed, 22 insertions(+), 2 deletions(-)
 
 diff --git a/qemu-thread-posix.c b/qemu-thread-posix.c
 index e307773..a4c6e25 100644
 --- a/qemu-thread-posix.c
 +++ b/qemu-thread-posix.c
 @@ -16,9 +16,12 @@
  #include time.h
  #include signal.h
  #include stdint.h
 +#include assert.h
  #include string.h
  #include qemu-thread.h
  
 +static pthread_t pthread_null;
 +
  static void error_exit(int err, const char *msg)
  {
  fprintf(stderr, qemu: %s: %s\n, msg, strerror(err));
 @@ -28,8 +31,13 @@ static void error_exit(int err, const char *msg)
  void qemu_mutex_init(QemuMutex *mutex)
  {
  int err;
 +pthread_mutexattr_t mutexattr;
  
 -err = pthread_mutex_init(mutex-lock, NULL);
 +mutex-owner = pthread_null;
 +pthread_mutexattr_init(mutexattr);
 +pthread_mutexattr_settype(mutexattr, PTHREAD_MUTEX_ERRORCHECK);
 +err = pthread_mutex_init(mutex-lock, mutexattr);
 +pthread_mutexattr_destroy(mutexattr);
  if (err)
  error_exit(err, __func__);
  }
 @@ -48,13 +56,20 @@ void qemu_mutex_lock(QemuMutex *mutex)
  int err;
  
  err = pthread_mutex_lock(mutex-lock);
 +mutex-owner = pthread_self();
  if (err)
  error_exit(err, __func__);
  }
  
  int qemu_mutex_trylock(QemuMutex *mutex)
  {
 -return pthread_mutex_trylock(mutex-lock);
 +int err;
 +err = pthread_mutex_trylock(mutex-lock);
 +if (err == 0) {
 +mutex-owner = pthread_self();
 +}
 +
 +return !!err;
  }
  
  static void timespec_add_ms(struct timespec *ts, uint64_t msecs)
 @@ -85,6 +100,7 @@ void qemu_mutex_unlock(QemuMutex *mutex)
  {
  int err;
  
 +mutex-owner = pthread_null;
  err = pthread_mutex_unlock(mutex-lock);
  if (err)
  error_exit(err, __func__);
 @@ -130,7 +146,10 @@ void qemu_cond_wait(QemuCond *cond, QemuMutex *mutex)
  {
  int err;
  
 +assert(pthread_equal(mutex-owner, pthread_self()));
 +mutex-owner = pthread_null;
  err = pthread_cond_wait(cond-cond, mutex-lock);

Though POSIX is not 100% explicit on this, every sane pthread_cond_wait
implementation will apply the same error checking as on
pthread_mutex_unlock when the given mutex is of
PTHREAD_MUTEX_ERRORCHECK. So, this assert is actually redundant as well.

Now that we are left without any assertions, I start wondering about one
of the original missions: enforce qemu_cond_signal/broadcast to be
called under a mutex. What about extending those services with a mutex
argument and applying the assert there? Could become a static-inline
wrapper so that the argument is optimized away if assert() is inactive.

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Re: [PATCH v2 upstream 19/22] move blocking of signals to qemu_signalfd_init

2011-02-27 Thread Jan Kiszka
On 2011-02-26 16:40, Paolo Bonzini wrote:
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  cpus.c |   87 ++-
  1 files changed, 36 insertions(+), 51 deletions(-)
 
 diff --git a/cpus.c b/cpus.c
 index 32e9352..8c440f1 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -346,11 +346,37 @@ static void sigfd_handler(void *opaque)
  }
  }
  
 -static int qemu_signalfd_init(sigset_t mask)
 +static int qemu_signalfd_init(void)
  {
  int sigfd;
 +sigset_t set;
  
 -sigfd = qemu_signalfd(mask);
 +#ifdef CONFIG_IOTHREAD
 +/* SIGUSR2 used by posix-aio-compat.c */
 +sigemptyset(set);
 +sigaddset(set, SIGUSR2);
 +pthread_sigmask(SIG_UNBLOCK, set, NULL);

Didn't you want to rename the function for the sake of non-signalfd
blocks like above?

 +
 +sigemptyset(set);
 +sigaddset(set, SIGIO);
 +sigaddset(set, SIGALRM);
 +sigaddset(set, SIG_IPI);
 +sigaddset(set, SIGBUS);
 +pthread_sigmask(SIG_BLOCK, set, NULL);
 +#else
 +sigemptyset(set);

This line is shared and can be moved out of the #ifdef.

 +sigaddset(set, SIGBUS);
 +if (kvm_enabled()) {
 +/*
 + * We need to process timer signals synchronously to avoid a race
 + * between exit_request check and KVM vcpu entry.
 + */
 +sigaddset(set, SIGIO);
 +sigaddset(set, SIGALRM);
 +}
 +#endif
 +
 +sigfd = qemu_signalfd(set);
  if (sigfd == -1) {
  fprintf(stderr, failed to create signalfd\n);
  return -errno;
 @@ -438,6 +464,12 @@ static void qemu_event_increment(void)
  static void qemu_kvm_eat_signals(CPUState *env)
  {
  }
 +
 +static int qemu_signalfd_init(void)
 +{
 +return 0;
 +}
 +
  #endif /* _WIN32 */
  
  #ifndef CONFIG_IOTHREAD
 @@ -471,39 +503,14 @@ static void qemu_kvm_init_cpu_signals(CPUState *env)
  #endif
  }
  
 -#ifndef _WIN32
 -static sigset_t block_synchronous_signals(void)
 -{
 -sigset_t set;
 -
 -sigemptyset(set);
 -sigaddset(set, SIGBUS);
 -if (kvm_enabled()) {
 -/*
 - * We need to process timer signals synchronously to avoid a race
 - * between exit_request check and KVM vcpu entry.
 - */
 -sigaddset(set, SIGIO);
 -sigaddset(set, SIGALRM);
 -}
 -
 -return set;
 -}
 -#endif
 -
  int qemu_init_main_loop(void)
  {
 -#ifndef _WIN32
 -sigset_t blocked_signals;
  int ret;
  
 -blocked_signals = block_synchronous_signals();
 -
 -ret = qemu_signalfd_init(blocked_signals);
 +ret = qemu_signalfd_init();
  if (ret) {
  return ret;
  }
 -#endif
  
  qemu_init_sigbus();
  
 @@ -651,35 +658,13 @@ static void qemu_tcg_init_cpu_signals(void)
  pthread_sigmask(SIG_UNBLOCK, set, NULL);
  }
  
 -static sigset_t block_io_signals(void)
 -{
 -sigset_t set;
 -
 -/* SIGUSR2 used by posix-aio-compat.c */
 -sigemptyset(set);
 -sigaddset(set, SIGUSR2);
 -pthread_sigmask(SIG_UNBLOCK, set, NULL);
 -
 -sigemptyset(set);
 -sigaddset(set, SIGIO);
 -sigaddset(set, SIGALRM);
 -sigaddset(set, SIG_IPI);
 -sigaddset(set, SIGBUS);
 -pthread_sigmask(SIG_BLOCK, set, NULL);
 -
 -return set;
 -}
 -
  int qemu_init_main_loop(void)
  {
  int ret;
 -sigset_t blocked_signals;
  
  qemu_init_sigbus();
  
 -blocked_signals = block_io_signals();
 -
 -ret = qemu_signalfd_init(blocked_signals);
 +ret = qemu_signalfd_init();
  if (ret) {
  return ret;
  }

Beside the minor nits, this looks good.

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Re: [PATCH v2 uq/master 00/22] Win32 iothread support

2011-02-27 Thread Jan Kiszka
On 2011-02-26 16:39, Paolo Bonzini wrote:
 After gathering the comments about the two series I sent separately,
 here is the full series for Win32 iothread support, ready to be
 applied to uq/master.
 
 Patches 1 to 5 are generic Win32 improvements, including the qemu-thread
 implementation.  Because of complex dependencies, I think it's better
 if this part is also routed through uq/master.

Agreed. This also ensures that the non-Win32 parts receive autotest
blessing before being applied.

 
 Patches 6 to 8 are generic threading improvements, including using
 PTHREAD_MUTEX_ERRORCHECK as suggested by Jan.
 
 Patches 9 to 17 eliminate polling, replacing condition variable
 timedwait with wait.
 
 Patch 18 removes a redundant condition from the TCG cpu_exec_all
 function.
 
 Patches 19 to 21 add all necessary stubs to make iothread compile
 with Win32, except the IPI calls.  These are provided by patch 22.
 
 Tested on Wine and Linux, not on real Windows.  The series introduces
 a dependency on Windows 2K or newer.  I don't think either 95/98/ME
 or Windows NT 3.x are reasonable host systems for QEMU, anyway.
 
 I incorporated all suggestions from Jan, including his renaming patch
 for qemu_*_is_self, and included Aurelien's sh4 tweak to cpu_halted.
 

#ifndef _WIN32
Except for the minor remarks, looks very good to me.
#else
Can't asses.
#endif

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Dor Laor

On 02/27/2011 11:10 AM, Avi Kivity wrote:

On 02/24/2011 07:58 PM, Anthony Liguori wrote:

If you move the cdrom to a different IDE channel, you have to update
the stateful non-config file.

Whereas if you do

$ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
$ qemu -cdrom ~/foo-media-tray.img

the cd-rom tray state will be tracked in the image file.



Yeah, but how do you move it?


There is no need to move the file at all. Simply point the new drive at
the media tray.


If you do a remove/add through QMP, then the config file will reflect
things just fine.


If all access to the state file is through QMP then it becomes more
palatable. A bit on that later.



If you want to do it outside of QEMU, then you can just ignore the
config file and manage all of the state yourself. But it's never going
to work as well (it will be racy) and you're pushing a tremendous
amount of knowledge that ultimately belongs in QEMU (what state needs
to persist) to something that isn't QEMU which means it's probably not
going to be done correctly.

I know you're a big fan of the omnipotent management tool but my
experience has been that we need to help the management tooling folks
more by expecting less from them.


I thought that's what I'm doing by separating the state out. It's easy
for management to assemble configuration from their database and convert
it into a centralized representation (like a qemu command line). It's a
lot harder to disassemble a central state representation and move it
back to the database.

Using QMP is better than directly accessing the state file since qemu
does the disassembly for you (provided the command references the device
using its normal path, not some random key). The file just becomes a way
to survive a crash, and all management needs to know about is to make it
available and back it up. But it means that everything must be done via
QMP, including assembly of the machine, otherwise the state file can
become stale.

Separating the state out to the device is even easier, since management
is already expected to take care of disk images. All that's needed is to
create the media tray image once, then you can forget about it completely.



Again the question is who is the authoritative source of the
configuration. Is it the management tool or is it qemu?


QEMU. No question about it. At any point in time, we are the
authoritative source of what the guest's configuration is. There's no
doubt about it. A management tool can try to keep up with us, but
ultimately we are the only ones that know for sure.

We have all of this information internally. Just persisting it is not
a major architectural change. It's something we should have been doing
(arguably) from the very beginning.


That's a huge divergence from how management tools are written.
Currently they contain the required guest configuration, a
representation of what's the current live configuration, and they issue
monitor commands to move the live configuration towards the required
configuration (or just generate a qemu command line). What you're
describing is completely different, I'm not even sure what it is.




The management tool already has to keep track of (the optional parts
of) the guest device tree. It cannot start reading the stateful
non-config file at random points in time. So all that is left is the
guest controlled portions of the device tree, which are pretty rare,
and random events like live-copy migration. I think that introducing
a new authoritative source of information will create a lot of problems.


QEMU has always been the authoritative source. Nothing new has been
introduced. We never persisted the machine's configuration which meant
management tools had to try to aggressively keep up with us which is
intrinsically error prone. Fixing this will only improve existing
management tools.


If you look at management tools, they believe they are the authoritative
source of configuration information (not guest state, which is more or
less ignored).



Right, but we should make it easy, not hard.


Yeah, I fail to see how this makes it hard. We conveniently are
saying, hey, this is all the state that needs to be persisted. We'll
persist it for you if you want, otherwise, we'll expose it in a
central location.


The state-in-a-file is just a blob. Don't expect the tool to parse it
and reassociate the various bits to its own representation. Exposing it
via QMP commands is a lot better though.


If the tool wants to ignore it and guess based on various combinations
of other commands, more power to it.

I don't see why it doesn't work. Please explain.


1) guest eject
2) qemu posts eject event
3) qemu acknowledges eject to the guest
4) management tool sees eject event and updates guest config

There's a race between 3  4. It can only be addressed by
interposing 4 between 2 and 3 OR making qemu persist this state
between 2 and 3 such that the management tool can reliably query it.


If it is my cd-rom tray block 

Re: [Qemu-devel] Re: KVM call agenda for Jan 25

2011-02-27 Thread Stefan Hajnoczi
On Sat, Feb 26, 2011 at 9:50 PM, Dushyant Bansal
cs5070...@cse.iitd.ac.in wrote:
 Disk block size is usually 512 bytes and in qemu-img, sector size is also
 512B. And, this change would  copy n sectors even if only one of them
 actually contains data (while cp checks and copies one block(=sector) at a
 time). Therefore, it will end up writing more data than cp.

cp(1) from GNU coreutils 8.5 works in units of 32 KB on my system.  It
reads 32 KB and checks for all zeroes.  If there are all zeroes, it
seeks ahead 32 KB in the output file.  Otherwise it writes the entire
32 KB.

You can check what cp(1) is doing using strace(1).

 virtual size: 10G (10737418240 bytes)
 disk size: 569M

 convert- original
 time    0m52.522s

 convert- modified (resultant disk size: 5.3G)
 time    2m12.744s

 cp
 time    0m51.724s (same disk size)
 ---
 virtual size: 10G (10737418240 bytes)
 disk size: 3.6G

 convert- original
 time    1m52.249s

 convert- modified (resultant disk size: 7.1G)
 time    3m2.891s

 cp
 time    1m55.320s (same disk size)
 ---
 In these results, we can see that resultant disk size has increased.

If I'm reading this correctly then qemu-img convert is within a few
seconds of cp(1) for you?

Stefan



Re: [Qemu-devel] Re: [PATCH] Split machine creation from the main loop

2011-02-27 Thread Avi Kivity

On 02/24/2011 07:25 PM, Anthony Liguori wrote:
Is it really necessary?  What's blocking us from initializing 
chardevs early?



Well

We initialize all chardevs at once right now and what set of chardevs 
there are depends on the machine (by the way defaults are applied).  
You could initialize chardevs in two stages although that requires 
quite a bit of additional complexity.


We could initialize chardevs on demand - that should resolve any 
dependencies?






It would be a pity to divorce the monitor from chardevs, they're 
really flexible.


Couple considerations:

1) chardevs don't support multiple simultaneous connections.  I view 
this as a blocker for QMP.


What do you mean by that?   Something like ,server which keeps on 
listening after it a connection is established?




2) Because chardevs don't support multiple connections, we can't 
reasonably hook on things like connect/disconnect which means that 
fd's sent via SCM_RIGHTs have to be handled in a very special way.  By 
going outside of the chardev layer, we can let fd's via SCM_RIGHTS 
queue up naturally and have getfd/setfd refer to the fd at the top of 
the queue.  It makes it quite a bit easier to work with (I believe 
Daniel had actually requested this a while ago).


I really don't follow... what's the connection between SCM_RIGHTS and 
multiple connections?


3) By treating QMP as a special case, we don't have to treat chardevs 
overall as a special case.  This feels more right to me although I 
can't say I have a strong opinion formed yet.




2) Make qemu_machine_init() take no parameters and just reference 
global state.


3) Teach all QMP functions to behave themselves if called before 
qemu_machine_init()


4) Introduce QMP function to call qemu_machine_init()


An alternative is to remove all guest-visible content from 
qemu_machine_init().  So machine-init() would take no parameters and 
only build the static devices (power supply?).  Everything else would 
be hot-plugged (perhaps some would fail if the machine was started - 
cold-plug only).


All that qemu_machine_init() is is guest-visible content.  That's the 
point of refactoring this.


Sorry, poorly phrased.  Configurable guest visible content.



(6) can be started right now.  (1) comes with the QAPI merge.  (2) 
is pretty easy to do after applying this patch.  (3) is probably 
something that can be done shortly after (1).  (4) and (5) really 
require everything but (6) to be in place before we can meaningful 
do it.


I think we can lay out much of the ground work for this in 0.15 and 
I think we can have a total conversion realistically for 0.16.  That 
means that by EOY, we could invoke QEMU with no options and do 
everything through QMP.


It's something that I've agitated for a long while, but when I see 
all the work needed, I'm not sure it's cost effective.


There's a lot of secondary benefits that come from doing this.  QMP 
becomes a much stronger interface.  A lot of operations that right now 
are only specifiable by the command line become dynamic which 
mitigates reboots in the long term. 


Only the hot-pluggable ones.

It also lays the ground work for a fully decoupled device model 
whereas the only interface between the devices and the outside world 
is a subset of QMP (think seccomp()).


Whether creating a machine with no command line options is high value 
is probably irrelevant.  I think we want to go in this direction 
regardless.


I agree it's a good thing.

--
error compiling committee.c: too many arguments to function




[Qemu-devel] [Bug 725991] [NEW] cpu-exec.c:766: handle_cpu_signal: Assertion failed.

2011-02-27 Thread Marat Radchenko
Public bug reported:

qemu-user 0.14.0

Host: amd64 (Phenom X6) host, Gentoo Linux
Guest: armv6l Gentoo Linux

Full error message:
cpu-exec.c:766: handle_cpu_signal: Assertion `({ unsigned long __guest = 
(unsigned long)(address) - guest_base; __guest  (1ul  32); })' failed.

Happened in gcc process, no reliable steps to reproduce.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/725991

Title:
  cpu-exec.c:766: handle_cpu_signal: Assertion failed.

Status in QEMU:
  New

Bug description:
  qemu-user 0.14.0

  Host: amd64 (Phenom X6) host, Gentoo Linux
  Guest: armv6l Gentoo Linux

  Full error message:
  cpu-exec.c:766: handle_cpu_signal: Assertion `({ unsigned long __guest = 
(unsigned long)(address) - guest_base; __guest  (1ul  32); })' failed.

  Happened in gcc process, no reliable steps to reproduce.



[Qemu-devel] Re: [PATCH 1/3] kvm-unit-tests: add x86 port io accessors

2011-02-27 Thread Avi Kivity

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Signed-off-by: Anthony Liguorialigu...@us.ibm.com

diff --git a/lib/x86/io.h b/lib/x86/io.h
new file mode 100644
index 000..bd6341c
--- /dev/null
+++ b/lib/x86/io.h
@@ -0,0 +1,40 @@
+#ifndef IO_H
+#define IO_H
+
+static inline unsigned char inb(unsigned short port)
+{
+unsigned char value;
+asm volatile(inb %w1, %0 : =a (value) : Nd (port));
+return value;
+}


Are those %[wb] really needed?  gcc should do the right thing based on 
the argument type.


Might as well put all of that into processor.h.

--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [PATCH 2/3] kvm-unit-tests: do not set level sensitive irq when initializing the PIC

2011-02-27 Thread Avi Kivity

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

I'm not sure if this was intentional but the QEMU i8259 does not support this
flag.  I haven't observed any issues with this but I'll happily admit that
I'm not very aware of what I'm doing here.

Signed-off-by: Anthony Liguorialigu...@us.ibm.com

  static u32 xapic_read(unsigned reg)
  {
  return *(volatile u32 *)(g_apic + reg);
@@ -133,7 +129,7 @@ void ioapic_write_redir(unsigned line, ioapic_redir_entry_t 
e)
  void enable_apic(void)
  {
  printf(enabling apic\n);
-xapic_write(0xf0, 0x1ff); /* spurious vector register */
+xapic_write(0xf0, 0x1f7); /* spurious vector register */
  }


Not sure what you're doing here.  You're changing the APIC Spurious 
Vector from 0xff to 0xf7?  This has nothing to do with the i8259 or 
level triggeredness as far as I can tell - it just enables the APIC (bit 
8) and selects a vector for reporting spurious interrupts (0xff happens 
to be the reset value).


--
error compiling committee.c: too many arguments to function




[Qemu-devel] Re: [PATCH 3/3] kvm-unit-tests: make I/O more friendly to existing QEMU hardware

2011-02-27 Thread Avi Kivity

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Use the serial port for printf() and use the Bochs bios exit port if the
testdev port isn't available.

This unconditionally switches to use the serial port but tries to use the
testdev exit port since that lets you pass an exit status.


The only issue I can see is that the serial port is much slower than the 
testdev port (since we can use rep outsb there).  That only matters for 
access.flat in full output mode (not default).  So please keep the old 
code under #if 0 so we can easily switch back to it.


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Anthony Liguori

On 02/27/2011 03:55 AM, Dor Laor wrote:
What about a simpler approach were QMP events will be written to a 
event-log-file (or even named pipe).


The management tool can just use a small daemon that does nothing other 
than write QMP events to a log.  There's no real need for this code to 
live in QEMU.


Since events are posted, even if we wrote it in QEMU, the event wouldn't 
be guaranteed on disk by the time the event invocation returns.


Regards,

Anthony Liguori




Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Anthony Liguori

On 02/27/2011 03:10 AM, Avi Kivity wrote:

On 02/24/2011 07:58 PM, Anthony Liguori wrote:
If you move the cdrom to a different IDE channel, you have to update 
the stateful non-config file.


Whereas if you do

   $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
   $ qemu -cdrom ~/foo-media-tray.img

the cd-rom tray state will be tracked in the image file.



Yeah, but how do you move it? 


There is no need to move the file at all.  Simply point the new drive 
at the media tray.


No, I was asking, how do you move the cdrom to a different IDE channel.  
Are you using QMP?  Are you changing the command line arguments?




If you do a remove/add through QMP, then the config file will reflect 
things just fine.


If all access to the state file is through QMP then it becomes more 
palatable.  A bit on that later.


As I think I've mentioned before, I hadn't really thought about an 
opaque state file but I'm not necessary opposed to it.  I don't see an 
obvious advantage to making it opaque but I agree it should be 
accessible via QMP.


If you want to do it outside of QEMU, then you can just ignore the 
config file and manage all of the state yourself.  But it's never 
going to work as well (it will be racy) and you're pushing a 
tremendous amount of knowledge that ultimately belongs in QEMU (what 
state needs to persist) to something that isn't QEMU which means it's 
probably not going to be done correctly.


I know you're a big fan of the omnipotent management tool but my 
experience has been that we need to help the management tooling folks 
more by expecting less from them.


I thought that's what I'm doing by separating the state out.  It's 
easy for management to assemble configuration from their database and 
convert it into a centralized representation (like a qemu command 
line).  It's a lot harder to disassemble a central state 
representation and move it back to the database.


Using QMP is better than directly accessing the state file since qemu 
does the disassembly for you (provided the command references the 
device using its normal path, not some random key).  The file just 
becomes a way to survive a crash, and all management needs to know 
about is to make it available and back it up.  But it means that 
everything must be done via QMP, including assembly of the machine, 
otherwise the state file can become stale.


Separating the state out to the device is even easier, since 
management is already expected to take care of disk images.  All 
that's needed is to create the media tray image once, then you can 
forget about it completely.


Except that instead of having one state file, we might have a dozen 
additional device state files.




Again the question is who is the authoritative source of the 
configuration.  Is it the management tool or is it qemu?


QEMU.   No question about it.  At any point in time, we are the 
authoritative source of what the guest's configuration is.  There's 
no doubt about it.  A management tool can try to keep up with us, but 
ultimately we are the only ones that know for sure.


We have all of this information internally.  Just persisting it is 
not a major architectural change.  It's something we should have been 
doing (arguably) from the very beginning.


That's a huge divergence from how management tools are written.


This is one of the reasons why management tooling around QEMU needs 
quite a bit of improving.


There is simply no way a management tool can do a good job of being an 
authoritative source of configuration.  The races we're discussion is a 
good example of why.


But beyond those races, QEMU is the only entity that knows with 
certainty what bits of information are important to persist in order to 
preserve a guest across shutdown/restart.  The fact that we've punted 
this problem for so long has only ensured that management tools are 
either intrinsically broken or only support the most minimal subset of 
functionality we actually support.


  Currently they contain the required guest configuration, a 
representation of what's the current live configuration, and they 
issue monitor commands to move the live configuration towards the 
required configuration (or just generate a qemu command line).  What 
you're describing is completely different, I'm not even sure what it is.


Management tools shouldn't have to think about how the monitor commands 
they issue impact the invocation options of QEMU.






The management tool already has to keep track of (the optional parts 
of) the guest device tree.  It cannot start reading the stateful 
non-config file at random points in time.  So all that is left is 
the guest controlled portions of the device tree, which are pretty 
rare, and random events like live-copy migration.  I think that 
introducing a new authoritative source of information will create a 
lot of problems.


QEMU has always been the authoritative source.  Nothing new has been 
introduced.  We never persisted 

Re: [Qemu-devel] Re: [PATCH 1/3] kvm-unit-tests: add x86 port io accessors

2011-02-27 Thread Anthony Liguori

On 02/27/2011 06:44 AM, Avi Kivity wrote:

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Signed-off-by: Anthony Liguorialigu...@us.ibm.com

diff --git a/lib/x86/io.h b/lib/x86/io.h
new file mode 100644
index 000..bd6341c
--- /dev/null
+++ b/lib/x86/io.h
@@ -0,0 +1,40 @@
+#ifndef IO_H
+#define IO_H
+
+static inline unsigned char inb(unsigned short port)
+{
+unsigned char value;
+asm volatile(inb %w1, %0 : =a (value) : Nd (port));
+return value;
+}


Are those %[wb] really needed?  gcc should do the right thing based on 
the argument type.


It's just a little extra type safety.



Might as well put all of that into processor.h.


Seems reasonable.

Regards,

Anthony Liguori





[Qemu-devel] [PATCH] simpletrace: Thread-safe tracing

2011-02-27 Thread Stefan Hajnoczi
Trace events outside the global mutex cannot be used with the simple
trace backend since it is not thread-safe.  There is no check to prevent
them being enabled so people sometimes learn this the hard way.

This patch restructures the simple trace backend with a ring buffer
suitable for multiple concurrent writers.  A writeout thread empties the
trace buffer when threshold fill levels are reached.  Should the
writeout thread be unable to keep up with trace generation, records will
simply be dropped.

Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 docs/tracing.txt |5 -
 simpletrace.c|  297 +++---
 simpletrace.h|8 ++
 vl.c |   16 +--
 4 files changed, 207 insertions(+), 119 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index a6cc56f..f15069c 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -141,11 +141,6 @@ source tree.  It may not be as powerful as 
platform-specific or third-party
 trace backends but it is portable.  This is the recommended trace backend
 unless you have specific needs for more advanced backends.
 
-Warning: the simple backend is not thread-safe so only enable trace events
-that are executed while the global mutex is held.  Much of QEMU meets this
-requirement but some utility functions like qemu_malloc() or thread-related
-code cannot be safely traced using the simple backend.
-
  Monitor commands 
 
 * info trace
diff --git a/simpletrace.c b/simpletrace.c
index 9ea0d1f..9403ea2 100644
--- a/simpletrace.c
+++ b/simpletrace.c
@@ -12,6 +12,9 @@
 #include stdint.h
 #include stdio.h
 #include time.h
+#include signal.h
+#include pthread.h
+#include qerror.h
 #include qemu-timer.h
 #include trace.h
 
@@ -24,6 +27,9 @@
 /** Trace file version number, bump if format changes */
 #define HEADER_VERSION 0
 
+/** Trace record is valid */
+#define TRACE_RECORD_VALID ((uint64_t)1  63)
+
 /** Trace buffer entry */
 typedef struct {
 uint64_t event;
@@ -37,126 +43,130 @@ typedef struct {
 } TraceRecord;
 
 enum {
-TRACE_BUF_LEN = 64 * 1024 / sizeof(TraceRecord),
+TRACE_BUF_LEN = 4096,
+TRACE_BUF_FLUSH_THRESHOLD = TRACE_BUF_LEN / 4,
 };
 
+/*
+ * Trace records are written out by a dedicated thread.  The thread waits for
+ * records to become available, writes them out, and then waits again.
+ */
+static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t trace_available_cond = PTHREAD_COND_INITIALIZER;
+static pthread_cond_t trace_empty_cond = PTHREAD_COND_INITIALIZER;
+static bool trace_available;
+static bool trace_writeout_enabled;
+
 static TraceRecord trace_buf[TRACE_BUF_LEN];
 static unsigned int trace_idx;
 static FILE *trace_fp;
 static char *trace_file_name = NULL;
-static bool trace_file_enabled = false;
 
-void st_print_trace_file_status(FILE *stream, int (*stream_printf)(FILE 
*stream, const char *fmt, ...))
+/**
+ * Read a trace record from the trace buffer
+ *
+ * @idx Trace buffer index
+ * @record  Trace record to fill
+ *
+ * Returns false if the record is not valid.
+ */
+static bool get_trace_record(unsigned int idx, TraceRecord *record)
 {
-stream_printf(stream, Trace file \%s\ %s.\n,
-  trace_file_name, trace_file_enabled ? on : off);
-}
+if (!(trace_buf[idx].event  TRACE_RECORD_VALID)) {
+return false;
+}
 
-static bool write_header(FILE *fp)
-{
-static const TraceRecord header = {
-.event = HEADER_EVENT_ID,
-.timestamp_ns = HEADER_MAGIC,
-.x1 = HEADER_VERSION,
-};
+__sync_synchronize(); /* read memory barrier before accessing record */
 
-return fwrite(header, sizeof header, 1, fp) == 1;
+*record = trace_buf[idx];
+record-event = ~TRACE_RECORD_VALID;
+return true;
 }
 
 /**
- * set_trace_file : To set the name of a trace file.
- * @file : pointer to the name to be set.
- * If NULL, set to the default name-pid set at config time.
+ * Kick writeout thread
+ *
+ * @waitWhether to wait for writeout thread to complete
  */
-bool st_set_trace_file(const char *file)
+static void flush_trace_file(bool wait)
 {
-st_set_trace_file_enabled(false);
+pthread_mutex_lock(trace_lock);
+trace_available = true;
+pthread_cond_signal(trace_available_cond);
 
-free(trace_file_name);
-
-if (!file) {
-if (asprintf(trace_file_name, CONFIG_TRACE_FILE, getpid())  0) {
-trace_file_name = NULL;
-return false;
-}
-} else {
-if (asprintf(trace_file_name, %s, file)  0) {
-trace_file_name = NULL;
-return false;
-}
+if (wait) {
+pthread_cond_wait(trace_empty_cond, trace_lock);
 }
 
-st_set_trace_file_enabled(true);
-return true;
+pthread_mutex_unlock(trace_lock);
 }
 
-static void flush_trace_file(void)
+static void wait_for_trace_records_available(void)
 {
-/* If the trace file is not open yet, 

[Qemu-devel] Re: [PATCH v2 upstream 07/22] add assertions on the owner of a QemuMutex

2011-02-27 Thread Paolo Bonzini

On 02/27/2011 10:33 AM, Jan Kiszka wrote:

Now that we are left without any assertions, I start wondering about one
of the original missions: enforce qemu_cond_signal/broadcast to be
called under a mutex. What about extending those services with a mutex
argument and applying the assert there?


That is one of the patches in my queue that I haven't submitted yet. :)

Paolo



[Qemu-devel] Re: [PATCH v2 upstream 19/22] move blocking of signals to qemu_signalfd_init

2011-02-27 Thread Paolo Bonzini

On 02/27/2011 10:41 AM, Jan Kiszka wrote:

  +#ifdef CONFIG_IOTHREAD
  +/* SIGUSR2 used by posix-aio-compat.c */
  +sigemptyset(set);
  +sigaddset(set, SIGUSR2);
  +pthread_sigmask(SIG_UNBLOCK,set, NULL);


Didn't you want to rename the function for the sake of non-signalfd
blocks like above?


Right.


  +
  +sigemptyset(set);
  +sigaddset(set, SIGIO);
  +sigaddset(set, SIGALRM);
  +sigaddset(set, SIG_IPI);
  +sigaddset(set, SIGBUS);
  +pthread_sigmask(SIG_BLOCK,set, NULL);
  +#else
  +sigemptyset(set);


This line is shared and can be moved out of the #ifdef.


It's shared but for different purposes (UNBLOCK for iothread, 
BLOCK/signalfd for !iothread), so I decided not to hoist it out.


Paolo



[Qemu-devel] Re: [PATCH] simpletrace: Thread-safe tracing

2011-02-27 Thread Paolo Bonzini

On 02/27/2011 03:58 PM, Stefan Hajnoczi wrote:

+ * Trace records are written out by a dedicated thread.  The thread waits for
+ * records to become available, writes them out, and then waits again.
+ */
+static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t trace_available_cond = PTHREAD_COND_INITIALIZER;
+static pthread_cond_t trace_empty_cond = PTHREAD_COND_INITIALIZER;
+static bool trace_available;
+static bool trace_writeout_enabled;


Please use QemuThread.


+
+pthread_attr_init(attr);
+pthread_attr_setdetachstate(attr, PTHREAD_CREATE_DETACHED);

-tp = find_trace_event_by_name(tname);
-if (tp) {
-tp-state = tstate;
-return true;
+sigfillset(set);
+pthread_sigmask(SIG_SETMASK, set, oldset);
+ret = pthread_create(thread, attr, writeout_thread, NULL);
+pthread_sigmask(SIG_SETMASK, oldset, NULL);


This is also taken care of by QemuThread.

Paolo



Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Avi Kivity

On 02/27/2011 04:00 PM, Anthony Liguori wrote:

On 02/27/2011 03:10 AM, Avi Kivity wrote:

On 02/24/2011 07:58 PM, Anthony Liguori wrote:
If you move the cdrom to a different IDE channel, you have to 
update the stateful non-config file.


Whereas if you do

   $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
   $ qemu -cdrom ~/foo-media-tray.img

the cd-rom tray state will be tracked in the image file.



Yeah, but how do you move it? 


There is no need to move the file at all.  Simply point the new drive 
at the media tray.


No, I was asking, how do you move the cdrom to a different IDE 
channel.  Are you using QMP?  Are you changing the command line 
arguments?


Yes.

If we're doing hot-move (not really relevant to ide-cd) then you'd use 
QMP.  If you're editing a virtual machine that is down, or scheduling a 
change for the next reboot, then you're using command line arguments (or 
cold-plugging into a stopped guest).


Requiring management to remember the old configuration and issue delta 
commands to move the device for the cold-plug case is increased 
complexity IMO.






If you do a remove/add through QMP, then the config file will 
reflect things just fine.


If all access to the state file is through QMP then it becomes more 
palatable.  A bit on that later.


As I think I've mentioned before, I hadn't really thought about an 
opaque state file but I'm not necessary opposed to it.  I don't see an 
obvious advantage to making it opaque but I agree it should be 
accessible via QMP.


The advantage is that we keep the management tool talking to one 
interface (I don't think we should prevent users from interpreting it, 
just make it unnecessary).




I thought that's what I'm doing by separating the state out.  It's 
easy for management to assemble configuration from their database and 
convert it into a centralized representation (like a qemu command 
line).  It's a lot harder to disassemble a central state 
representation and move it back to the database.


Using QMP is better than directly accessing the state file since qemu 
does the disassembly for you (provided the command references the 
device using its normal path, not some random key).  The file just 
becomes a way to survive a crash, and all management needs to know 
about is to make it available and back it up.  But it means that 
everything must be done via QMP, including assembly of the machine, 
otherwise the state file can become stale.


Separating the state out to the device is even easier, since 
management is already expected to take care of disk images.  All 
that's needed is to create the media tray image once, then you can 
forget about it completely.


Except that instead of having one state file, we might have a dozen 
additional device state files.


That is fine.  We already have one state file per block device.

QEMU.   No question about it.  At any point in time, we are the 
authoritative source of what the guest's configuration is.  There's 
no doubt about it.  A management tool can try to keep up with us, 
but ultimately we are the only ones that know for sure.


We have all of this information internally.  Just persisting it is 
not a major architectural change.  It's something we should have 
been doing (arguably) from the very beginning.


That's a huge divergence from how management tools are written.


This is one of the reasons why management tooling around QEMU needs 
quite a bit of improving.


There is simply no way a management tool can do a good job of being an 
authoritative source of configuration.  The races we're discussion is 
a good example of why.


What we're discussing is not configuration.  It is non-volatile state.  
Configuration comes from the user; state comes from the guest (the 
management tool may edit state; but the guest cannot edit the 
configuration).


I agree 100% the management tool cannot be the authoritative source of 
state.


My position is:
- the management tool should be 100% in control of configuration (how 
the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM in 
various components, cd-rom eject state, explosive bolts for payload 
separation, self-destruct mechanism, etc.)
- the management tool should have access to state using the same 
identifiers it used to create the devices that contain the state
- it is preferable to store state in the device so that when the 
configuration changes, state is maintained (like hot-unplug of a network 
card with NVRAM followed by hot-plug of the same card).
- the angular momentum of the planet we (presumably) are on won't 
change, whatever we do [1]




But beyond those races, QEMU is the only entity that knows with 
certainty what bits of information are important to persist in order 
to preserve a guest across shutdown/restart.  The fact that we've 
punted this problem for so long has only ensured that management tools 
are either intrinsically broken or 

Re: [Qemu-devel] Re: [PATCH 1/3] kvm-unit-tests: add x86 port io accessors

2011-02-27 Thread Avi Kivity

On 02/27/2011 04:01 PM, Anthony Liguori wrote:

On 02/27/2011 06:44 AM, Avi Kivity wrote:

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Signed-off-by: Anthony Liguorialigu...@us.ibm.com

diff --git a/lib/x86/io.h b/lib/x86/io.h
new file mode 100644
index 000..bd6341c
--- /dev/null
+++ b/lib/x86/io.h
@@ -0,0 +1,40 @@
+#ifndef IO_H
+#define IO_H
+
+static inline unsigned char inb(unsigned short port)
+{
+unsigned char value;
+asm volatile(inb %w1, %0 : =a (value) : Nd (port));
+return value;
+}


Are those %[wb] really needed?  gcc should do the right thing based 
on the argument type.


It's just a little extra type safety.


Yeah, but those constraints aren't really documented.  Linux does use 
them in a handful of places so they're unlikely to go away though.


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Dor Laor

On 02/27/2011 03:49 PM, Anthony Liguori wrote:

On 02/27/2011 03:55 AM, Dor Laor wrote:

What about a simpler approach were QMP events will be written to a
event-log-file (or even named pipe).


The management tool can just use a small daemon that does nothing other
than write QMP events to a log. There's no real need for this code to
live in QEMU.



IIUC in case the management daemon will run qemu using named pipe for 
qmp it will happen automatically. If you agree to this approach it will 
simplify the more complex config file option (although it is nice to 
have as independent option for single hosts managed by simpler mgmts)



Since events are posted, even if we wrote it in QEMU, the event wouldn't
be guaranteed on disk by the time the event invocation returns.

Regards,

Anthony Liguori






Re: [Qemu-devel] [PATCH] simpletrace: Thread-safe tracing

2011-02-27 Thread Avi Kivity

On 02/27/2011 04:58 PM, Stefan Hajnoczi wrote:

Trace events outside the global mutex cannot be used with the simple
trace backend since it is not thread-safe.  There is no check to prevent
them being enabled so people sometimes learn this the hard way.

This patch restructures the simple trace backend with a ring buffer
suitable for multiple concurrent writers.  A writeout thread empties the
trace buffer when threshold fill levels are reached.  Should the
writeout thread be unable to keep up with trace generation, records will
simply be dropped.


It would be good to have an indication of the fact that records were 
dropped in the file.


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] Stupid question?

2011-02-27 Thread Avi Kivity

On 02/27/2011 08:22 AM, Amos Kong wrote:

On Sun, Feb 27, 2011 at 6:22 AM, Dushyant Bansal
cs5070...@cse.iitd.ac.in  wrote:
  On Sunday 27 February 2011 03:45 AM, Frans de Boer wrote:

  Hi all,

  This is the only QEMU list, so I put my question her.
  How can I copy the contents of a *.raw image to a real HD partition or
  vice versa.
  I often use a virtual image using qemu/kvm to test some OS's and want to
  migrate them to a HD partition if the time is right. Also, I like to
  preserve old partitions for future use/reference.

  Any suggestions?

  Regards, Frans.

  You can mount it using losetup.
  http://blog.piotrj.org/2009/03/mounting-raw-kvmqemu-image.html

losetup is fine, can we directly use dd ?



dd words for a raw file; qemu-img works for all qemu supported formats.

--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] Stupid question?

2011-02-27 Thread Stefan Hajnoczi
kpartx is easier than losetup for getting at individual partitions
inside a raw image.

http://linux.die.net/man/8/kpartx

Stefan



Re: [Qemu-devel] Re: [PATCH] simpletrace: Thread-safe tracing

2011-02-27 Thread Stefan Hajnoczi
On Sun, Feb 27, 2011 at 3:13 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 02/27/2011 03:58 PM, Stefan Hajnoczi wrote:

 + * Trace records are written out by a dedicated thread.  The thread waits
 for
 + * records to become available, writes them out, and then waits again.
 + */
 +static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;
 +static pthread_cond_t trace_available_cond = PTHREAD_COND_INITIALIZER;
 +static pthread_cond_t trace_empty_cond = PTHREAD_COND_INITIALIZER;
 +static bool trace_available;
 +static bool trace_writeout_enabled;

 Please use QemuThread.

The tracing code itself should use avoid core QEMU code.  Otherwise we
can't trace QemuThread - we'd have an infinite loop.

Stefan



Re: [Qemu-devel] [PATCH] simpletrace: Thread-safe tracing

2011-02-27 Thread Stefan Hajnoczi
On Sun, Feb 27, 2011 at 4:14 PM, Avi Kivity a...@redhat.com wrote:
 On 02/27/2011 04:58 PM, Stefan Hajnoczi wrote:

 Trace events outside the global mutex cannot be used with the simple
 trace backend since it is not thread-safe.  There is no check to prevent
 them being enabled so people sometimes learn this the hard way.

 This patch restructures the simple trace backend with a ring buffer
 suitable for multiple concurrent writers.  A writeout thread empties the
 trace buffer when threshold fill levels are reached.  Should the
 writeout thread be unable to keep up with trace generation, records will
 simply be dropped.

 It would be good to have an indication of the fact that records were dropped
 in the file.

Good idea.  Trace files begin with a record that has a special ID.
I'll look at extending this either by picking another special ID or by
reusing the header record.

Stefan



Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Anthony Liguori

On 02/27/2011 10:02 AM, Dor Laor wrote:

On 02/27/2011 03:49 PM, Anthony Liguori wrote:

On 02/27/2011 03:55 AM, Dor Laor wrote:

What about a simpler approach were QMP events will be written to a
event-log-file (or even named pipe).


The management tool can just use a small daemon that does nothing other
than write QMP events to a log. There's no real need for this code to
live in QEMU.



IIUC in case the management daemon will run qemu using named pipe for 
qmp it will happen automatically.


No, the event model is changing (it was always intended to change 
though).  Events will need explicit registration so it's necessary to 
have bidirectional communication.


So you can't just do -qmp foo -chardev file:event.log,id=foo.  You won't 
actually see most of the events.


Regards,

Anthony Liguori

If you agree to this approach it will simplify the more complex config 
file option (although it is nice to have as independent option for 
single hosts managed by simpler mgmts)



Since events are posted, even if we wrote it in QEMU, the event wouldn't
be guaranteed on disk by the time the event invocation returns.

Regards,

Anthony Liguori









Re: [Qemu-devel] Re: [patch 2/3] Add support for live block copy

2011-02-27 Thread Anthony Liguori

On 02/27/2011 09:31 AM, Avi Kivity wrote:

On 02/27/2011 04:00 PM, Anthony Liguori wrote:

On 02/27/2011 03:10 AM, Avi Kivity wrote:

On 02/24/2011 07:58 PM, Anthony Liguori wrote:
If you move the cdrom to a different IDE channel, you have to 
update the stateful non-config file.


Whereas if you do

   $ qemu-img create -f cd-tray -b ~/foo.img ~/foo-media-tray.img
   $ qemu -cdrom ~/foo-media-tray.img

the cd-rom tray state will be tracked in the image file.



Yeah, but how do you move it? 


There is no need to move the file at all.  Simply point the new 
drive at the media tray.


No, I was asking, how do you move the cdrom to a different IDE 
channel.  Are you using QMP?  Are you changing the command line 
arguments?


Yes.

If we're doing hot-move (not really relevant to ide-cd) then you'd use 
QMP.  If you're editing a virtual machine that is down, or scheduling 
a change for the next reboot, then you're using command line arguments 
(or cold-plugging into a stopped guest).


Requiring management to remember the old configuration and issue delta 
commands to move the device for the cold-plug case is increased 
complexity IMO.






If you do a remove/add through QMP, then the config file will 
reflect things just fine.


If all access to the state file is through QMP then it becomes more 
palatable.  A bit on that later.


As I think I've mentioned before, I hadn't really thought about an 
opaque state file but I'm not necessary opposed to it.  I don't see 
an obvious advantage to making it opaque but I agree it should be 
accessible via QMP.


The advantage is that we keep the management tool talking to one 
interface (I don't think we should prevent users from interpreting it, 
just make it unnecessary).




I thought that's what I'm doing by separating the state out.  It's 
easy for management to assemble configuration from their database 
and convert it into a centralized representation (like a qemu 
command line).  It's a lot harder to disassemble a central state 
representation and move it back to the database.


Using QMP is better than directly accessing the state file since 
qemu does the disassembly for you (provided the command references 
the device using its normal path, not some random key).  The file 
just becomes a way to survive a crash, and all management needs to 
know about is to make it available and back it up.  But it means 
that everything must be done via QMP, including assembly of the 
machine, otherwise the state file can become stale.


Separating the state out to the device is even easier, since 
management is already expected to take care of disk images.  All 
that's needed is to create the media tray image once, then you can 
forget about it completely.


Except that instead of having one state file, we might have a dozen 
additional device state files.


That is fine.  We already have one state file per block device.

QEMU.   No question about it.  At any point in time, we are the 
authoritative source of what the guest's configuration is.  There's 
no doubt about it.  A management tool can try to keep up with us, 
but ultimately we are the only ones that know for sure.


We have all of this information internally.  Just persisting it is 
not a major architectural change.  It's something we should have 
been doing (arguably) from the very beginning.


That's a huge divergence from how management tools are written.


This is one of the reasons why management tooling around QEMU needs 
quite a bit of improving.


There is simply no way a management tool can do a good job of being 
an authoritative source of configuration.  The races we're discussion 
is a good example of why.


What we're discussing is not configuration.  It is non-volatile 
state.  Configuration comes from the user; state comes from the guest 
(the management tool may edit state; but the guest cannot edit the 
configuration).


I agree 100% the management tool cannot be the authoritative source of 
state.


My position is:
- the management tool should be 100% in control of configuration (how 
the guest is put together from its components)
- qemu should be 100% in control of state (memory, disk state, NVRAM 
in various components, cd-rom eject state, explosive bolts for payload 
separation, self-destruct mechanism, etc.)


There simply is not such a clean separation between the two because 
things that the guest does affects the configuration of the guest.


Hot plug, removable media eject, persistent device settings (whether 
it's CMOS or EEPROM) all disrupt this model.


If you really wanted to have this separation, you'd have to be very 
strict about making all guest settings not be specified in config.  You 
would need to do:


qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:90 e1000.0.rom
qemu-img create -f e1000-eprom -o macaddr=12:23:45:67:78:91 e1000.1.rom

qemu -device e1000,id=e1000.0,eeprom=e1000.0.rom -device 
e1000,id=e1000.1,eeprom=e1000.1.rom


And now I need a tool that lets 

[Qemu-devel] [PATCH 1/3] w32: Add new directory hierarchy for MinGW extensions

2011-02-27 Thread Stefan Weil
This is a first step which will reduce the number
of conditional compilations caused by MinGW.

Include files will be added to hosts/w32/include.

Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 configure|1 +
 hosts/w32/README |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)
 create mode 100644 hosts/w32/README

diff --git a/configure b/configure
index 3036faf..47779b6 100755
--- a/configure
+++ b/configure
@@ -474,6 +474,7 @@ if test $mingw32 = yes ; then
   QEMU_CFLAGS=-DWIN32_LEAN_AND_MEAN -DWINVER=0x501 $QEMU_CFLAGS
   # enable C99/POSIX format strings (needs mingw32-runtime 3.15 or later)
   QEMU_CFLAGS=-D__USE_MINGW_ANSI_STDIO=1 $QEMU_CFLAGS
+  QEMU_CFLAGS=-I\$(SRC_PATH)/hosts/w32/include $QEMU_CFLAGS
   LIBS=-lwinmm -lws2_32 -liberty -liphlpapi $LIBS
   prefix=c:/Program Files/Qemu
   mandir=\${prefix}
diff --git a/hosts/w32/README b/hosts/w32/README
new file mode 100644
index 000..d5aafe9
--- /dev/null
+++ b/hosts/w32/README
@@ -0,0 +1 @@
+This directory contains extensions of MinGW which are needed by QEMU.
-- 
1.7.2.3




[Qemu-devel] [PATCH 2/3] w32: Add macro timersub to sys/time.h

2011-02-27 Thread Stefan Weil
timersub is needed by the latest vnc code.

Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 hosts/w32/include/sys/time.h |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)
 create mode 100644 hosts/w32/include/sys/time.h

diff --git a/hosts/w32/include/sys/time.h b/hosts/w32/include/sys/time.h
new file mode 100644
index 000..94056ff
--- /dev/null
+++ b/hosts/w32/include/sys/time.h
@@ -0,0 +1,24 @@
+/*
+ * Extensions of MinGW sys/time.h
+ *
+ * Copyright (C) 2011 Stefan Weil
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include_next sys/time.h
+
+#ifndef timersub
+/* This is a copy from GNU C Library (GNU LGPL 2.1), sys/time.h. */
+# define timersub(a, b, result)   \
+  do {\
+(result)-tv_sec = (a)-tv_sec - (b)-tv_sec; \
+(result)-tv_usec = (a)-tv_usec - (b)-tv_usec;  \
+if ((result)-tv_usec  0) {  \
+  --(result)-tv_sec; \
+  (result)-tv_usec += 100;   \
+} \
+  } while (0)
+#endif
-- 
1.7.2.3




[Qemu-devel] [PATCH 3/3] osdep: Remove conditional compilation (fixes w32 compilation)

2011-02-27 Thread Stefan Weil
sys/time.h also exists for MinGW, and it is needed because
it declares struct timeval (needed by latest vnc code).

Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 osdep.h |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/osdep.h b/osdep.h
index 27eedcf..5a667b4 100644
--- a/osdep.h
+++ b/osdep.h
@@ -8,9 +8,7 @@
 #include sys/signal.h
 #endif
 
-#ifndef _WIN32
 #include sys/time.h
-#endif
 
 #ifndef glue
 #define xglue(x, y) x ## y
-- 
1.7.2.3




[Qemu-devel] Re: [PATCH] simpletrace: Thread-safe tracing

2011-02-27 Thread Paolo Bonzini

On 02/27/2011 06:02 PM, Stefan Hajnoczi wrote:

On Sun, Feb 27, 2011 at 3:13 PM, Paolo Bonzinipbonz...@redhat.com  wrote:

On 02/27/2011 03:58 PM, Stefan Hajnoczi wrote:


+ * Trace records are written out by a dedicated thread.  The thread waits
for
+ * records to become available, writes them out, and then waits again.
+ */
+static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;
+static pthread_cond_t trace_available_cond = PTHREAD_COND_INITIALIZER;
+static pthread_cond_t trace_empty_cond = PTHREAD_COND_INITIALIZER;
+static bool trace_available;
+static bool trace_writeout_enabled;


Please use QemuThread.


The tracing code itself should use avoid core QEMU code.  Otherwise we
can't trace QemuThread - we'd have an infinite loop.


Hmm, right... they'll use stdio to trace Win32 then... :)  I was 
actually thinking more of the code duplication.


But do you really need tracing at such a low level?  I'd expect tracing 
wrappers like qemu_lock_mutex_iothread, not mutexes in general.


Paolo



Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
 On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.

This is fixed by Gerd's attached patch (taken from rhel repository, don't know
why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as well 
(separate email).

  
  qemu-kvm 0.14
  
  startup line
  
  /usr/bin/kvm -name spaceball,process=spaceball -m 1024 -kernel
  /boot/bzImage-2.6.37.2-guest -append root=/dev/vda ro -smp 1 -netdev
  type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
  virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
  file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
  port=5957,disable-ticketing -monitor
  telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
  /var/run/kvm/spaceball.pid
  
  host is running vanilla 2.6.37.1 on amd64.
  
  Here is the bt
  
  # gdb /usr/bin/qemu-system-x86_64
  GNU gdb (Gentoo 7.2 p1) 7.2
  Copyright (C) 2010 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 
  http://gnu.org/licenses/gpl.html
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type show copying
  and show warranty for details.
  This GDB was configured as x86_64-pc-linux-gnu.
  For bug reporting instructions, please see:
  http://bugs.gentoo.org/...
  Reading symbols from /usr/bin/qemu-system-x86_64...done.
  (gdb) set args -name spaceball,process=spaceball -m 1024 -kernel
  /boot/bzImage-2.6.37.2-guest -append root=/dev/vda ro -smp 1 -netdev
  type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
  virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
  file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
  port=5957,disable-ticketing -monitor
  telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
  /var/run/kvm/spaceball.pid
  (gdb) run
  Starting program: /usr/bin/qemu-system-x86_64 -name
  spaceball,process=spaceball -m 1024 -kernel
  /boot/bzImage-2.6.37.2-guest -append root=/dev/vda ro -smp 1 -netdev
  type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
  virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
  file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
  port=5957,disable-ticketing -monitor
  telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
  /var/run/kvm/spaceball.pid
  [Thread debugging using libthread_db enabled]
  do_spice_init: starting 0.6.0
  spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
  spice_server_add_interface: SPICE_INTERFACE_MOUSE
  [New Thread 0x74802710 (LWP 30294)]
  spice_server_add_interface: SPICE_INTERFACE_QXL
  [New Thread 0x7fffaacae710 (LWP 30295)]
  red_worker_main: begin
  handle_dev_destroy_surfaces:
  handle_dev_destroy_surfaces:
  handle_dev_input: start
  [New Thread 0x7fffaa4ad710 (LWP 30298)]
  [New Thread 0x7fffa9cac710 (LWP 30299)]
  [New Thread 0x7fffa94ab710 (LWP 30300)]
  [New Thread 0x7fffa8caa710 (LWP 30301)]
  [New Thread 0x7fffa3fff710 (LWP 30302)]
  [New Thread 0x7fffa37fe710 (LWP 30303)]
  [New Thread 0x7fffa2ffd710 (LWP 30304)]
  [New Thread 0x7fffa27fc710 (LWP 30305)]
  [New Thread 0x7fffa1ffb710 (LWP 30306)]
  [New Thread 0x7fffa17fa710 (LWP 30307)]
  reds_handle_main_link:
  reds_show_new_channel: channel 1:0, connected successfully, over Non Secure 
  link
  reds_main_handle_message: net test: latency 5.636000 ms, bitrate
  11027768 bps (10.516899 Mbps)
  reds_show_new_channel: channel 2:0, connected successfully, over Non Secure 
  link
  red_dispatcher_set_peer:
  handle_dev_input: connect
  handle_new_display_channel: jpeg disabled
  handle_new_display_channel: zlib-over-glz disabled
  reds_show_new_channel: channel 4:0, connected successfully, over Non Secure 
  link
  red_dispatcher_set_cursor_peer:
  handle_dev_input: cursor connect
  reds_show_new_channel: channel 3:0, connected successfully, over Non Secure 
  link
  inputs_link:
  [New Thread 0x7fffa07f8710 (LWP 30312)]
  [New Thread 0x7fff9fff7710 (LWP 30313)]
  [New Thread 0x7fff9f7f6710 (LWP 30314)]
  [New Thread 0x7fff9eff5710 (LWP 30315)]
  [New Thread 0x7fff9e7f4710 (LWP 30316)]
  [New Thread 0x7fff9dff3710 (LWP 30317)]
  [New Thread 0x7fff9d7f2710 (LWP 30318)]
  qemu-system-x86_64:
  /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724:
  kvm_mutex_unlock: Assertion `!cpu_single_env' failed.
  
  Program received signal SIGABRT, Aborted.
  [Switching to Thread 0x74802710 (LWP 30294)]
  0x75daa165 in raise () from /lib/libc.so.6
  (gdb)
  (gdb)
  (gdb)
  (gdb)
  (gdb) bt
  #0  0x75daa165 in raise () from /lib/libc.so.6
  #1  0x75dab580 in abort () from /lib/libc.so.6
  #2  0x75da3201 in __assert_fail () from /lib/libc.so.6
  #3  0x00436f7e in kvm_mutex_unlock ()
  at 
  /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724
  #4  qemu_mutex_unlock_iothread ()

Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Jan Kiszka
On 2011-02-27 20:03, Alon Levy wrote:
 On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
 On 2011-02-26 12:43, xming wrote:
 When trying to start X (and it loads qxl driver) the kvm process just 
 crashes.
 
 This is fixed by Gerd's attached patch (taken from rhel repository, don't know
 why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as well 
 (separate email).

Patch looks OK on first glance, but the changelog is misleading: This
was broken for _both_ trees, but upstream didn't detect the bug.

My concerns regarding other side effects of juggling with global mutex
in spice code remain.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:03, Alon Levy wrote:
  On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
  On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.
  
  This is fixed by Gerd's attached patch (taken from rhel repository, don't 
  know
  why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
  well (separate email).
 
 Patch looks OK on first glance, but the changelog is misleading: This
 was broken for _both_ trees, but upstream didn't detect the bug.
 

The trees the patch commit message refers to are qemu and qemu-kvm. qemu 
doesn't even
have cpu_single_env. It didn't talk about two qemu-kvm trees.

 My concerns regarding other side effects of juggling with global mutex
 in spice code remain.

I know there used to be a mutex in spice code and during the upstreaming 
process it
got ditched in favor of the qemu global io mutex. I would have rather deferred 
this
to Gerd since he wrote this, but he is not available atm.

 
 Jan
 





[Qemu-devel] [PATCH] w32: Add support for curses

2011-02-27 Thread Stefan Weil
MinGW optionally includes pdcurses, so add support for it.

Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 configure |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/configure b/configure
index 47779b6..ca15632 100755
--- a/configure
+++ b/configure
@@ -1562,7 +1562,11 @@ fi
 
 ##
 # curses probe
-curses_list=-lncurses -lcurses
+if test $mingw32 = yes ; then
+curses_list=-lpdcurses
+else
+curses_list=-lncurses -lcurses
+fi
 
 if test $curses != no ; then
   curses_found=no
-- 
1.7.2.3




Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Jan Kiszka
On 2011-02-27 20:16, Alon Levy wrote:
 On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:03, Alon Levy wrote:
 On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
 On 2011-02-26 12:43, xming wrote:
 When trying to start X (and it loads qxl driver) the kvm process just 
 crashes.

 This is fixed by Gerd's attached patch (taken from rhel repository, don't 
 know
 why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
 well (separate email).

 Patch looks OK on first glance, but the changelog is misleading: This
 was broken for _both_ trees, but upstream didn't detect the bug.

 
 The trees the patch commit message refers to are qemu and qemu-kvm.

The same did I.

 qemu doesn't even have cpu_single_env.

Really? Check again. :)

 It didn't talk about two qemu-kvm trees.
 
 My concerns regarding other side effects of juggling with global mutex
 in spice code remain.
 
 I know there used to be a mutex in spice code and during the upstreaming 
 process it
 got ditched in favor of the qemu global io mutex. I would have rather 
 deferred this
 to Gerd since he wrote this, but he is not available atm.

It's not necessarily bad to drop the io mutex, but it is more tricky
than it may appear on first glance.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sun, Feb 27, 2011 at 08:27:01PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:16, Alon Levy wrote:
  On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
  On 2011-02-27 20:03, Alon Levy wrote:
  On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
  On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.
 
  This is fixed by Gerd's attached patch (taken from rhel repository, don't 
  know
  why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
  well (separate email).
 
  Patch looks OK on first glance, but the changelog is misleading: This
  was broken for _both_ trees, but upstream didn't detect the bug.
 
  
  The trees the patch commit message refers to are qemu and qemu-kvm.
 
 The same did I.
 
  qemu doesn't even have cpu_single_env.
 
 Really? Check again. :)

Sorry, grepped the wrong repo. I'll send this to qemu-devel too then.

 
  It didn't talk about two qemu-kvm trees.
  
  My concerns regarding other side effects of juggling with global mutex
  in spice code remain.
  
  I know there used to be a mutex in spice code and during the upstreaming 
  process it
  got ditched in favor of the qemu global io mutex. I would have rather 
  deferred this
  to Gerd since he wrote this, but he is not available atm.
 
 It's not necessarily bad to drop the io mutex, but it is more tricky
 than it may appear on first glance.
 
 Jan
 





Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sun, Feb 27, 2011 at 08:27:01PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:16, Alon Levy wrote:
  On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
  On 2011-02-27 20:03, Alon Levy wrote:
  On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
  On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.
 
  This is fixed by Gerd's attached patch (taken from rhel repository, don't 
  know
  why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
  well (separate email).
 
  Patch looks OK on first glance, but the changelog is misleading: This
  was broken for _both_ trees, but upstream didn't detect the bug.
 
  
  The trees the patch commit message refers to are qemu and qemu-kvm.
 
 The same did I.
 
  qemu doesn't even have cpu_single_env.
 
 Really? Check again. :)
 
  It didn't talk about two qemu-kvm trees.
  
  My concerns regarding other side effects of juggling with global mutex
  in spice code remain.
  
  I know there used to be a mutex in spice code and during the upstreaming 
  process it
  got ditched in favor of the qemu global io mutex. I would have rather 
  deferred this
  to Gerd since he wrote this, but he is not available atm.
 
 It's not necessarily bad to drop the io mutex, but it is more tricky
 than it may appear on first glance.

The problem with not dropping it is that we may be in vga mode and create
updates synthtically (i.e. qemu created and not driver created) that access the
framebuffer and need to be locked so the framebuffer isn't updated at the same
time. We drop the mutex only when we are about to call the dispatcher, which 
basically
waits on red_worker (a libspice-server thread) to do some work. red_worker may
in turn callback into qxl in qemu, which may try to acquire the lock. (the
many may's here are just reflections of the codepaths).

 
 Jan
 





[Qemu-devel] [PATCH] spice/qxl: locking fix for qemu-kvm

2011-02-27 Thread Alon Levy
From: Gerd Hoffmann kra...@redhat.com

qxl needs to release the qemu lock before calling some libspice
functions (and re-aquire it later).  In upstream qemu qxl can just
use qemu_mutex_{unlock,lock}_iothread.  In qemu-kvm this doesn't
work, qxl needs additionally save+restore the cpu_single_env pointer
on unlock+lock.

Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
This patch fixes the segmentation faults reported from gentoo and fedora
on cpu_single_env asserts.

---
 hw/qxl.c   |   37 +
 ui/spice-display.c |   12 ++--
 ui/spice-display.h |6 ++
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index fe4212b..117f7c8 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -125,6 +125,27 @@ static void qxl_reset_memslots(PCIQXLDevice *d);
 static void qxl_reset_surfaces(PCIQXLDevice *d);
 static void qxl_ring_set_dirty(PCIQXLDevice *qxl);
 
+/* qemu-kvm locking ... */
+void qxl_unlock_iothread(SimpleSpiceDisplay *ssd)
+{
+if (cpu_single_env) {
+assert(ssd-env == NULL);
+ssd-env = cpu_single_env;
+cpu_single_env = NULL;
+}
+qemu_mutex_unlock_iothread();
+}
+
+void qxl_lock_iothread(SimpleSpiceDisplay *ssd)
+{
+qemu_mutex_lock_iothread();
+if (ssd-env) {
+assert(cpu_single_env == NULL);
+cpu_single_env = ssd-env;
+ssd-env = NULL;
+}
+}
+
 static inline uint32_t msb_mask(uint32_t val)
 {
 uint32_t mask;
@@ -662,10 +683,10 @@ static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
 dprint(d, 1, %s: start%s\n, __FUNCTION__,
loadvm ?  (loadvm) : );
 
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(d-ssd);
 d-ssd.worker-reset_cursor(d-ssd.worker);
 d-ssd.worker-reset_image_cache(d-ssd.worker);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(d-ssd);
 qxl_reset_surfaces(d);
 qxl_reset_memslots(d);
 
@@ -795,9 +816,9 @@ static void qxl_reset_surfaces(PCIQXLDevice *d)
 {
 dprint(d, 1, %s:\n, __FUNCTION__);
 d-mode = QXL_MODE_UNDEFINED;
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(d-ssd);
 d-ssd.worker-destroy_surfaces(d-ssd.worker);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(d-ssd);
 memset(d-guest_surfaces.cmds, 0, sizeof(d-guest_surfaces.cmds));
 }
 
@@ -866,9 +887,9 @@ static void qxl_destroy_primary(PCIQXLDevice *d)
 dprint(d, 1, %s\n, __FUNCTION__);
 
 d-mode = QXL_MODE_UNDEFINED;
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(d-ssd);
 d-ssd.worker-destroy_primary_surface(d-ssd.worker, 0);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(d-ssd);
 }
 
 static void qxl_set_mode(PCIQXLDevice *d, int modenr, int loadvm)
@@ -938,10 +959,10 @@ static void ioport_write(void *opaque, uint32_t addr, 
uint32_t val)
 case QXL_IO_UPDATE_AREA:
 {
 QXLRect update = d-ram-update_area;
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(d-ssd);
 d-ssd.worker-update_area(d-ssd.worker, d-ram-update_surface,
update, NULL, 0, 0);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(d-ssd);
 break;
 }
 case QXL_IO_NOTIFY_CMD:
diff --git a/ui/spice-display.c b/ui/spice-display.c
index 020b423..defe652 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -186,18 +186,18 @@ void qemu_spice_create_host_primary(SimpleSpiceDisplay 
*ssd)
 surface.mem= (intptr_t)ssd-buf;
 surface.group_id   = MEMSLOT_GROUP_HOST;
 
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(ssd);
 ssd-worker-create_primary_surface(ssd-worker, 0, surface);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(ssd);
 }
 
 void qemu_spice_destroy_host_primary(SimpleSpiceDisplay *ssd)
 {
 dprint(1, %s:\n, __FUNCTION__);
 
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(ssd);
 ssd-worker-destroy_primary_surface(ssd-worker, 0);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(ssd);
 }
 
 void qemu_spice_vm_change_state_handler(void *opaque, int running, int reason)
@@ -207,9 +207,9 @@ void qemu_spice_vm_change_state_handler(void *opaque, int 
running, int reason)
 if (running) {
 ssd-worker-start(ssd-worker);
 } else {
-qemu_mutex_unlock_iothread();
+qxl_unlock_iothread(ssd);
 ssd-worker-stop(ssd-worker);
-qemu_mutex_lock_iothread();
+qxl_lock_iothread(ssd);
 }
 ssd-running = running;
 }
diff --git a/ui/spice-display.h b/ui/spice-display.h
index aef0464..df74828 100644
--- a/ui/spice-display.h
+++ b/ui/spice-display.h
@@ -43,6 +43,9 @@ typedef struct SimpleSpiceDisplay {
 QXLRect dirty;
 int notify;
 int running;
+
+/* qemu-kvm locking ... */
+void *env;
 } SimpleSpiceDisplay;
 
 typedef struct SimpleSpiceUpdate {
@@ -52,6 +55,9 @@ typedef struct SimpleSpiceUpdate {
 uint8_t *bitmap;
 } SimpleSpiceUpdate;
 
+void 

Re: [Qemu-devel] GSoC 2011 project ideas

2011-02-27 Thread Natalia Portillo
Hi there,

El 23/02/2011, a las 20:42, Luiz Capitulino escribió:

 Hi there,
 
 Google will begin accepting mentoring organizations applications next week, 
 but
 we count only with three projects so far.
 
 Although there doesn't seem to exist a hard deadline associated with the ideas
 page, nor with the number of projects, we had more than 20 projects
 suggestions last year and a number of volunteering mentors.

Is there any problem to repeat previously proposed and not chosen projects?

 So, if you have a project idea and/or wish to be a mentor, please add an
 entry here:
 
 http://wiki.qemu.org/Google_Summer_of_Code_2011
 

Regards,
Natalia Portillo


[Qemu-devel] [PATCH v2] move eeprom init from reset function to init function

2011-02-27 Thread Wen Congyang
When we hot plug network pci devices, the mac address is still set
to 00:00:00:00:00:00 on the guest machine. The reason is that the
kernel does not reset the device and we init eeprom in reset function.

move eeprom init from reset function into init function, as it is read only,
and does not need init again in reset function.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com

---
 hw/pcnet.c   |   27 ++-
 hw/rtl8139.c |   24 
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/hw/pcnet.c b/hw/pcnet.c
index db52dc5..4499031 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1557,19 +1557,6 @@ uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap)
 void pcnet_h_reset(void *opaque)
 {
 PCNetState *s = opaque;
-int i;
-uint16_t checksum;
-
-/* Initialize the PROM */
-
-memcpy(s-prom, s-conf.macaddr.a, 6);
-s-prom[12] = s-prom[13] = 0x00;
-s-prom[14] = s-prom[15] = 0x57;
-
-for (i = 0,checksum = 0; i  16; i++)
-checksum += s-prom[i];
-*(uint16_t *)s-prom[12] = cpu_to_le16(checksum);
-
 
 s-bcr[BCR_MSRDA] = 0x0005;
 s-bcr[BCR_MSWRA] = 0x0005;
@@ -1736,6 +1723,20 @@ void pcnet_common_cleanup(PCNetState *d)
 
 int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info)
 {
+int i;
+uint16_t checksum;
+
+/* Initialize the PROM */
+
+memcpy(s-prom, s-conf.macaddr.a, 6);
+s-prom[12] = s-prom[13] = 0x00;
+s-prom[14] = s-prom[15] = 0x57;
+
+for (i = 0, checksum = 0; i  16; i++) {
+checksum += s-prom[i];
+}
+*(uint16_t *)s-prom[12] = cpu_to_le16(checksum);
+
 s-poll_timer = qemu_new_timer(vm_clock, pcnet_poll_timer, s);
 
 qemu_macaddr_default_if_unset(s-conf.macaddr);
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index a22530c..8356d5a 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -1189,18 +1189,6 @@ static void rtl8139_reset(DeviceState *d)
 
 rtl8139_update_irq(s);
 
-/* prepare eeprom */
-s-eeprom.contents[0] = 0x8129;
-#if 1
-// PCI vendor and device ID should be mirrored here
-s-eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
-s-eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
-#endif
-
-s-eeprom.contents[7] = s-conf.macaddr.a[0] | s-conf.macaddr.a[1]  8;
-s-eeprom.contents[8] = s-conf.macaddr.a[2] | s-conf.macaddr.a[3]  8;
-s-eeprom.contents[9] = s-conf.macaddr.a[4] | s-conf.macaddr.a[5]  8;
-
 /* mark all status registers as owned by host */
 for (i = 0; i  4; ++i)
 {
@@ -3392,6 +3380,18 @@ static int pci_rtl8139_init(PCIDevice *dev)
 
 qemu_macaddr_default_if_unset(s-conf.macaddr);
 
+/* prepare eeprom */
+s-eeprom.contents[0] = 0x8129;
+#if 1
+/* PCI vendor and device ID should be mirrored here */
+s-eeprom.contents[1] = PCI_VENDOR_ID_REALTEK;
+s-eeprom.contents[2] = PCI_DEVICE_ID_REALTEK_8139;
+#endif
+
+s-eeprom.contents[7] = s-conf.macaddr.a[0] | s-conf.macaddr.a[1]  8;
+s-eeprom.contents[8] = s-conf.macaddr.a[2] | s-conf.macaddr.a[3]  8;
+s-eeprom.contents[9] = s-conf.macaddr.a[4] | s-conf.macaddr.a[5]  8;
+
 s-nic = qemu_new_nic(net_rtl8139_info, s-conf,
   dev-qdev.info-name, dev-qdev.id, s);
 qemu_format_nic_info_str(s-nic-nc, s-conf.macaddr.a);
-- 
1.7.1




[Qemu-devel] [PATCH] fix build errors when we enable acpi_piix4 debug

2011-02-27 Thread Wen Congyang
I enable acpi_piix4 debug, and got the following build errors:
# make
  CClibhw64/acpi_piix4.o
cc1: warnings being treated as errors
/home/wency/source/qemu/hw/acpi_piix4.c: In function ‘pm_ioport_write’:
/home/wency/source/qemu/hw/acpi_piix4.c:193: error: format ‘%04x’ expects type 
‘unsigned int’, but argument 2 has type ‘uint64_t’
/home/wency/source/qemu/hw/acpi_piix4.c:193: error: format ‘%04x’ expects type 
‘unsigned int’, but argument 3 has type ‘uint64_t’
/home/wency/source/qemu/hw/acpi_piix4.c: In function ‘pm_ioport_read’:
/home/wency/source/qemu/hw/acpi_piix4.c:219: error: format ‘%04x’ expects type 
‘unsigned int’, but argument 2 has type ‘uint64_t’
make[1]: *** [acpi_piix4.o] Error 1
make: *** [subdir-libhw64] Error 2

Signed-off-by: Wen Congyang we...@cn.fujitsu.com

---
 hw/acpi_piix4.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
index 5bbc2b5..b5a2762 100644
--- a/hw/acpi_piix4.c
+++ b/hw/acpi_piix4.c
@@ -190,7 +190,8 @@ static void pm_ioport_write(IORange *ioport, uint64_t addr, 
unsigned width,
 default:
 break;
 }
-PIIX4_DPRINTF(PM writew port=0x%04x val=0x%04x\n, addr, val);
+PIIX4_DPRINTF(PM writew port=0x%04x val=0x%04x\n, (unsigned int)addr,
+  (unsigned int)val);
 }
 
 static void pm_ioport_read(IORange *ioport, uint64_t addr, unsigned width,
@@ -216,7 +217,7 @@ static void pm_ioport_read(IORange *ioport, uint64_t addr, 
unsigned width,
 val = 0;
 break;
 }
-PIIX4_DPRINTF(PM readw port=0x%04x val=0x%04x\n, addr, val);
+PIIX4_DPRINTF(PM readw port=0x%04x val=0x%04x\n, (unsigned int)addr, 
val);
 *data = val;
 }
 
-- 
1.7.1



Re: [Qemu-devel] [PATCH] `qdev_free` when unplug a pci device

2011-02-27 Thread Wen Congyang
Hi Markus Armbruster

At 02/23/2011 04:30 PM, Markus Armbruster Write:
 Isaku Yamahata yamah...@valinux.co.jp writes:
 

snip

 
 I don't think this patch is correct.  Let me explain.
 
 Device hot unplug is *not* guaranteed to succeed.
 
 For some buses, such as USB, it always succeeds immediately, i.e. when
 the device_del monitor command finishes, the device is gone.  Live is
 good.
 
 But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
 doesn't wait for the dance to complete.  Why?  The dance can take an
 unpredictable amount of time, including forever.
 
 Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
 slot, and the unplug has not yet completed (race condition), or it
 failed.  Yes, Virginia, PCI hotplug *can* fail.
 
 When unplug succeeds, the qdev is automatically destroyed.
 pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
 does it for PCIE.

I got a similar problem.  When I unplug a pci device by hand, it works
as expected, and I can hotplug it again. But when I use a srcipt to
do the same thing, sometimes it failed. I think I may find another bug.

Steps to reproduce this bug:
1. cat ./test-e1000.sh # RHEL6RC is domain name
   #! /bin/bash

   while true; do
   virsh attach-interface RHEL6RC network default --mac 
52:54:00:1f:db:c7 --model e1000
   if [[ $? -ne 0 ]]; then
   break
   fi
   virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
   if [[ $? -ne 0 ]]; then
   break
   fi
   sleep 5
   done

2. ./test-e1000.sh
   Interface attached successfully

   Interface detached successfully

   Interface attached successfully

   Interface detached successfully

   error: Failed to attach interface
   error: operation failed: adding 
e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device 
failed: Duplicate ID 'net1' for device


I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
# cat trace | grep 'irq=9'
  idle-0 [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
  idle-0 [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
  idle-0 [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
  idle-0 [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
  idle-0 [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
  idle-0 [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
According to the log, we only receive 3 sci interrupt, and one is lost.

I enable piix4's debug and 1 line printf into piix4_device_hotplug:
printf(slot: %d, up: %d, down: %d, state: %d\n, slot, s-pci0_status.up, 
s-pci0_status.down, state);
here is the log:

PM: mapping to 0xb000
PM readw port=0x0004 val=0x
...
gpe write afe2 == 0
gpe write afe0 == 255
gpe write afe3 == 0
gpe write afe1 == 255
PM readw port=0x val=0x0001
PM readw port=0x0002 val=0x
gpe read afe0 == 0
gpe read afe2 == 0
gpe read afe1 == 0
gpe read afe3 == 0
PM writew port=0x val=0x0020
PM readw port=0x0002 val=0x
PM writew port=0x0002 val=0x0020
PM readw port=0x0002 val=0x0020
gpe write afe2 == 255
gpe write afe3 == 255
...
slot: 6, up: 0, down: 0, state: 1    we attach interface the first time
PM readw port=0x val=0x0001
PM readw port=0x0002 val=0x0120
gpe read afe0 == 2
gpe read afe2 == ff
gpe read afe2 == ff
gpe write afe2 == 253
gpe read afe1 == 0
gpe read afe3 == ff
pcihotplug read ae00 == 40
pcihotplug read ae04 == 0
...
gpe write afe0 == 2
gpe write afe2 == 255
slot: 6, up: 64, down: 0, state: 0  = we detach interface the first time
PM readw port=0x val=0x0001
PM readw port=0x0002 val=0x0120
gpe read afe0 == 2
gpe read afe2 == ff
gpe read afe2 == ff
gpe write afe2 == 253
gpe read afe1 == 0
gpe read afe3 == ff
pcihotplug read ae00 == 0
pcihotplug read ae04 == 40
...
pciej write ae08 == 64 === we will call qdev_free()
pciej write ae08 == 64
gpe write afe0 == 2
gpe write afe2 == 255
slot: 7, up: 0, down: 64, state: 1  === we attach interface the second time.
PM readw port=0x val=0x0001
PM readw port=0x0002 val=0x0120
gpe read afe0 == 2
gpe read afe2 == ff
gpe read afe2 == ff
gpe write afe2 == 253
gpe read afe1 == 0
gpe read afe3 == ff
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug read ae04 == 0
pcihotplug read ae00 == 80
pcihotplug 

Re: [Qemu-devel] Re: [PATCH] Split machine creation from the main loop

2011-02-27 Thread Anthony Liguori

On 02/27/2011 05:33 AM, Avi Kivity wrote:

On 02/24/2011 07:25 PM, Anthony Liguori wrote:
Is it really necessary?  What's blocking us from initializing 
chardevs early?



Well

We initialize all chardevs at once right now and what set of chardevs 
there are depends on the machine (by the way defaults are applied).  
You could initialize chardevs in two stages although that requires 
quite a bit of additional complexity.


We could initialize chardevs on demand - that should resolve any 
dependencies?


I think that potentially screws up the way -global works.  There's some 
deep black magic involved in how -global, defaults, and device 
initialization interact.




It would be a pity to divorce the monitor from chardevs, they're 
really flexible.


Couple considerations:

1) chardevs don't support multiple simultaneous connections.  I view 
this as a blocker for QMP.


What do you mean by that?   Something like ,server which keeps on 
listening after it a connection is established?


,server won't allow multiple simultaneous connections.  CharDriverStates 
simply don't have a connection semantic.  There can only be one thing 
connected to it at a time.  This is why we don't use CharDriverState for 
VNC.


We should have another abstraction for connection based backend.  I'll 
take a go at this when I'm ready to try to get those patches in.


Just to be clear though, there is a CharDriverState version of the new 
QMP server.  This would be a second option for creating a QMP server and 
it takes a different command line sytnax.


2) Because chardevs don't support multiple connections, we can't 
reasonably hook on things like connect/disconnect which means that 
fd's sent via SCM_RIGHTs have to be handled in a very special way.  
By going outside of the chardev layer, we can let fd's via SCM_RIGHTS 
queue up naturally and have getfd/setfd refer to the fd at the top of 
the queue.  It makes it quite a bit easier to work with (I believe 
Daniel had actually requested this a while ago).


I really don't follow... what's the connection between SCM_RIGHTS and 
multiple connections?


Monitors have a single fd.  That fd is associated with the monitor and 
lives beyond the length of the connection to the monitor (recall that 
chardevs don't have a notion of connection life cycle).  This means if a 
management tool forgets to do a closefd on an unused fd, there's no easy 
way for QEMU to automatically clean that up.  IOW, a crashed management 
tool == fd leak in QEMU.


(6) can be started right now.  (1) comes with the QAPI merge.  (2) 
is pretty easy to do after applying this patch.  (3) is probably 
something that can be done shortly after (1).  (4) and (5) really 
require everything but (6) to be in place before we can meaningful 
do it.


I think we can lay out much of the ground work for this in 0.15 and 
I think we can have a total conversion realistically for 0.16.  
That means that by EOY, we could invoke QEMU with no options and do 
everything through QMP.


It's something that I've agitated for a long while, but when I see 
all the work needed, I'm not sure it's cost effective.


There's a lot of secondary benefits that come from doing this.  QMP 
becomes a much stronger interface.  A lot of operations that right 
now are only specifiable by the command line become dynamic which 
mitigates reboots in the long term. 


Only the hot-pluggable ones.


Yup, but it forces us to treat options that cannot change at runtime as 
special cases which I think is a nice plus.  Customers don't like having 
their guests rebooted during a scheduled downtime so we really ought to 
try to have as many things tunable at runtime as possible.


Regards,

Anthony Liguori



Re: [Qemu-devel] Re: KVM call agenda for Jan 25

2011-02-27 Thread Markus Armbruster
Stefan Hajnoczi stefa...@gmail.com writes:

 On Sat, Feb 26, 2011 at 9:50 PM, Dushyant Bansal
 cs5070...@cse.iitd.ac.in wrote:
 Disk block size is usually 512 bytes and in qemu-img, sector size is also
 512B. And, this change would  copy n sectors even if only one of them
 actually contains data (while cp checks and copies one block(=sector) at a
 time). Therefore, it will end up writing more data than cp.

 cp(1) from GNU coreutils 8.5 works in units of 32 KB on my system.  It
 reads 32 KB and checks for all zeroes.  If there are all zeroes, it
 seeks ahead 32 KB in the output file.  Otherwise it writes the entire
 32 KB.

The latest version[*] of cp does better:

cp now copies sparse files efficiently on file systems with FIEMAP
support (ext4, btrfs, xfs, ocfs2).  Before, it had to read 2^20 bytes
when copying a 1MiB sparse file.  Now, it copies bytes only for the
non-sparse sections of a file.  Similarly, to induce a hole in the
output file, it had to detect a long sequence of zero bytes.  Now,
it knows precisely where each hole in an input file is, and can
reproduce them efficiently in the output file.  mv also benefits
when it resorts to copying, e.g., between file systems.

But beware of kernel bugs[**].

 You can check what cp(1) is doing using strace(1).

Often enlightening :)

[...]

[*] http://lists.gnu.org/archive/html/coreutils-announce/2011-02/msg0.html

[**] http://lwn.net/Articles/429345/
Subscribera-only, should become public soon.  In the meantime, try
http://lwn.net/Articles/429349/



Re: [Qemu-devel] [PATCH V6 3/4] qmp, nmi: convert do_inject_nmi() to QObject

2011-02-27 Thread Markus Armbruster
Anthony Liguori anth...@codemonkey.ws writes:

 On 02/25/2011 03:54 AM, Markus Armbruster wrote:
 Anthony Liguorialigu...@linux.vnet.ibm.com  writes:


 On 02/24/2011 10:20 AM, Markus Armbruster wrote:
  
 Anthony Liguorialigu...@linux.vnet.ibm.com   writes:



 On 02/24/2011 02:33 AM, Markus Armbruster wrote:

  
 Anthony Liguorianth...@codemonkey.wswrites:

 [...]

 Please describe all expected errors.


  
 Quoting qmp-commands.hx:

3. Errors, in special, are not documented. Applications should 
 NOT check
   for specific errors classes or data (it's strongly recommended 
 to only
   check for the error key)

 Indeed, not a single error is documented there.  This is intentional.



 Yeah, but we're not 0.14 anymore and for 0.15, we need to document
 errors.  If you are suggesting I send a patch to remove that section,
 I'm more than happy to.

  
 Two separate issues here: 1. Are we ready to commit to the current
 design of errors, and 2. Is it fair to reject Lai's patch now because he
 doesn't document his errors.

 I'm not commenting on 1. here.

 Regarding 2.: rejecting a patch because it doesn't document an aspect
 that current master intentionally leaves undocumented is not how you
 treat contributors.  At least not if you want any other than certified
 masochists who enjoy pain, and professionals who get adequately
 compensated for it.

 Lead by example, not by fiat.


 http://repo.or.cz/w/qemu/aliguori.git/blob/refs/heads/glib:/qmp-schema.json

 I am in the process of documenting the errors of every command.  It's
 a royal pain but I'm going to document everything we have right now.
 It's actually the last bit of work I need to finish before sending
 QAPI out.

 So for new commands being added, it would be hugely helpful for the
 authors to document the errors such that I don't have to reverse
 engineer all of the possible error conditions.
  
 The moment this lands in master, you can begin to demand error
 descriptions from contributors.  Until then, I'll NAK error descriptions
 in qmp-commands.txt.  We left them undocumented there for good reasons:


 No, it was always a bad reason.  Good documentation is necessary to
 build good commands.  Errors are a huge part of the semantics of a
 command.  We cannot properly assess a command unless it's behavior in
 error conditions is well defined.

The decision to declare QMP stable was made at the KVM Forum after quite
some deliberation.  We explictly excluded errors.  We didn't do that
because errors are not worthy of specification (that would be silly).
We did it because there was too much unhappiness about the current state
of errors to close the debate on them at that time.  Luiz mindfully
documented that decision in qmp-commands.txt:

3. Errors, in special, are not documented. Applications should NOT check
   for specific errors classes or data (it's strongly recommended to only
   check for the error key)

Nobody denies that errors need to be specified, and this clause needs to
go.  But I must insist on proper review.  No sneaking in of error
specification through the commit backdoor, please.

 Once we have an error design in place that has a reasonable hope to
 stand the test of time, and have errors documented for at least some of
 the commands here, we can start to require proper error documentation
 for new commands.  But not now.

 I won't NAK non-normative error descriptions, say in commit messages, or
 in comments.  And I won't object to you asking for them.  But I feel you
 really shouldn't make it a condition for committing patches.  Especially
 not for simple patches that have been on list for months.


 I'm strongly committed to making QMP a first class interface in QEMU
 for 0.15.  I feel documentation is a crucial part to making that
 happen.

 I'm not asking for test cases even though that's something that we'll
 need for 0.15 because there's not enough infrastructure in master yet
 to do that in a reasonable way.  I realize I'm likely going to have to
 write that test case and I'm happy to do that.

 But there's no reason that we shouldn't require thorough documentation
 for all new QMP commands moving forward including error semantics.
 This is a critical part of having a first class API and no additional
 infrastructure is needed in master to do this.

Broken record time, again.

Nobody denies that errors need to be specified.  Fair goal for 0.15.
But I must insist on proper review.  No sneaking in of error
specification through the commit backdoor, please.

I won't NAK non-normative error descriptions, say in commit messages, or
in comments.  And I won't object to you asking for them.  But I feel you
really shouldn't make it a condition for committing patches.  Especially
not for simple patches that have been on list for months.