[Qemu-devel] [Bug 1098729] Re: qemu-user-static for armhf: segfault in threaded code

2013-06-20 Thread thierry bultel
I also experimented the bug.
It may SIGSEGV or hang. Or it may work, very rarely.

But I cannot reproduce it at all if change my app to stay on a single
CPU:

int
main(int argc, char * argv[] )
{

#ifdef QEMU
cpu_set_t cpuSet;
CPU_ZERO(cpuSet);
CPU_SET(0,cpuSet);
if (sched_setaffinity(getpid(), sizeof(cpu_set_t), cpuSet) !=0)
cerr  sched_setaffinity failed  endl;
#endif /* QEMU */

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1098729

Title:
  qemu-user-static for armhf: segfault in threaded code

Status in QEMU:
  New

Bug description:
  
  Currently running QEMU from git (fedf2de31023) and running the armhf version 
of qemu-user-static which I have renamed qemu-armhf-static to follow the naming 
convention used in Debian.

  The host systems is a Debian testing x86_64-linux and I have an Debian
  testing armhf chroot which I invoke using schroot.

  Majority of program in the armhf chroot run fine, but I'm getting qemu
  segfaults in multi-threaded programs.

  As an example, I've grabbed the threads demo program here:

  https://computing.llnl.gov/tutorials/pthreads/samples/dotprod_mutex.c

  and changed NUMTHRDS from 4 to 10. I compile it as (same compile
  command on both x86_64 host and armhf guest):

  gcc -Wall -lpthread dotprod_mutex.c -o dotprod_mutex

  When compiled for x86_64 host it runs perfectly and even under
  Valgrind displays no errors whatsoever.

  However, when I compile the program in my armhs chroot and run it it
  usually (but not always) segaults or hangs or crashes. Example output:

  
  (armhf) $ ./dotprod_mutex
  Thread 1 did 10 to 20:  mysum=10.00 global 
sum=10.00
  Thread 0 did 0 to 10:  mysum=10.00 global sum=20.00
  TCG temporary leak before f6731ca0
  qemu-arm-static: 
/home/erikd/Git/qemu-posix-timer-hacking/Upstream/tcg/tcg-op.h:2371:
  tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask  (1  idx)) == 
0' failed.

  
  (armhf) $ ./dotprod_mutex
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault

  (armhf) $ ./dotprod_mutex
  qemu-arm-static: 
/home/erikd/Git/qemu-posix-timer-hacking/Upstream/tcg/tcg.c:519:
  tcg_temp_free_internal: Assertion `idx = s-nb_globals  idx  
s-nb_temps' failed.

  
  (armhf) $ ./dotprod_mutex
  Thread 1 did 10 to 20:  mysum=10.00 global 
sum=10.00
  qemu: uncaught target signal 11 (Segmentation fault) - core dumped
  Segmentation fault

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1098729/+subscriptions



Re: [Qemu-devel] [PATCH v2 2/6] net: introduce lock to protect NetClientState's peer's access

2013-06-20 Thread liu ping fan
On Tue, Jun 18, 2013 at 8:25 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jun 13, 2013 at 05:03:02PM +0800, Liu Ping Fan wrote:
 @@ -67,6 +67,10 @@ struct NetClientState {
  NetClientInfo *info;
  int link_down;
  QTAILQ_ENTRY(NetClientState) next;
 +/* protect the race access of peer only between reader and writer.
 + * to resolve the writer's race condition, resort on biglock.
 + */

 Indentation

Will fix.
 @@ -301,6 +303,38 @@ static void qemu_free_net_client(NetClientState *nc)
  }
  }

 +/* elimate the reference and sync with exit of rx/tx action.

 s/elimate/Eliminate/

Will fix
 + * And flush out peer's queue.
 + */
 +static void qemu_net_client_detach_flush(NetClientState *nc)
 +{
 +NetClientState *peer;
 +
 +/* reader of self's peer field , fixme? the deleters are not concurrent,
 + * so this pair lock can save.
 + */

 Indentation, also please resolve the fixme.

So, here can I take the assumption that the deleters are serialized by
biglock, and remove the lock following this comment?

 @@ -394,6 +433,28 @@ int qemu_can_send_packet(NetClientState *sender)
  return 1;
  }

 +int qemu_can_send_packet(NetClientState *sender)
 +{
 +int ret = 1;
 +
 +qemu_mutex_lock(sender-peer_lock);
 +if (!sender-peer) {
 +goto unlock;
 +}
 +
 +if (sender-peer-receive_disabled) {
 +ret = 0;
 +goto unlock;
 +} else if (sender-peer-info-can_receive 
 +   !sender-peer-info-can_receive(sender-peer)) {
 +ret = 0;
 +goto unlock;
 +}

 Just call qemu_can_send_packet_nolock() instead of duplicating code?

Yes.

Thx  regards,
pingfan



Re: [Qemu-devel] [PATCH v2 4/6] net: force NetQue opaque to be NetClientState

2013-06-20 Thread liu ping fan
On Tue, Jun 18, 2013 at 8:47 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jun 13, 2013 at 05:03:04PM +0800, Liu Ping Fan wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 qemu_net_client_setup() is the only user of qemu_new_net_queue(), which
 will pass in NetClientState. By forcing it be a NetClientState, we
 can ref/unref NetQueue's owner

 Please s/opaque/nc/ in net/queue.[hc] since it's no longer opaque :).

Ok, I will.
 Also, qemu_deliver_packet()/qemu_deliver_packet_iov() can take an
 NetClientState *nc instead of void *opaque.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com

 pingfank here and pingfanl in the From: header.  Are both okay and which
 do you prefer to use?

Change my disk, totally re-install my system, and mix up with internal
mail address. Will fix it.

Thx  regards,
Pingfan



Re: [Qemu-devel] [PATCH v2 5/6] net: defer nested call to BH

2013-06-20 Thread liu ping fan
On Tue, Jun 18, 2013 at 8:57 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jun 13, 2013 at 05:03:05PM +0800, Liu Ping Fan wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 Nested call caused by -receive() will raise issue like deadlock,
 so postphone it to BH.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  net/queue.c | 40 ++--
  1 file changed, 38 insertions(+), 2 deletions(-)

 Does this patch belong before the netqueue lock patch?  The commit
 history should be bisectable without temporary failures/deadlocks.

Ok.
 diff --git a/net/queue.c b/net/queue.c
 index 58222b0..9c343ab 100644
 --- a/net/queue.c
 +++ b/net/queue.c
 @@ -24,6 +24,8 @@
  #include net/queue.h
  #include qemu/queue.h
  #include net/net.h
 +#include block/aio.h
 +#include qemu/main-loop.h

  /* The delivery handler may only return zero if it will call
   * qemu_net_queue_flush() when it determines that it is once again able
 @@ -183,6 +185,22 @@ static ssize_t qemu_net_queue_deliver_iov(NetQueue 
 *queue,
  return ret;
  }

 +typedef struct NetQueBH {

 This file uses Queue consistently, please don't add Que here.

 @@ -192,8 +210,17 @@ ssize_t qemu_net_queue_send(NetQueue *queue,
  {
  ssize_t ret;

 -if (queue-delivering || !qemu_can_send_packet_nolock(sender)) {
 +if (queue-delivering || !qemu_can_send_packet_nolock(sender)
 +|| sender-send_queue-delivering) {

 Not sure this is safe, we're only holding one NetClientState-peer_lock
 and one NetQueue-lock.  How can we access both queue-delivering and
 sender-send_queue-delivering safely?

Yes, you are right, it is not safely. The queue-delivering is
protected by peer_lock and we do not take the verse direction lock .
So finally the above code can not tell out the nested calling
A--B--A  from  A--B,  B--A (where A, B stands for a
NetClientState).
What about using TLS to trace the nested calling?  With it, we can
avoid AB-BA lock problem.

Thx  regards,
Pingfan



[Qemu-devel] [PATCH v2 0/2] libqtest leak fix cleanup

2013-06-20 Thread Markus Armbruster
v2: qtest_start() function comment

Markus Armbruster (2):
  libqtest: Plug fd and memory leaks in qtest_quit()
  libqtest: New qtest_end() to go with qtest_start()

 tests/fdc-test.c|  2 +-
 tests/hd-geo-test.c |  8 
 tests/ide-test.c|  2 +-
 tests/libqtest.c|  4 
 tests/libqtest.h| 12 
 5 files changed, 22 insertions(+), 6 deletions(-)

-- 
1.7.11.7




[Qemu-devel] [PATCH v2 1/2] libqtest: Plug fd and memory leaks in qtest_quit()

2013-06-20 Thread Markus Armbruster
Reviewed-by: Anthony Liguori aligu...@us.ibm.com
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 tests/libqtest.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/libqtest.c b/tests/libqtest.c
index 879ffe9..bb82069 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -171,12 +171,16 @@ void qtest_quit(QTestState *s)
 waitpid(pid, status, 0);
 }
 
+close(s-fd);
+close(s-qmp_fd);
+g_string_free(s-rx, true);
 unlink(s-pid_file);
 unlink(s-socket_path);
 unlink(s-qmp_socket_path);
 g_free(s-pid_file);
 g_free(s-socket_path);
 g_free(s-qmp_socket_path);
+g_free(s);
 }
 
 static void socket_sendf(int fd, const char *fmt, va_list ap)
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 2/2] libqtest: New qtest_end() to go with qtest_start()

2013-06-20 Thread Markus Armbruster

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 tests/fdc-test.c|  2 +-
 tests/hd-geo-test.c |  8 
 tests/ide-test.c|  2 +-
 tests/libqtest.h| 12 
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/tests/fdc-test.c b/tests/fdc-test.c
index 4b0301d..fd198dc 100644
--- a/tests/fdc-test.c
+++ b/tests/fdc-test.c
@@ -556,7 +556,7 @@ int main(int argc, char **argv)
 ret = g_test_run();
 
 /* Cleanup */
-qtest_quit(global_qtest);
+qtest_end();
 unlink(test_image);
 
 return ret;
diff --git a/tests/hd-geo-test.c b/tests/hd-geo-test.c
index 9a31e85..b72042e 100644
--- a/tests/hd-geo-test.c
+++ b/tests/hd-geo-test.c
@@ -244,7 +244,7 @@ static void test_ide_none(void)
 setup_common(argv, ARRAY_SIZE(argv));
 qtest_start(g_strjoinv( , argv));
 test_cmos();
-qtest_quit(global_qtest);
+qtest_end();
 }
 
 static void test_ide_mbr(bool use_device, MBRcontents mbr)
@@ -262,7 +262,7 @@ static void test_ide_mbr(bool use_device, MBRcontents mbr)
 }
 qtest_start(g_strjoinv( , argv));
 test_cmos();
-qtest_quit(global_qtest);
+qtest_end();
 }
 
 /*
@@ -334,7 +334,7 @@ static void test_ide_drive_user(const char *dev, bool trans)
 g_free(opts);
 qtest_start(g_strjoinv( , argv));
 test_cmos();
-qtest_quit(global_qtest);
+qtest_end();
 }
 
 /*
@@ -387,7 +387,7 @@ static void test_ide_drive_cd_0(void)
 }
 qtest_start(g_strjoinv( , argv));
 test_cmos();
-qtest_quit(global_qtest);
+qtest_end();
 }
 
 int main(int argc, char **argv)
diff --git a/tests/ide-test.c b/tests/ide-test.c
index 7e2eb94..7307f1d 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -122,7 +122,7 @@ static void ide_test_start(const char *cmdline_fmt, ...)
 
 static void ide_test_quit(void)
 {
-qtest_quit(global_qtest);
+qtest_end();
 }
 
 static QPCIDevice *get_pci_device(uint16_t *bmdma_base)
diff --git a/tests/libqtest.h b/tests/libqtest.h
index 437bda3..0f6aade 100644
--- a/tests/libqtest.h
+++ b/tests/libqtest.h
@@ -17,6 +17,7 @@
 #ifndef LIBQTEST_H
 #define LIBQTEST_H
 
+#include stddef.h
 #include stdint.h
 #include stdbool.h
 #include stdarg.h
@@ -319,6 +320,17 @@ static inline QTestState *qtest_start(const char *args)
 }
 
 /**
+ * qtest_end:
+ *
+ * Shut down the QEMU process started by qtest_start().
+ */
+static inline void qtest_end(void)
+{
+qtest_quit(global_qtest);
+global_qtest = NULL;
+}
+
+/**
  * qmp:
  * @fmt...: QMP message to send to qemu
  *
-- 
1.7.11.7




Re: [Qemu-devel] [RFC 06/13] qemu-thread: add TLS wrappers

2013-06-20 Thread Fam Zheng
On Fri, 06/14 11:48, Stefan Hajnoczi wrote:
 From: Paolo Bonzini pbonz...@redhat.com
 
 Fast TLS is not available on some platforms, but it is always nice to
 use it.  This wrapper implementation falls back to pthread_get/setspecific
 on POSIX systems that lack __thread, but uses the dynamic linker's TLS
 support on Linux and Windows.
 
 The user shall call alloc_foo() in every thread that needs to access the
 variable---exactly once and before any access.  foo is the name of the
 variable as passed to DECLARE_TLS and DEFINE_TLS.  Then, get_foo() will
 return the address of the variable.  It is guaranteed to remain the same
 across the lifetime of a thread, so you can cache it.

Would tls_alloc_foo() and tls_get_foo() be easier to read and less
possible for name conflict?

Fam



Re: [Qemu-devel] [Xen-devel] [PATCH] Remove hardcoded xen-platform device initialization

2013-06-20 Thread Paul Durrant
 -Original Message-
 From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
 Sent: 19 June 2013 17:28
 To: Paul Durrant
 Cc: Stefano Stabellini; Ian Campbell; Paolo Bonzini; xen-de...@lists.xen.org;
 qemu-devel@nongnu.org
 Subject: RE: [Qemu-devel] [Xen-devel] [PATCH] Remove hardcoded xen-
 platform device initialization
 
 On Wed, 19 Jun 2013, Paul Durrant wrote:
   -Original Message-
   From: qemu-devel-bounces+paul.durrant=citrix@nongnu.org
   [mailto:qemu-devel-bounces+paul.durrant=citrix@nongnu.org] On
   Behalf Of Stefano Stabellini
   Sent: 19 June 2013 14:53
   To: Ian Campbell
   Cc: Paolo Bonzini; Paul Durrant; xen-de...@lists.xen.org; qemu-
   de...@nongnu.org; Stefano Stabellini
   Subject: Re: [Qemu-devel] [Xen-devel] [PATCH] Remove hardcoded xen-
   platform device initialization
  
   On Wed, 19 Jun 2013, Ian Campbell wrote:
On Tue, 2013-06-18 at 19:56 +0100, Stefano Stabellini wrote:
 On Fri, 14 Jun 2013, Paul Durrant wrote:
   -Original Message-
   From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf
 Of
   Paolo
   Bonzini
   Sent: 14 June 2013 15:58
   To: Paul Durrant
   Cc: Ian Campbell; Stefano Stabellini; qemu-devel@nongnu.org;
 xen-
   de...@lists.xen.org
   Subject: Re: [Xen-devel] [PATCH] Remove hardcoded xen-
 platform
   device
   initialization
  
   Il 14/06/2013 10:11, Paul Durrant ha scritto:
I think we're still going to need -M xenpv, I think; it's quite
distinct from pc.
  
   Of course!  Even more: -M xenpv should be reused on ARM.
  
I guess we could use -M pc for HVM and gate the
accel code as you suggest but, if that's the way we're going, it
would seem more logical just to ditch the accel code for xenpv
completely (assuming we can do all we need from the machine
 init)
   and
then use -M pc -accel=xen for HVM guests going forward.
  
   There is common code between pv and fv, and that one definitely
   belongs
   in xen_init.  Most fv-only code probably should be in pc_init.  
   The
 rest
   should move to xen_init though, because it would apply just as
 well
   for
   example to Q35.  It's a bit ugly to have fv-only code there, but 
   it's
   better than having a Xen-specific machine type.  Xen/KVM/TCG
   should be
   as similar as possible at the QEMU level, any difference should be
   handled in the toolstack.
  
But that does
rather screw up my autodiscovery plans because I would not
 know,
   for
a given qemu binary, which machine type to use.
  
   There's no need for that.  4.4 can just use -M pc 
   unconditionally,
   =4.3 will just use -M xenfv unconditionally.
  
If I create a new
xenfv-2.0 machine type though I *can* do auto discovery... in
 which
case do we need the -accel=xen option at all?
  
   Yes.  Please try not do things differently from other 
   accelerators.
  
 
  Ok. I guess we can have the ability to override the machine type in
 the
   VM config, so you could still kick off an older qemu with a newer libxl -
 but it
   sounds like the auto-discovery idea is a no-go then.

 xenfv-2.0 is a bad idea, like Paolo wrote, it should be possible to 
 just
 use -M pc for HVM guests and retain -M xenpv for pv guests.

 However it seems to me that we also need a way in libxl to find out
 whether QEMU is new enough for us to be able to use -M pc.
 We can't just assume that users will be able to figure out the magic
 rune they need to write in the VM config file to solve their VM crash
 at
 boot problem.
   
What crash at boot problem?
  
   If you start QEMU as device model on Xen with the wrong machine
 option
   (for example -M pc on an old QEMU), QEMU would probably just abort
   during initialization.
  
  
 We could spawn an instance of QEMU just to figure out the QEMU
   version
 but we certainly cannot do that every time we start a new VM.
 Once we figure out the QEMU version the first time we could write it
 to
 xenstore so that the next time we don't have to go through the same
 process again.
   
Due to the device_model_override we might need to make this per-
 path.
You'd also likely need to store mtime or something in case qemu gets
upgraded, although perhaps that is getting unnecessarily picky...
  
   I think of device_model_override as an option for developers. People
   that use device_model_override can also override the QEMUMachine
   version.
 
  Are you suggesting we allow a freeform -machine option in libxl, or are you
 suggesting they point device_model_override at a script which drops the -M
 argument and inserts their new choice before invoking qemu?
 
 I am suggesting that we could have a qemu_machine_override option in
 QEMU, or maybe a 

Re: [Qemu-devel] [PATCH v3 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread Stefan Hajnoczi
On Thu, Jun 20, 2013 at 04:59:29AM +0800, Liu Ping Fan wrote:
 BH will be used outside big lock, so introduce lock to protect
 between the writers, ie, bh's adders and deleter. The lock only
 affects the writers and bh's callback does not take this extra lock.
 Note that for the same AioContext, aio_bh_poll() can not run in
 parallel yet.
 
 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  async.c | 22 ++
  include/block/aio.h |  5 +
  2 files changed, 27 insertions(+)

qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.

It seems that calling them from a thread is a little risky because there
is no guarantee that the BH is no longer invoked after a thread calls
these functions.

I think that's worth a comment or do you want them to take the lock so
they become safe?

The other thing I'm unclear on is the -idle assignment followed
immediately by a -scheduled assignment.  Without memory barriers
aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
it may see an idle BH as a regular scheduled BH because -idle is still
0.

Stefan



Re: [Qemu-devel] [PATCH v2 2/6] net: introduce lock to protect NetClientState's peer's access

2013-06-20 Thread Stefan Hajnoczi
On Thu, Jun 20, 2013 at 02:30:30PM +0800, liu ping fan wrote:
 On Tue, Jun 18, 2013 at 8:25 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
  On Thu, Jun 13, 2013 at 05:03:02PM +0800, Liu Ping Fan wrote:
  + * And flush out peer's queue.
  + */
  +static void qemu_net_client_detach_flush(NetClientState *nc)
  +{
  +NetClientState *peer;
  +
  +/* reader of self's peer field , fixme? the deleters are not 
  concurrent,
  + * so this pair lock can save.
  + */
 
  Indentation, also please resolve the fixme.
 
 So, here can I take the assumption that the deleters are serialized by
 biglock, and remove the lock following this comment?

Ah, I understand the comment now.  Is there any advantage to dropping
the lock?  IMO it's clearer to take the lock consistently instead of
optimizing cases we think only get called from the main loop.



Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.

2013-06-20 Thread Paul Durrant
 -Original Message-
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: 19 June 2013 21:15
 To: Matt Wilson
 Cc: Alex Bligh; Paul Durrant; xen-de...@lists.xen.org; Ian Campbell; qemu-
 de...@nongnu.org
 Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device
 version 2.
 
 At 11:21 -0700 on 19 Jun (1371640904), Matt Wilson wrote:
  On Wed, Jun 19, 2013 at 11:42:06AM +0100, Alex Bligh wrote:
  
   --On 19 June 2013 10:13:17 + Paul Durrant
   paul.durr...@citrix.com wrote:
  
   We obviously can't say to users Are you running Windows and are you
   running PV drivers = X.Y, if so set lever A to position B, otherwise if
   you are running some other OS or an earlier version of the Windows
 PV
   driver set it to position A.
   
   Why not? The device can be chosen on a per-VM basis.
  
   Not everyone knows what guest some random user will be running
   (consider cloud platforms).
 
  I agree. If this is really the only solution, we would need to have
  both versions presented to the guest so that old drivers continue to
  work without any intervention.
 
 I suspect that if we expose both, both sets of drivers try to run the
 same PV connections, and hilarity ensues.
 

Actually I think I can make that work, and it is the conclusion I came to after 
Alex's comment. I'll create a new patch which introduces a new device, let's 
call it citrix-pv-bus or somesuch, which will have the necessary device id and 
revision and will be a dedicate device purely for the Citrix PV drivers. Then, 
if someone wants to create a VM which will be able use Citrix PV drivers they 
add this device to their config but leave all other aspects of the config 
unchanged, thus not precluding using that VM with any drivers that bind to the 
xen platform device.
If someone has a VM that has the old Citrix drivers installed, or GPLPV, I 
think I should be able to spot this and make sure that the new bus driver 
quiesces itself to prevent strangeness ensuing. If and when said previous 
drivers are un-installed then the new bus driver can wake up and enumerate the 
device nodes for the other pv drivers and Windows Update can carry on doing its 
stuff.

  Paul



Re: [Qemu-devel] [PATCH v2 5/6] net: defer nested call to BH

2013-06-20 Thread Stefan Hajnoczi
On Thu, Jun 20, 2013 at 02:30:56PM +0800, liu ping fan wrote:
 On Tue, Jun 18, 2013 at 8:57 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
  On Thu, Jun 13, 2013 at 05:03:05PM +0800, Liu Ping Fan wrote:
  From: Liu Ping Fan pingf...@linux.vnet.ibm.com
 
  Nested call caused by -receive() will raise issue like deadlock,
  so postphone it to BH.
 
  Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
  ---
   net/queue.c | 40 ++--
   1 file changed, 38 insertions(+), 2 deletions(-)
 
  Does this patch belong before the netqueue lock patch?  The commit
  history should be bisectable without temporary failures/deadlocks.
 
 Ok.
  diff --git a/net/queue.c b/net/queue.c
  index 58222b0..9c343ab 100644
  --- a/net/queue.c
  +++ b/net/queue.c
  @@ -24,6 +24,8 @@
   #include net/queue.h
   #include qemu/queue.h
   #include net/net.h
  +#include block/aio.h
  +#include qemu/main-loop.h
 
   /* The delivery handler may only return zero if it will call
* qemu_net_queue_flush() when it determines that it is once again able
  @@ -183,6 +185,22 @@ static ssize_t qemu_net_queue_deliver_iov(NetQueue 
  *queue,
   return ret;
   }
 
  +typedef struct NetQueBH {
 
  This file uses Queue consistently, please don't add Que here.
 
  @@ -192,8 +210,17 @@ ssize_t qemu_net_queue_send(NetQueue *queue,
   {
   ssize_t ret;
 
  -if (queue-delivering || !qemu_can_send_packet_nolock(sender)) {
  +if (queue-delivering || !qemu_can_send_packet_nolock(sender)
  +|| sender-send_queue-delivering) {
 
  Not sure this is safe, we're only holding one NetClientState-peer_lock
  and one NetQueue-lock.  How can we access both queue-delivering and
  sender-send_queue-delivering safely?
 
 Yes, you are right, it is not safely. The queue-delivering is
 protected by peer_lock and we do not take the verse direction lock .
 So finally the above code can not tell out the nested calling
 A--B--A  from  A--B,  B--A (where A, B stands for a
 NetClientState).
 What about using TLS to trace the nested calling?  With it, we can
 avoid AB-BA lock problem.

I would take a step back and see if there's a way to avoid reaching into
inspect sender-send_queue-delivering here.

Stefan



Re: [Qemu-devel] [PATCH v2 0/5] qcow2: Discard freed clusters

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 01:44:16PM +0200, Kevin Wolf wrote:
 This series adds options to make qcow2 discard freed clusters, in several
 categories. By default, only freed clusters related to snapshots (i.e. mainly
 snapshot deletion) are discarded.
 
 v2:
 - Removed leftover debug code
 - Don't discard after COW (overwriting compressed clusters)
 - Changed some commas into semicolons
 
 Kevin Wolf (5):
   Revert block: Disable driver-specific options for 1.5
   qcow2: Add refcount update reason to all callers
   qcow2: Options to enable discard for freed clusters
   qcow2: Batch discards
   block: Always enable discard on the protocol level
 
  block.c  |   2 +-
  block/qcow2-cluster.c|  41 ++
  block/qcow2-refcount.c   | 136 
 +++
  block/qcow2-snapshot.c   |   6 ++-
  block/qcow2.c|  30 ++-
  block/qcow2.h|  32 +--
  blockdev.c   | 118 ++--
  tests/qemu-iotests/group |   2 +-
  8 files changed, 214 insertions(+), 153 deletions(-)
 
 -- 
 1.8.1.4
 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH 0/3] qapi: Top-level type reference for command definitions

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 06:28:04PM +0200, Kevin Wolf wrote:
 Kevin Wolf (3):
   qapi.py: Move common code to evaluate()
   qapi.py: Allow top-level type reference for command definitions
   qapi-schema: Use BlockdevSnapshot type for blockdev-snapshot-sync
 
  qapi-schema.json |  3 +--
  scripts/qapi.py  | 43 +--
  2 files changed, 30 insertions(+), 16 deletions(-)

Nice, I'll use this for drive-backup.

Thanks,
Stefan



Re: [Qemu-devel] Java volatile vs. C11 seq_cst (was Re: [PATCH v2 1/2] add a header file for atomic operations)

2013-06-20 Thread Paolo Bonzini
Il 19/06/2013 22:25, Torvald Riegel ha scritto:
 On Wed, 2013-06-19 at 17:14 +0200, Paolo Bonzini wrote:
 (1) I don't care about relaxed RMW ops (loads/stores occur in hot paths,
 but RMW shouldn't be that bad.  I don't care if reference counting is a
 little slower than it could be, for example);
 
 I doubt relaxed RMW ops are sufficient even for reference counting.

They are enough on the increment side, or so says boost...

http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters

[An aside: Java guarantees that volatile stores are not reordered
with volatile loads.  This is not guaranteed by just using release
stores and acquire stores, and is why IIUC acq_rel  Java  seq_cst].

 Or maybe Java volatile is acq for loads and seq_cst for stores...

Perhaps (but I'm not 100% sure).

 As long as you only have a producer and a consumer, C11 is fine, because
 all you need is load-acquire/store-release.  In fact, if it weren't for
 the experience factor, C11 is easier than manually placing acquire and
 release barriers.  But as soon as two or more threads are reading _and_
 writing the shared memory, it gets complicated and I want to provide
 something simple that people can use.  This is the reason for (2) above.
 
 I can't quite follow you here.  There is a total order for all
 modifications to a single variable, and if you use acq/rel combined with
 loads and stores on this variable, then you basically can make use of
 the total order.  (All loads that read-from a certain store get a
 synchronized-with (and thus happens-before edge) with the store, and the
 stores are in a total order.)  This is independent of the number of
 readers and writers.  The difference starts once you want to sync with
 more than one variable, and need to establish an order between those
 accesses.

You're right of course.  More specifically when there is a thread where
some variables are stored while others are loaded.

 There will still be a few cases that need to be optimized, and here are
 where the difficult requirements come:

 (R1) the primitives *should* not be alien to people who know Linux.

 (R2) those optimizations *must* be easy to do and review; at least as
 easy as these things go.

 The two are obviously related.  Ease of review is why it is important to
 make things familiar to people who know Linux.

 In C11, relaxing SC loads and stores is complicated, and more
 specifically hard to explain!
 
 I can't see why that would be harder than reasoning about equally weaker
 Java semantics.  But you obviously know your community, and I don't :)

Because Java semantics are almost SC, and as Paul mentioned the
difference doesn't matter in practice (IRIW/RWC is where it matters, WRC
works even on Power; see
http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc051.html#toc5, row
WRC+lwsyncs).  It hasn't ever mattered for Linux, at least.

 By contrast, Java volatile semantics are easily converted to a sequence
 of relaxed loads, relaxed stores, and acq/rel/sc fences.
 
 The same holds for C11/C++11.  If you look at either the standard or the
 Batty model, you'll see that for every pair like store(rel)--load(acq),
 there is also store(rel)--fence(acq)+load(relaxed),
 store(relaxed)+fence(rel)--fence(acq)+load(relaxed), etc. defined,
 giving the same semantics.  Likewise for SC.

Do you have a pointer to that?  It would help.

 You can also build Dekker with SC stores and acq loads, if I'm not
 mistaken.  Typically one would probably use SC fences and relaxed
 stores/loads.

Yes.

 I guess so.  But you also have to consider the legacy that you create.
 I do think the C11/C++11 model will used widely, and more and more
 people will used to it.

 I don't think many people will learn how to use the various non-seqcst
 modes...  At least so far I punted. :)
 
 But you already use similarly weaker orderings that the other
 abstractions provide (e.g., Java), so you're half-way there :)

True.  On the other hand you can treat Java like kinda SC but don't
worry, you won't see the difference.  It is both worrisome and appealing...

Paolo



[Qemu-devel] [PATCH v2 2/8] exec: Clean up fall back when -mem-path allocation fails

2013-06-20 Thread Markus Armbruster
With -mem-path, qemu_ram_alloc_from_ptr() first tries to allocate
accordingly, but when it fails, it falls back to normal allocation.

The fall back allocation code used to be effectively identical to the
-mem-path not given code, until it started to diverge in commit
432d268.  I believe the code still works, but clean it up anyway: drop
the special fall back allocation code, and fall back to the ordinary
-mem-path not given code instead.

Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 exec.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/exec.c b/exec.c
index b424e12..56c31a9 100644
--- a/exec.c
+++ b/exec.c
@@ -1091,15 +1091,12 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
void *host,
 if (mem_path) {
 #if defined (__linux__)  !defined(TARGET_S390X)
 new_block-host = file_ram_alloc(new_block, size, mem_path);
-if (!new_block-host) {
-new_block-host = qemu_anon_ram_alloc(size);
-memory_try_enable_merging(new_block-host, size);
-}
 #else
 fprintf(stderr, -mem-path option unsupported\n);
 exit(1);
 #endif
-} else {
+}
+if (!new_block-host) {
 if (kvm_enabled()) {
 /* some s390/kvm configurations have special constraints */
 new_block-host = kvm_ram_alloc(size);
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 1/8] exec: Fix Xen RAM allocation with unusual options

2013-06-20 Thread Markus Armbruster
Issues:

* We try to obey -mem-path even though it can't work with Xen.

* To implement -machine mem-merge, we call
  memory_try_enable_merging(new_block-host, size).  But with Xen,
  new_block-host remains null.  Oops.

Fix by separating Xen allocation from normal allocation.

Acked-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 exec.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index 5b8b40d..b424e12 100644
--- a/exec.c
+++ b/exec.c
@@ -1081,6 +1081,12 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
 if (host) {
 new_block-host = host;
 new_block-flags |= RAM_PREALLOC_MASK;
+} else if (xen_enabled()) {
+if (mem_path) {
+fprintf(stderr, -mem-path not supported with Xen\n);
+exit(1);
+}
+xen_ram_alloc(new_block-offset, size, mr);
 } else {
 if (mem_path) {
 #if defined (__linux__)  !defined(TARGET_S390X)
@@ -1094,9 +1100,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
 exit(1);
 #endif
 } else {
-if (xen_enabled()) {
-xen_ram_alloc(new_block-offset, size, mr);
-} else if (kvm_enabled()) {
+if (kvm_enabled()) {
 /* some s390/kvm configurations have special constraints */
 new_block-host = kvm_ram_alloc(size);
 } else {
@@ -1174,6 +1178,8 @@ void qemu_ram_free(ram_addr_t addr)
 ram_list.version++;
 if (block-flags  RAM_PREALLOC_MASK) {
 ;
+} else if (xen_enabled()) {
+xen_invalidate_map_cache_entry(block-host);
 } else if (mem_path) {
 #if defined (__linux__)  !defined(TARGET_S390X)
 if (block-fd) {
@@ -1186,11 +1192,7 @@ void qemu_ram_free(ram_addr_t addr)
 abort();
 #endif
 } else {
-if (xen_enabled()) {
-xen_invalidate_map_cache_entry(block-host);
-} else {
-qemu_anon_ram_free(block-host, block-length);
-}
+qemu_anon_ram_free(block-host, block-length);
 }
 g_free(block);
 break;
@@ -1214,6 +1216,8 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 vaddr = block-host + offset;
 if (block-flags  RAM_PREALLOC_MASK) {
 ;
+} else if (xen_enabled()) {
+abort();
 } else {
 flags = MAP_FIXED;
 munmap(vaddr, length);
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 8/8] pc_sysfw: Fix ISA BIOS init for ridiculously big flash

2013-06-20 Thread Markus Armbruster
pc_isa_bios_init() suffers integer overflow for flash larger than
INT_MAX.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 hw/block/pc_sysfw.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/hw/block/pc_sysfw.c b/hw/block/pc_sysfw.c
index 412d1b0..aebefc9 100644
--- a/hw/block/pc_sysfw.c
+++ b/hw/block/pc_sysfw.c
@@ -54,10 +54,7 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory,
 flash_size = memory_region_size(flash_mem);
 
 /* map the last 128KB of the BIOS in ISA space */
-isa_bios_size = flash_size;
-if (isa_bios_size  (128 * 1024)) {
-isa_bios_size = 128 * 1024;
-}
+isa_bios_size = MIN(flash_size, 128 * 1024);
 isa_bios = g_malloc(sizeof(*isa_bios));
 memory_region_init_ram(isa_bios, isa-bios, isa_bios_size);
 vmstate_register_ram_global(isa_bios);
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 3/8] exec: Reduce ifdeffery around -mem-path

2013-06-20 Thread Markus Armbruster
Instead of spreading its ifdeffery everywhere, confine it to
qemu_ram_alloc_from_ptr().  Everywhere else, simply test block-fd,
which is non-negative exactly when block uses -mem-path.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 exec.c | 37 ++---
 include/exec/cpu-all.h |  2 --
 2 files changed, 10 insertions(+), 29 deletions(-)

diff --git a/exec.c b/exec.c
index 56c31a9..4dbb0f1 100644
--- a/exec.c
+++ b/exec.c
@@ -1073,6 +1073,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
 
 size = TARGET_PAGE_ALIGN(size);
 new_block = g_malloc0(sizeof(*new_block));
+new_block-fd = -1;
 
 /* This assumes the iothread lock is taken here too.  */
 qemu_mutex_lock_ramlist();
@@ -1177,17 +1178,9 @@ void qemu_ram_free(ram_addr_t addr)
 ;
 } else if (xen_enabled()) {
 xen_invalidate_map_cache_entry(block-host);
-} else if (mem_path) {
-#if defined (__linux__)  !defined(TARGET_S390X)
-if (block-fd) {
-munmap(block-host, block-length);
-close(block-fd);
-} else {
-qemu_anon_ram_free(block-host, block-length);
-}
-#else
-abort();
-#endif
+} else if (block-fd = 0) {
+munmap(block-host, block-length);
+close(block-fd);
 } else {
 qemu_anon_ram_free(block-host, block-length);
 }
@@ -1218,25 +1211,15 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 } else {
 flags = MAP_FIXED;
 munmap(vaddr, length);
-if (mem_path) {
-#if defined(__linux__)  !defined(TARGET_S390X)
-if (block-fd) {
+if (block-fd = 0) {
 #ifdef MAP_POPULATE
-flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
-MAP_PRIVATE;
+flags |= mem_prealloc ? MAP_POPULATE | MAP_SHARED :
+MAP_PRIVATE;
 #else
-flags |= MAP_PRIVATE;
-#endif
-area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
-flags, block-fd, offset);
-} else {
-flags |= MAP_PRIVATE | MAP_ANONYMOUS;
-area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
-flags, -1, 0);
-}
-#else
-abort();
+flags |= MAP_PRIVATE;
 #endif
+area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
+flags, block-fd, offset);
 } else {
 #if defined(TARGET_S390X)  defined(CONFIG_KVM)
 flags |= MAP_SHARED | MAP_ANONYMOUS;
diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index e9c3717..c369b25 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -476,9 +476,7 @@ typedef struct RAMBlock {
  * Writes must take both locks.
  */
 QTAILQ_ENTRY(RAMBlock) next;
-#if defined(__linux__)  !defined(TARGET_S390X)
 int fd;
-#endif
 } RAMBlock;
 
 typedef struct RAMList {
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 4/8] exec: Simplify the guest physical memory allocation hook

2013-06-20 Thread Markus Armbruster
Make it a generic hook rather than a KVM hook.  Less code and
ifdeffery.

Since the only user of the hook is old S390 KVM, there's hope we can
get rid of it some day.

Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 exec.c  | 20 ++--
 include/exec/exec-all.h |  2 ++
 include/sysemu/kvm.h|  5 -
 kvm-all.c   | 13 -
 target-s390x/kvm.c  | 17 ++---
 5 files changed, 22 insertions(+), 35 deletions(-)

diff --git a/exec.c b/exec.c
index 4dbb0f1..c45eb33 100644
--- a/exec.c
+++ b/exec.c
@@ -685,6 +685,19 @@ typedef struct subpage_t {
 static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
  uint16_t section);
 static subpage_t *subpage_init(hwaddr base);
+
+static void *(*phys_mem_alloc)(ram_addr_t size) = qemu_anon_ram_alloc;
+
+/*
+ * Set a custom physical guest memory alloator.
+ * Accelerators with unusual needs may need this.  Hopefully, we can
+ * get rid of it eventually.
+ */
+void phys_mem_set_alloc(void *(*alloc)(ram_addr_t))
+{
+phys_mem_alloc = alloc;
+}
+
 static void destroy_page_desc(uint16_t section_index)
 {
 MemoryRegionSection *section = phys_sections[section_index];
@@ -1098,12 +,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
 #endif
 }
 if (!new_block-host) {
-if (kvm_enabled()) {
-/* some s390/kvm configurations have special constraints */
-new_block-host = kvm_ram_alloc(size);
-} else {
-new_block-host = qemu_anon_ram_alloc(size);
-}
+new_block-host = phys_mem_alloc(size);
 memory_try_enable_merging(new_block-host, size);
 }
 }
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b2162a4..4921696 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -369,6 +369,8 @@ bool is_tcg_gen_code(uintptr_t pc_ptr);
 
 #if !defined(CONFIG_USER_ONLY)
 
+void phys_mem_set_alloc(void *(*alloc)(ram_addr_t));
+
 struct MemoryRegion *iotlb_to_region(hwaddr index);
 bool io_mem_read(struct MemoryRegion *mr, hwaddr addr,
  uint64_t *pvalue, unsigned size);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 8b19322..e722027 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -151,11 +151,6 @@ int kvm_init_vcpu(CPUState *cpu);
 #ifdef NEED_CPU_H
 int kvm_cpu_exec(CPUArchState *env);
 
-#if !defined(CONFIG_USER_ONLY)
-void *kvm_ram_alloc(ram_addr_t size);
-void *kvm_arch_ram_alloc(ram_addr_t size);
-#endif
-
 void kvm_setup_guest_memory(void *start, size_t size);
 void kvm_flush_coalesced_mmio_buffer(void);
 
diff --git a/kvm-all.c b/kvm-all.c
index 405480e..f88c4ec 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1816,19 +1816,6 @@ int kvm_has_intx_set_mask(void)
 return kvm_state-intx_set_mask;
 }
 
-void *kvm_ram_alloc(ram_addr_t size)
-{
-#ifdef TARGET_S390X
-void *mem;
-
-mem = kvm_arch_ram_alloc(size);
-if (mem) {
-return mem;
-}
-#endif
-return qemu_anon_ram_alloc(size);
-}
-
 void kvm_setup_guest_memory(void *start, size_t size)
 {
 #ifdef CONFIG_VALGRIND_H
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 4d9ac4a..e7863d7 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -92,9 +92,15 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = {
 
 static int cap_sync_regs;
 
+static void *legacy_s390_alloc(ram_addr_t size);
+
 int kvm_arch_init(KVMState *s)
 {
 cap_sync_regs = kvm_check_extension(s, KVM_CAP_SYNC_REGS);
+if (!kvm_check_extension(s, KVM_CAP_S390_GMAP)
+|| !kvm_check_extension(s, KVM_CAP_S390_COW)) {
+phys_mem_set_alloc(legacy_s390_alloc);
+}
 return 0;
 }
 
@@ -332,17 +338,6 @@ static void *legacy_s390_alloc(ram_addr_t size)
 return mem;
 }
 
-void *kvm_arch_ram_alloc(ram_addr_t size)
-{
-/* Can we use the standard allocation ? */
-if (kvm_check_extension(kvm_state, KVM_CAP_S390_GMAP) 
-kvm_check_extension(kvm_state, KVM_CAP_S390_COW)) {
-return NULL;
-} else {
-return legacy_s390_alloc(size);
-}
-}
-
 int kvm_arch_insert_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
 {
 S390CPU *cpu = S390_CPU(cs);
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 0/8] Guest memory allocation fixes cleanup

2013-06-20 Thread Markus Armbruster
All I wanted to do is exit(1) instead of abort() on guest memory
allocation failure [07/08].  But that lead me into a minor #ifdef bog,
and here's what I brought back.  Enjoy!

Testing:
* Christian Borntraeger reports v1 works fine under LPAR (new S390
  KVM, i.e. generic allocation) and as second guest under z/VM (old
  S390 KVM, i.e. legacy S390 allocation).  Thanks for testing, and for
  catching a stupid mistake.  v2 differs from v1 only in code that
  isn't reachable on S390.

Changes since v1:
* 5/8: Fix assertion in qemu_ram_remap() (Paolo)
* All other patches unchanged except for Acked-by in commit messages
Changes since RFC:
* 1-3+8/8 unchanged except for commit message tweaks
* 4+6/8 rewritten to address Paolo's review
* 5/8 rewritten: don't fix dead code, just assert it's dead
* 7/8 fix mistakes caught by Richard Henderson and Peter Maydell

Markus Armbruster (8):
  exec: Fix Xen RAM allocation with unusual options
  exec: Clean up fall back when -mem-path allocation fails
  exec: Reduce ifdeffery around -mem-path
  exec: Simplify the guest physical memory allocation hook
  exec: Drop incorrect  dead S390 code in qemu_ram_remap()
  exec: Clean up unnecessary S390 ifdeffery
  exec: Don't abort when we can't allocate guest memory
  pc_sysfw: Fix ISA BIOS init for ridiculously big flash

 exec.c  | 121 ++--
 hw/block/pc_sysfw.c |   5 +-
 include/exec/cpu-all.h  |   2 -
 include/exec/exec-all.h |   2 +
 include/sysemu/kvm.h|   5 --
 kvm-all.c   |  13 --
 target-s390x/kvm.c  |  23 +++--
 util/oslib-posix.c  |   4 +-
 util/oslib-win32.c  |   5 +-
 9 files changed, 78 insertions(+), 102 deletions(-)

-- 
1.7.11.7




[Qemu-devel] [PATCH v2 5/8] exec: Drop incorrect dead S390 code in qemu_ram_remap()

2013-06-20 Thread Markus Armbruster
Old S390 KVM wants guest RAM mapped in a peculiar way.  Commit 6b02494
implemented that.

When qemu_ram_remap() got added in commit cd19cfa, its code carefully
mimicked the allocation code: peculiar way if defined(TARGET_S390X) 
defined(CONFIG_KVM), else normal way.

For new S390 KVM, we actually want the normal way.  Commit fdec991
changed qemu_ram_alloc_from_ptr() accordingly, but forgot to update
qemu_ram_remap().  If qemu_ram_alloc_from_ptr() maps RAM the normal
way, but qemu_ram_remap() remaps it the peculiar way, remapping
changes protection and flags, which it shouldn't.

Fortunately, this can't happen, as we never remap on S390.

Replace the incorrect code with an assertion.

Thanks to Christian Borntraeger for help with assessing the bug's
(non-)impact.

Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 exec.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index c45eb33..366ac6a 100644
--- a/exec.c
+++ b/exec.c
@@ -1229,15 +1229,16 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
 flags, block-fd, offset);
 } else {
-#if defined(TARGET_S390X)  defined(CONFIG_KVM)
-flags |= MAP_SHARED | MAP_ANONYMOUS;
-area = mmap(vaddr, length, PROT_EXEC|PROT_READ|PROT_WRITE,
-flags, -1, 0);
-#else
+/*
+ * Remap needs to match alloc.  Accelerators that
+ * set phys_mem_alloc never remap.  If they did,
+ * we'd need a remap hook here.
+ */
+assert(phys_mem_alloc == qemu_anon_ram_alloc);
+
 flags |= MAP_PRIVATE | MAP_ANONYMOUS;
 area = mmap(vaddr, length, PROT_READ | PROT_WRITE,
 flags, -1, 0);
-#endif
 }
 if (area != vaddr) {
 fprintf(stderr, Could not remap addr: 
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 6/8] exec: Clean up unnecessary S390 ifdeffery

2013-06-20 Thread Markus Armbruster
Another issue missed in commit fdec991 is -mem-path: it needs to be
rejected only for old S390 KVM, not for any S390.  Not that I
personally care, but the ifdeffery in qemu_ram_alloc_from_ptr() annoys
me.

Note that this doesn't actually make -mem-path work, as the kernel
doesn't (yet?)  support large pages in the host for KVM guests.  Clean
it up anyway.

Thanks to Christian Borntraeger for pointing out the S390 kernel
limitations.

Signed-off-by: Markus Armbruster arm...@redhat.com
---
 exec.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/exec.c b/exec.c
index 366ac6a..bf2a7d6 100644
--- a/exec.c
+++ b/exec.c
@@ -862,7 +862,7 @@ void qemu_mutex_unlock_ramlist(void)
 qemu_mutex_unlock(ram_list.mutex);
 }
 
-#if defined(__linux__)  !defined(TARGET_S390X)
+#ifdef __linux__
 
 #include sys/vfs.h
 
@@ -965,6 +965,14 @@ static void *file_ram_alloc(RAMBlock *block,
 block-fd = fd;
 return area;
 }
+#else
+static void *file_ram_alloc(RAMBlock *block,
+ram_addr_t memory,
+const char *path)
+{
+fprintf(stderr, -mem-path not supported on this host\n);
+exit(1);
+}
 #endif
 
 static ram_addr_t find_ram_offset(ram_addr_t size)
@@ -1103,12 +,17 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, 
void *host,
 xen_ram_alloc(new_block-offset, size, mr);
 } else {
 if (mem_path) {
-#if defined (__linux__)  !defined(TARGET_S390X)
+if (phys_mem_alloc != qemu_anon_ram_alloc) {
+/*
+ * file_ram_alloc() needs to allocate just like
+ * phys_mem_alloc, but we haven't bothered to provide
+ * a hook there.
+ */
+fprintf(stderr,
+-mem-path not supported with this accelerator\n);
+exit(1);
+}
 new_block-host = file_ram_alloc(new_block, size, mem_path);
-#else
-fprintf(stderr, -mem-path option unsupported\n);
-exit(1);
-#endif
 }
 if (!new_block-host) {
 new_block-host = phys_mem_alloc(size);
-- 
1.7.11.7




[Qemu-devel] [PATCH v2 7/8] exec: Don't abort when we can't allocate guest memory

2013-06-20 Thread Markus Armbruster
We abort() on memory allocation failure.  abort() is appropriate for
programming errors.  Maybe most memory allocation failures are
programming errors, maybe not.  But guest memory allocation failure
isn't, and aborting when the user asks for more memory than we can
provide is not nice.  exit(1) instead, and do it in just one place, so
the error message is consistent.

Tested-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Markus Armbruster arm...@redhat.com
---
 exec.c | 5 +
 target-s390x/kvm.c | 6 +-
 util/oslib-posix.c | 4 +---
 util/oslib-win32.c | 5 +
 4 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index bf2a7d6..3f7fe29 100644
--- a/exec.c
+++ b/exec.c
@@ -1125,6 +1125,11 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void 
*host,
 }
 if (!new_block-host) {
 new_block-host = phys_mem_alloc(size);
+if (!new_block-host) {
+fprintf(stderr, Cannot set up guest memory '%s': %s\n,
+new_block-mr-name, strerror(errno));
+exit(1);
+}
 memory_try_enable_merging(new_block-host, size);
 }
 }
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index e7863d7..b1ffcea 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -331,11 +331,7 @@ static void *legacy_s390_alloc(ram_addr_t size)
 mem = mmap((void *) 0x8ULL, size,
PROT_EXEC|PROT_READ|PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
-if (mem == MAP_FAILED) {
-fprintf(stderr, Allocating RAM failed\n);
-abort();
-}
-return mem;
+return mem == MAP_FAILED ? NULL : mem;
 }
 
 int kvm_arch_insert_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 3dc8b1b..253bc3d 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -112,9 +112,7 @@ void *qemu_anon_ram_alloc(size_t size)
 size_t offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
 
 if (ptr == MAP_FAILED) {
-fprintf(stderr, Failed to allocate %zu B: %s\n,
-size, strerror(errno));
-abort();
+return NULL;
 }
 
 ptr += offset;
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 961fbf5..983b7a2 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -65,10 +65,7 @@ void *qemu_anon_ram_alloc(size_t size)
 /* FIXME: this is not exactly optimal solution since VirtualAlloc
has 64Kb granularity, but at least it guarantees us that the
memory is page aligned. */
-if (!size) {
-abort();
-}
-ptr = qemu_oom_check(VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE));
+ptr = VirtualAlloc(NULL, size, MEM_COMMIT, PAGE_READWRITE);
 trace_qemu_anon_ram_alloc(size, ptr);
 return ptr;
 }
-- 
1.7.11.7




Re: [Qemu-devel] [PATCH v3 1/2] pvpanic: initialization cleanup

2013-06-20 Thread Laszlo Ersek
On 06/19/13 17:02, Michael S. Tsirkin wrote:
 Avoid use of static variables: PC systems
 initialize pvpanic device through pvpanic_init,
 so we can simply create the fw_cfg file at that point.
 This also makes it possible to skip device
 creation completely if fw_cfg is not there, e.g. for xen -
 so the ports it reserves are not discoverable by guests.
 
 Also, make pvpanic_init void since callers ignore return
 status anyway.
 
 Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Cc: Laszlo Ersek ler...@redhat.com
 Cc: Paul Durrant paul.durr...@citrix.com
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 Chanes from v2:
 skip device creation completely if !fw_cfg
 make pvpanic_init void
 Changes from v1:
 don't assert if !fw_cfg
 
 
  hw/misc/pvpanic.c| 30 --
  include/hw/i386/pc.h |  2 +-
  2 files changed, 17 insertions(+), 15 deletions(-)
 
 diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
 index 060099b..83ed226 100644
 --- a/hw/misc/pvpanic.c
 +++ b/hw/misc/pvpanic.c
 @@ -97,26 +97,28 @@ static void pvpanic_isa_realizefn(DeviceState *dev, Error 
 **errp)
  {
  ISADevice *d = ISA_DEVICE(dev);
  PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
 -static bool port_configured;
 -FWCfgState *fw_cfg;
  
  isa_register_ioport(d, s-io, s-ioport);
 +}
  
 -if (!port_configured) {
 -fw_cfg = fw_cfg_find();
 -if (fw_cfg) {
 -fw_cfg_add_file(fw_cfg, etc/pvpanic-port,
 -g_memdup(s-ioport, sizeof(s-ioport)),
 -sizeof(s-ioport));
 -port_configured = true;
 -}
 -}
 +static void pvpanic_fw_cfg(ISADevice *dev, FWCfgState *fw_cfg)
 +{
 +PVPanicState *s = ISA_PVPANIC_DEVICE(dev);
 +
 +fw_cfg_add_file(fw_cfg, etc/pvpanic-port,
 +g_memdup(s-ioport, sizeof(s-ioport)),
 +sizeof(s-ioport));
  }
  
 -int pvpanic_init(ISABus *bus)
 +void pvpanic_init(ISABus *bus)
  {
 -isa_create_simple(bus, TYPE_ISA_PVPANIC_DEVICE);
 -return 0;
 +ISADevice *dev;
 +FWCfgState *fw_cfg = fw_cfg_find();
 +if (!fw_cfg) {
 +return;
 +}
 +dev = isa_create_simple (bus, TYPE_ISA_PVPANIC_DEVICE);
 +pvpanic_fw_cfg(dev, fw_cfg);
  }
  
  static Property pvpanic_isa_properties[] = {
 diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
 index ba9ba1a..458eded 100644
 --- a/include/hw/i386/pc.h
 +++ b/include/hw/i386/pc.h
 @@ -196,7 +196,7 @@ static inline bool isa_ne2000_init(ISABus *bus, int base, 
 int irq, NICInfo *nd)
  void pc_system_firmware_init(MemoryRegion *rom_memory);
  
  /* pvpanic.c */
 -int pvpanic_init(ISABus *bus);
 +void pvpanic_init(ISABus *bus);
  
  /* e820 types */
  #define E820_RAM1
 

series
Reviewed-by: Laszlo Ersek ler...@redhat.com




Re: [Qemu-devel] [PATCH v3 1/2] add a header file for atomic operations

2013-06-20 Thread Paolo Bonzini
Il 19/06/2013 22:44, Richard Henderson ha scritto:
  +/* Data must be read atomically. We don't really need barrier 
  semantics
  + * but it's easier to use atomic_* than roll our own. */
  +log = atomic_xchg(from, 0);
  
  If you really don't need any ordering guarantees / barriers here, then
  using a relaxed load should be fine.  But my gut feeling tells me you
  probably do need some barriers; either you are re-using another
  barrier (and then the comment should probably point out which), or it
  must be a case where it's either fine to read any value someone (else)
  wrote or there's no concurrent store after all.
  
 There is a store here, before and after.  Read the value, store zero.
 
 I suppose what the comment is saying is that the atomic operation doesn't need
 to be ordered with respect to the rest of the surrounding code, as the object
 being synchronized is just that one integer.

Exactly.  The items of the array can be read independently.

Paolo



Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.

2013-06-20 Thread Alex Bligh



--On 20 June 2013 07:47:12 + Paul Durrant paul.durr...@citrix.com 
wrote:



If someone has a VM that has the old Citrix drivers installed, or GPLPV,
I think I should be able to spot this and make sure that the new bus
driver quiesces itself to prevent strangeness ensuing. If and when said
previous drivers are un-installed then the new bus driver can wake up and
enumerate the device nodes for the other pv drivers and Windows Update
can carry on doing its stuff.


I have no clue about Windows device drivers, so this may be a silly
suggestion. If your suggestion above already requires a Xen code change,
one possibility might be copy the idea behind the PCI unplug logic. Either
if the new PCI device is used, it could unplug the old one, or vice versa.
Drivers magically unplugging themselves may not be ideal, but it beats
having 2 drivers fighting over the same device.

--
Alex Bligh



Re: [Qemu-devel] [PATCH v3 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread Paolo Bonzini
Il 20/06/2013 09:39, Stefan Hajnoczi ha scritto:
 qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.
 
 It seems that calling them from a thread is a little risky because there
 is no guarantee that the BH is no longer invoked after a thread calls
 these functions.
 
 I think that's worth a comment or do you want them to take the lock so
 they become safe?

Taking the lock wouldn't help.  The invoking loop of aio_bh_poll runs
lockless.  I think a comment is better.

qemu_bh_cancel is inherently not thread-safe, there's not much you can
do about it.

qemu_bh_delete is safe as long as you wait for the bottom half to stop
before deleting the containing object.  Once we have RCU, deletion of
QOM objects will be RCU-protected.  Hence, a simple way could be to put
the first part of aio_bh_poll() within rcu_read_lock/unlock.

 The other thing I'm unclear on is the -idle assignment followed
 immediately by a -scheduled assignment.  Without memory barriers
 aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
 it may see an idle BH as a regular scheduled BH because -idle is still
 0.

Right.  You need to order -idle writes before -scheduled writes, and
add memory barriers, or alternatively use two bits in -scheduled so
that you can assign both atomically.

Paolo



Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.

2013-06-20 Thread Paul Durrant
 -Original Message-
 From: Alex Bligh [mailto:a...@alex.org.uk]
 Sent: 20 June 2013 09:09
 To: Paul Durrant; Tim (Xen.org); Matt Wilson
 Cc: xen-de...@lists.xen.org; Ian Campbell; qemu-devel@nongnu.org; Alex
 Bligh
 Subject: RE: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device
 version 2.
 
 
 
 --On 20 June 2013 07:47:12 + Paul Durrant paul.durr...@citrix.com
 wrote:
 
  If someone has a VM that has the old Citrix drivers installed, or GPLPV,
  I think I should be able to spot this and make sure that the new bus
  driver quiesces itself to prevent strangeness ensuing. If and when said
  previous drivers are un-installed then the new bus driver can wake up and
  enumerate the device nodes for the other pv drivers and Windows Update
  can carry on doing its stuff.
 
 I have no clue about Windows device drivers, so this may be a silly
 suggestion. If your suggestion above already requires a Xen code change,
 one possibility might be copy the idea behind the PCI unplug logic. Either
 if the new PCI device is used, it could unplug the old one, or vice versa.
 Drivers magically unplugging themselves may not be ideal, but it beats
 having 2 drivers fighting over the same device.
 

Unfortunately, whilst it sounds good on the face of it, it's not as 
straightforward as that. The old Citrix PV drivers did not just bind to the Xen 
platform device, and make that device go away automagically would actually 
cause the system disk to disappear without any clean fallback to emulation.
As long as nothing actually breaks if and when Windows fetches the new PV bus 
driver from Windows Update then we can document the need to manually uninstall 
any other PV drivers.

  Paul



Re: [Qemu-devel] [RESEND PATCH] virtio-scsi: forward scsibus for virtio-scsi-pci.

2013-06-20 Thread Frederic Konrad

On 18/06/2013 17:21, Michael S. Tsirkin wrote:

On Fri, Jun 14, 2013 at 08:13:29AM +0200, Frederic Konrad wrote:

On 13/06/2013 09:59, Michael S. Tsirkin wrote:

On Thu, Jun 13, 2013 at 09:34:30AM +0200, Frederic Konrad wrote:

On 13/06/2013 09:23, Michael S. Tsirkin wrote:

On Thu, Jun 13, 2013 at 04:46:09PM +1000, Alexey Kardashevskiy wrote:

On 06/13/2013 04:28 PM, Frederic Konrad wrote:

On 12/06/2013 13:21, Alexey Kardashevskiy wrote:

On 06/12/2013 07:16 PM, Michael S. Tsirkin wrote:

On Wed, Jun 12, 2013 at 07:04:48PM +1000, Alexey Kardashevskiy wrote:

On 06/12/2013 07:03 PM, Michael S. Tsirkin wrote:

On Wed, Jun 12, 2013 at 08:15:17AM +0200, fred.kon...@greensocs.com
wrote:

From: KONRAD Frederic fred.kon...@greensocs.com

This fix a bug with scsi hotplug on virtio-scsi-pci:

As virtio-scsi-pci doesn't have any scsi bus, we need to forward
scsi-hot-add
to the virtio-scsi-device plugged on the virtio-bus.

Cc: qemu-sta...@nongnu.org
Reported-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: Andreas Färber afaer...@suse.de
Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com

Acked-by: Michael S. Tsirkin m...@redhat.com

Note: we don't seem to have any decent way to
add disks to devices: no QMP interface,
pci address is required instead of using an id ...

Anyone can be bothered to fix this?

Actually PCI address is not always required, this field (we are talking
about drive_add?) is ignored when if=none.


Then documentation in hmp-commands.hx is wrong, isn't it?
Add that to the list.

if=none can't be actually used to hot-add
a disk to a device, can it? It creates a disc and assumes you will
use it by a device created later.

Yep. I run QEMU with -device virtio-scsi-pci,id=device0 and then do in
console:
drive_add auto file=virtimg/fc18guest,if=none,id=bar1
device_add scsi-disk,bus=device0.0,drive=bar1

Pretty hot plug :)

I thought you use drive_add 0 if=scsi?

That's the other option, I posted a bug but I did not actually try the fix
till now :)

It works now if I run QEMU with -device virtio-scsi-pci and do this in
qemu console:
drive_add 0 file=virtimg/fc18guest

No extra parameters or anything, cool, thanks, and :)

Tested-by: Alexey Kardashevskiy a...@ozlabs.ru


The only problem with it that it still wants PCI SCSI adapter while
spapr-vscsi is VIO device so if the guest kernel does not have virtio-scsi
support, I have to do what I described in the quote but this is a different
story.

Okay.  How about:
- document that pci_addr is optional in hmp
- if no pci_addr assume if=none
- add drive_add to qmp without the pci_addr and if options

We are left with the bus=device0.0 syntax for device_add which is also
gross - user asked for device0, the .0 part is qemu internals exposed to
users.
How about teaching qdev that if there's a single bus under a device,
naming the device itself should be identical?

Yes why not seems a good idea, but you'll pass it through bus= option?

This will solve the problem neatly without virtio specific hacks,
won't it?

The issue here is command line back-compatibility for pci_addr,
which won't be solved with
the single bus idea?

Why not? This code:
 scsibus = (SCSIBus *)
 object_dynamic_cast(OBJECT(QLIST_FIRST(adapter-child_bus)),
 TYPE_SCSI_BUS);
should be replaced with code from qdev that we'll write
that goes down the chain as long as there's 1 device
on each bus, looking for a device of the appropriate type.

Ok, understood what you mean :).

Why not if everybody is happy with it.

Fred

Ok so - want to try implementing this?


Ok, will try to look at it next week.

What about the stable release?
Wouldn't be safe to take this patch for the stable?

Fred



---
   hw/pci/pci-hotplug.c | 19 +--
   1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci-hotplug.c b/hw/pci/pci-hotplug.c
index 12287d1..c708752 100644
--- a/hw/pci/pci-hotplug.c
+++ b/hw/pci/pci-hotplug.c
@@ -30,6 +30,8 @@
   #include monitor/monitor.h
   #include hw/scsi/scsi.h
   #include hw/virtio/virtio-blk.h
+#include hw/virtio/virtio-scsi.h
+#include hw/virtio/virtio-pci.h
   #include qemu/config-file.h
   #include sysemu/blockdev.h
   #include qapi/error.h
@@ -79,13 +81,26 @@ static int scsi_hot_add(Monitor *mon, DeviceState
*adapter,
   {
   SCSIBus *scsibus;
   SCSIDevice *scsidev;
+VirtIOPCIProxy *virtio_proxy;
 scsibus = (SCSIBus *)
   object_dynamic_cast(OBJECT(QLIST_FIRST(adapter-child_bus)),
   TYPE_SCSI_BUS);
   if (!scsibus) {
-error_report(Device is not a SCSI adapter);
-return -1;
+/*
+ * Check if the adapter is a virtio-scsi-pci, and forward
scsi_hot_add
+ * to the virtio-scsi-device.
+ */
+if (!object_dynamic_cast(OBJECT(adapter),
TYPE_VIRTIO_SCSI_PCI)) {
+error_report(Device is not a SCSI adapter);
+return -1;
+}
+virtio_proxy = VIRTIO_PCI(adapter);
+   

Re: [Qemu-devel] [RESEND PATCH] virtio-scsi: forward scsibus for virtio-scsi-pci.

2013-06-20 Thread Michael S. Tsirkin
On Thu, Jun 20, 2013 at 10:26:18AM +0200, Frederic Konrad wrote:
 On 18/06/2013 17:21, Michael S. Tsirkin wrote:
 On Fri, Jun 14, 2013 at 08:13:29AM +0200, Frederic Konrad wrote:
 On 13/06/2013 09:59, Michael S. Tsirkin wrote:
 On Thu, Jun 13, 2013 at 09:34:30AM +0200, Frederic Konrad wrote:
 On 13/06/2013 09:23, Michael S. Tsirkin wrote:
 On Thu, Jun 13, 2013 at 04:46:09PM +1000, Alexey Kardashevskiy wrote:
 On 06/13/2013 04:28 PM, Frederic Konrad wrote:
 On 12/06/2013 13:21, Alexey Kardashevskiy wrote:
 On 06/12/2013 07:16 PM, Michael S. Tsirkin wrote:
 On Wed, Jun 12, 2013 at 07:04:48PM +1000, Alexey Kardashevskiy wrote:
 On 06/12/2013 07:03 PM, Michael S. Tsirkin wrote:
 On Wed, Jun 12, 2013 at 08:15:17AM +0200, fred.kon...@greensocs.com
 wrote:
 From: KONRAD Frederic fred.kon...@greensocs.com
 
 This fix a bug with scsi hotplug on virtio-scsi-pci:
 
 As virtio-scsi-pci doesn't have any scsi bus, we need to forward
 scsi-hot-add
 to the virtio-scsi-device plugged on the virtio-bus.
 
 Cc: qemu-sta...@nongnu.org
 Reported-by: Alexey Kardashevskiy a...@ozlabs.ru
 Reviewed-by: Andreas Färber afaer...@suse.de
 Signed-off-by: KONRAD Frederic fred.kon...@greensocs.com
 Acked-by: Michael S. Tsirkin m...@redhat.com
 
 Note: we don't seem to have any decent way to
 add disks to devices: no QMP interface,
 pci address is required instead of using an id ...
 
 Anyone can be bothered to fix this?
 Actually PCI address is not always required, this field (we are 
 talking
 about drive_add?) is ignored when if=none.
 
 Then documentation in hmp-commands.hx is wrong, isn't it?
 Add that to the list.
 
 if=none can't be actually used to hot-add
 a disk to a device, can it? It creates a disc and assumes you will
 use it by a device created later.
 Yep. I run QEMU with -device virtio-scsi-pci,id=device0 and then do 
 in
 console:
 drive_add auto file=virtimg/fc18guest,if=none,id=bar1
 device_add scsi-disk,bus=device0.0,drive=bar1
 
 Pretty hot plug :)
 I thought you use drive_add 0 if=scsi?
 That's the other option, I posted a bug but I did not actually try the 
 fix
 till now :)
 
 It works now if I run QEMU with -device virtio-scsi-pci and do this in
 qemu console:
 drive_add 0 file=virtimg/fc18guest
 
 No extra parameters or anything, cool, thanks, and :)
 
 Tested-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 
 The only problem with it that it still wants PCI SCSI adapter while
 spapr-vscsi is VIO device so if the guest kernel does not have 
 virtio-scsi
 support, I have to do what I described in the quote but this is a 
 different
 story.
 Okay.  How about:
 - document that pci_addr is optional in hmp
 - if no pci_addr assume if=none
 - add drive_add to qmp without the pci_addr and if options
 
 We are left with the bus=device0.0 syntax for device_add which is also
 gross - user asked for device0, the .0 part is qemu internals exposed to
 users.
 How about teaching qdev that if there's a single bus under a device,
 naming the device itself should be identical?
 Yes why not seems a good idea, but you'll pass it through bus= option?
 This will solve the problem neatly without virtio specific hacks,
 won't it?
 The issue here is command line back-compatibility for pci_addr,
 which won't be solved with
 the single bus idea?
 Why not? This code:
  scsibus = (SCSIBus *)
  object_dynamic_cast(OBJECT(QLIST_FIRST(adapter-child_bus)),
  TYPE_SCSI_BUS);
 should be replaced with code from qdev that we'll write
 that goes down the chain as long as there's 1 device
 on each bus, looking for a device of the appropriate type.
 Ok, understood what you mean :).
 
 Why not if everybody is happy with it.
 
 Fred
 Ok so - want to try implementing this?
 
 Ok, will try to look at it next week.
 
 What about the stable release?
 Wouldn't be safe to take this patch for the stable?
 
 Fred

Yes. My ACK is for stable.

 
 ---
hw/pci/pci-hotplug.c | 19 +--
1 file changed, 17 insertions(+), 2 deletions(-)
 
 diff --git a/hw/pci/pci-hotplug.c b/hw/pci/pci-hotplug.c
 index 12287d1..c708752 100644
 --- a/hw/pci/pci-hotplug.c
 +++ b/hw/pci/pci-hotplug.c
 @@ -30,6 +30,8 @@
#include monitor/monitor.h
#include hw/scsi/scsi.h
#include hw/virtio/virtio-blk.h
 +#include hw/virtio/virtio-scsi.h
 +#include hw/virtio/virtio-pci.h
#include qemu/config-file.h
#include sysemu/blockdev.h
#include qapi/error.h
 @@ -79,13 +81,26 @@ static int scsi_hot_add(Monitor *mon, 
 DeviceState
 *adapter,
{
SCSIBus *scsibus;
SCSIDevice *scsidev;
 +VirtIOPCIProxy *virtio_proxy;
  scsibus = (SCSIBus *)

  object_dynamic_cast(OBJECT(QLIST_FIRST(adapter-child_bus)),
TYPE_SCSI_BUS);
if (!scsibus) {
 -error_report(Device is not a SCSI adapter);
 -return -1;
 +/*
 + * Check if the adapter is a virtio-scsi-pci, and forward
 scsi_hot_add
 + * to the virtio-scsi-device.
 + */
 

Re: [Qemu-devel] [PATCH] pseries: Fix compiler warning (conversion of pointer to integral value)

2013-06-20 Thread Alexander Graf


Am 20.06.2013 um 07:10 schrieb Michael Tokarev m...@tls.msk.ru:

 20.06.2013 01:40, Alexander Graf wrote:
 
 On 19.06.2013, at 23:08, Stefan Weil wrote:
 
 This kind of type cast must use uintptr_t or target_ulong to be portable
 for hosts with sizeof(void *) != sizeof(long).
 
 Here the value is assigned to a variable of type target_ulong.
 
 Signed-off-by: Stefan Weil s...@weilnetz.de
 
 Acked-by: Alexander Graf ag...@suse.de
 
 I suppose this one goes through the trivial tree?
 
 Anything which goes to -trivial can be applied directly or
 into some other subsystem tree first.  When I send a pull
 request I rebase trivial tree ontop of current master and
 filter out anything which has been already applied, so that's
 not an issue.  The only possible issue is when you applied
 it to some other tree, and -trivial pull request were handled
 before that other tree - how that will be handled by git?
 
 Will it complain (so that the situation should be resolved
 manually), will it apply nothing or will it appy an empty
 patch?  (the patch signature will be different, with
 different S-o-b.)  I think I've seen empty commits before
 in qemu tree, with the same subject/author as some previous
 commit.

It depends on how you handle the tree. I rebase ppc-next too, so the commit 
would simply vanish.

I'll apply it to ppc-next as well then.


Alex




Re: [Qemu-devel] [PATCH 6/9] vhost-scsi: new device supporting the tcm_vhost Linux kernel module

2013-06-20 Thread Libaiqing
Hi Asias,
Thanks for your config.
According to you config,I test booting from vhost device with upstream 
kernel and qemu,but failed.

1 installing guest from cdrom,ok.
2 booting vhost-scsi,guest fs error occurs. 
3 using fileio backstores,the error is same..
4 rebooting guest,a log printed:
 (qemu) hw/scsi/virtio-scsi.c:533:virtio_scsi_handle_event: Object 
0x7fccae7f2c88 is not an instance of type virtio-scsi-device
5 using upstream seabios,core dumped.
 
Could you give me some advise to debug this problem ? I can provide more 
information if need.

The qemu cmd:
[root@fedora121 x86_64-softmmu]# ./qemu-system-x86_64 -enable-kvm -name fedora  
 -M pc -m 1024 -smp 2   -drive file=/home/fedora18.iso,if=ide,media=cdrom 
-device vhost-scsi-pci,wwpn=naa.50014057133e25dc  -monitor stdio   -vga qxl  
-vnc :1

The vnc output:
Dracut-initqueue[189]:/dev/mapper/fedora-root:UNEXPECTED INCONSISTENCY;RUN FSCK 
MANUALLY.
Dracut-initqueue[189]: Warning: e2fsck returned with 4
Dracut-initqueue[189]: Warning: ***An error occurred during the file system 
check.

The guest kernel log:
Kernel: virtio-pci :00:04.0: irq 40 for MSI/MSI-X
Kernel: virtio-pci :00:04.0: irq 41 for MSI/MSI-X
Kernel: virtio-pci :00:04.0: irq 42 for MSI/MSI-X
Kernel: virtio-pci :00:04.0: irq 43 for MSI/MSI-X
Kernel: scsi2 : Virtio SCSI HBA
Kernel: scsi 2:0:1:0: Direct-Access LIO-ORG r0
Kernel: sd 2:0:1:0: Attached scsi generic sg1 type 0
Kernel: sd 2:0:1:0: [sda]1258912 512-byte logical .
Kernel: sd 2:0:1:0: [sda]write protect is off
Kernel: sd 2:0:1:0: [sda]Mode sense :43 00 00 08
Kernel: sd 2:0:1:0: [sda]write cache: disabled, read .
Kernel: sda sda1 sda2
Kernel: sd 2:0:1:0: [sda] Attached SCSI disk
Dracut-initqueue[189]: Scanning devices sda2 for LVM
Dracut-initqueue[189]: inactive '/dev/fedora/swap'...
Dracut-initqueue[189]: inactive '/dev/fedora/root'...

The info of host:
[root@fedora121 x86_64-softmmu]# uname -a
Linux fedora121 3.10.0-rc6 #1 SMP Wed Jun 19 19:34:24 CST 2013 x86_64 x86_64 
x86_64 GNU/Linux
[root@fedora121 x86_64-softmmu]# lsmod |grep vhost_scsi
vhost_scsi 49456  5
target_core_mod   282163  14 
target_core_iblock,target_core_pscsi,iscsi_target_mod,target_core_file,vhost_scsi
[root@fedora121 x86_64-softmmu]# targetcli
targetcli shell version v2.1.fb26
Copyright 2011 by RisingTide Systems LLC and others.
For help on commands, type 'help'.

/ ls
o- / 
.
 [...]
  o- backstores 
..
 [...]
  | o- block 
..
 [Storage Objects: 0]
  | o- fileio 
.
 [Storage Objects: 0]
  | o- pscsi 
..
 [Storage Objects: 0]
  | o- ramdisk 

 [Storage Objects: 1]
  |   o- r0 
...
 [(6.0GiB) activated]
  o- iscsi 

 [Targets: 0]
  o- loopback 
.
 [Targets: 0]
  o- vhost 

 [Targets: 1]
o- naa.50014057133e25dc 
..
 [TPGs: 1]
  o- tpg1 
...
 [naa.5001405a70ac3421]
o- acls 
..
 [ACLs: 0]
o- luns 
..
 [LUNs: 1]
  o- lun0 
.
 [ramdisk/r0]

Regards,
baiqing
 -Original Message-
 From: Asias He [mailto:as...@redhat.com]
 Sent: Thursday, June 20, 2013 9:34 AM
 To: Libaiqing
 Cc: Paolo Bonzini; Wenchao Xia; qemu-devel@nongnu.org;
 n...@linux-iscsi.org; Michael S. Tsirkin; Haofeng
 Subject: Re: [Qemu-devel] [PATCH 6/9] vhost-scsi: new device supporting the
 tcm_vhost Linux kernel module
 
 On Wed, Jun 19, 2013 at 12:55:10PM +, Libaiqing wrote:
  Hi paolo,
The vhost-scsi device can be used as boot device?
I tested with your config + 3.10 rc6 + 

Re: [Qemu-devel] [RFC 06/13] qemu-thread: add TLS wrappers

2013-06-20 Thread Paolo Bonzini
Il 20/06/2013 09:26, Fam Zheng ha scritto:
 On Fri, 06/14 11:48, Stefan Hajnoczi wrote:
 From: Paolo Bonzini pbonz...@redhat.com

 Fast TLS is not available on some platforms, but it is always nice to
 use it.  This wrapper implementation falls back to pthread_get/setspecific
 on POSIX systems that lack __thread, but uses the dynamic linker's TLS
 support on Linux and Windows.

 The user shall call alloc_foo() in every thread that needs to access the
 variable---exactly once and before any access.  foo is the name of the
 variable as passed to DECLARE_TLS and DEFINE_TLS.  Then, get_foo() will
 return the address of the variable.  It is guaranteed to remain the same
 across the lifetime of a thread, so you can cache it.
 
 Would tls_alloc_foo() and tls_get_foo() be easier to read and less
 possible for name conflict?

Fine by me.

Paolo




Re: [Qemu-devel] [PATCH 06/12] spapr-vty: add copyright and license

2013-06-20 Thread Paolo Bonzini
Il 19/06/2013 22:40, Anthony Liguori ha scritto:
 If you are on CC, then please Ack this patch as you touched this
 file at some point in time.
 
 Cc: Alexey Kardashevskiy a...@ozlabs.ru
 Cc: Andreas Färber afaer...@suse.de
 Cc: David Gibson da...@gibson.dropbear.id.au
 Cc: Michael Ellerman mich...@ellerman.id.au
 Cc: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Anthony Liguori aligu...@us.ibm.com
 ---
  hw/char/spapr_vty.c | 13 +
  1 file changed, 13 insertions(+)
 
 diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
 index 2993848..ecc2bb5 100644
 --- a/hw/char/spapr_vty.c
 +++ b/hw/char/spapr_vty.c
 @@ -1,3 +1,16 @@
 +/*
 + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System 
 Emulator
 + *
 + * PAPR Inter-VM Logical Lan, aka ibmveth
 + *
 + * Copyright IBM, Corp. 2010-2013
 + *
 + * Authors:
 + *   David Gibson da...@gibson.dropbear.id.au
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + */
  #include hw/qdev.h
  #include sysemu/char.h
  #include hw/ppc/spapr.h
 

ACK

Paolo



Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.

2013-06-20 Thread Tim Deegan
At 07:47 + on 20 Jun (1371714432), Paul Durrant wrote:
   I agree. If this is really the only solution, we would need to have
   both versions presented to the guest so that old drivers continue to
   work without any intervention.
  
  I suspect that if we expose both, both sets of drivers try to run the
  same PV connections, and hilarity ensues.
  
 
 Actually I think I can make that work, and it is the conclusion I came
 to after Alex's comment.

Ah, nice!  In that case, I'm a lot less worried -- we can just expose
both versions/devices by default and there's no need for a visible
control knob tied to driver version (except maybe for debugging).

It means an 'unsupported' device appearing on other/older OSes, which is
unfortunate, but ISTR only Windows really complains visibly about that.

Tim.



Re: [Qemu-devel] [PATCH v4 0/9] Make 'dump-guest-memory' dump in kdump-compressed format

2013-06-20 Thread Stefan Hajnoczi
On Thu, Jun 20, 2013 at 10:18:35AM +0800, Qiao Nuohan wrote:
 On 06/19/2013 09:49 PM, Stefan Hajnoczi wrote:
 Where does that code live that writes DISKDUMP files?  I can see the
 diskdump.[ch] code.
 
 Sorry, I cannot catch what do you mean here.

Please link to the code that writes DISKDUMP kdump files on a physical
machine.  I only see the crash utility code to read the DISKDUMP code
but I haven't found anything in the Linux kernel, the crash utility, or
the kexec-utils code to actually write a DISKDUMP file.

 
 The file format is pretty bad: we need 4 temporary files and a lot of
 data copying to write it out.
 
 Why not just compress an ELF file and teach the crash utility how to
 decompress while reading the ELF?
 
 Also, did you look into simply outputting the ELF file without zero
 pages?
 
 What I want is a dump file with smaller size. And compressed format and with
 zero pages excluded can make it. I choose kdump-compressed format because it 
 is
 a standard format and it can realize what I want.
 
 Why 4 temporary files are need? dump-guest-memory may be called with a fd 
 which
 is supposed to send data of dump to. If fd is opened on a pipe or etc which is
 unable to seek, then I need to cache the data.

I understand why you need temporary files, but my questions stand:

Have you looked at using ELF more efficiently instead of duplicating
kdump code into QEMU?  kdump is not a great format for the problem
you're trying to solve - you're not filling in the Linux-specific
metadata and it's a pain to write due to its layout.

Why can't you simply omit zero pages from the ELF?

Why can't you compress the entire ELF file and add straightforward
decompression to the crash utility?

Stefan



[Qemu-devel] [RFC] qemu-img: add option -d in convert

2013-06-20 Thread Wenchao Xia
Hi,
  This is a draft design which aimed for internal snapshot convert,
hope to get your comments:

  Internal snapshot is not as easy as external snapshot, to query and
convert. This patch will improve convertion side, which helps internal
/ external snapshot mixed case. With it user can treat internal
snapshot as lineraity relationship, use it like external ones with
tool qemu-img.


An detailed example, If there is a chain as following:

imageA(sn0)-imageB(sn0,sn1)-imageC(sn0)

The real relationship in it could be:
--imageA.qcow2imageB.qcow2-imageC.qcow2
|-imageA(sn0)   |-imageB(sn0)|-imageC(sn0)
 |-imageB(sn1)

To export it, two steps:
1. duplicate them to get an exactly same tree by:
qemu-img convert imageA.qcow2 -O export/imageA.qcow2 -f qcow2
qemu-img convert imageA.qcow2 -s sn0 -O export/imageA_sn0.qcow2
qemu-img convert imageB.qcow2 -O export/imageB.qcow2 -f qcow2 -o
backing_file=export/imageA.qcow2
qemu-img convert imageB.qcow2 -s sn0 -O export/imageB.qcow2 -f qcow2 -o
backing_file=export/imageB.qcow2
...

result at ./export:
--imageA.qcow2imageB.qcow2-imageC.qcow2
|-imageA_sn0.qcow2  |-imageB_sn0.qcow2   |-imageC_sn0.qcow2
 |-imageB_sn1.qcow2

2. change the relationship to linearity to save space(or by 3rd party
diff tool):
qemu-img create imageA_l.qcow2 -f qcow2 -p backing_file=imageA_qcow2
qemu-img rebase imageA_l.qcow2 -b imageA_sn0.qcow2
qemu-img rebase -u imageB.qcow2 -b imageA_l.qcow2
discard imageA.qcow2


result at ./export:
imageA_sn0.qcow2--imageA_l.qcow2--imageB_sn0.qcow2--imageB_sn1_l.qcow2-
-imageB_l.qcow2--imageC_sn0.qcow2--imageC_l.qcow2


This is a bit complexity, they can be merged into one step, to save
disk I/O and make procedure simple, add a parameter:
[-d [base_image=IMAGE,]snapshot=SNAPSHOT]

qemu-img convert imageA.qcow2 -s sn0 -O export/imageA_sn0.qcow2 -f qcow2
qemu-img convert imageA.qcow2 -d snapshot=sn0 -O export/imageA.qcow2 -f
qcow2 -o backing_file=export/imageA_sn0.qcow2
...

result at ./export:
imageA_sn0.qcow2--imageA.qcow2--imageB_sn0.qcow2--imageB_sn1.qcow2-
-imageB.qcow2--imageC_sn0.qcow2--imageC.qcow2

parameter base_image allow diff operation taken across image in the
backing chain.

Note:
  1 snapshot query can be added in qemu-nbd easily later.
  2 This is actually a work around by qemu-img and qemu-nbd. A better
way is to provide user snapshot_read() and snapshot_allocated()
interface, typically a library. But that need some adjust in block
level, especially thread, coroutine, and emulator cut off, so delay
that.

-- 
Best Regards

Wenchao Xia




[Qemu-devel] 1192847 : NMI watchdog fails to increment the NMI counter in /proc/interrupts

2013-06-20 Thread chandrashekar shastri

Hi All,

I have filed the following bug for watchdog:

NMI watchdog fails to increment the NMI counter in /proc/interrupts

Kernel Version: 3.10.0-rc5+
Libvirt Version: 1.0.6
Qemu Version: 1.5.50

Steps to reproduce the issue:

1. Booted the VM with :
qemu-system-x86_64 VM1.qcow2 -enable-kvm -watchdog i6300esb 
-watchdog-action reset -smp 2 -m 2000
2. Edit the /boot/grub/grub/grub.conf with nmi_watchdog = 1 before the 
initrd image.

3. Restart the guests, the NMI counter in /proc/interrupts was 0
4. Installed the watchdog rpm and ran chkconfig watchdog on
5. Restart the guest, even then the NMI counter did not increment
6. Changed the /boot/grub/grub/grub.conf with nmi_watchdog = 1 to 
/boot/grub/grub/grub.conf with nmi_watchdog = 2
and restarted the guest. Even then NMI conuter did not increment (The 
NMI counter was showing 0 all the time for all the above steps).


Please let me know if I am missing some steps to test the NMI.

Thanks,
Shastri



[Qemu-devel] Failed booting into OS after introduct the KVM_MEM_READONLY flag for regions

2013-06-20 Thread Zhangleiqiang
HI, Jordan:

By using the latest master of qemu, after booting the vnc view will 
continue to be just black, and the os cannot be start. After bisect, I found 
the problem was introduced by the commit: 
235e8982ad393e5611cb892df54881c872eea9e1 (kvm: support using KVM_MEM_READONLY 
flag for regions). There are plenty of medications in this commit, so I report 
this problem to you. 
The running environment:

Qemu: 4eda32f588086b6cd0ec2be6a7a6c131f8c2b427
Host: Fedora 18, kernel: 3.8.1-201.fc18.x86_64
Boot Cmd: 
./x86_64-softmmu/qemu-system-x86_64 -name ovirt -M pc-0.15 -m 1024 
-enable-kvm -boot d \
-drive file=/pkgs/imgs/win7.qcow2,format=qcow2,id=drive0,if=none \
-device ide-hd,drive=drive0,id=disk0 \
-vnc 186.100.8.144:0 -monitor stdio \
-net tap,ifname=tap0,downscript=no -net nic

I can provide more information if need.


--
Leiqzhang

Best Regards



Re: [Qemu-devel] [PATCH v2 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 11:27:25AM +0200, Paolo Bonzini wrote:
 Il 19/06/2013 00:26, mdroth ha scritto:
  On Tue, Jun 18, 2013 at 09:20:26PM +0200, Paolo Bonzini wrote:
  Il 18/06/2013 17:14, mdroth ha scritto:
  Could we possibly simplify this by introducing a recursive mutex that we
  could use to protect the whole list loop and hold even during the cb?
 
  If it is possible, we should avoid recursive locks.  It makes impossible
  to establish a lock hierarchy.  For example:
 
  I assume we can't hold the lock during the cb currently since we might
  try to reschedule, but if it's a recursive mutex would that simplify
  things?
 
  If you have two callbacks in two different AioContexts, both of which do
  bdrv_drain_all(), you get an AB-BA deadlock
  
  I think I see what you mean. That problem exists regardless of whether we
  introduce a recursive mutex though right?
 
 Without a recursive mutex, you only hold one lock at a time in each thread.
 
  I guess the main issue is the
  fact that we'd be encouraging sloppy locking practices without
  addressing the root problem?
 
 Yeah.  We're basically standing where the Linux kernel stood 10 years
 ago (let's say 2.2 timeframe).  If Linux got this far without recursive
 mutexes, we can at least try. :)

FWIW I was also looking into recursive mutexes for the block layer.
What scared me a little is that they make it tempting to stop thinking
about locks since you know you'll be able to reacquire locks you already
hold.

Especially when converting existing code, I think we need to be rigorous
about exploring every function and thinking about the locks it needs and
which child functions it calls.

Otherwise we'll have code paths hidden away somewhere that were never
truly thought through.

Stefan



Re: [Qemu-devel] [PATCH v3 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread liu ping fan
On Thu, Jun 20, 2013 at 4:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 20/06/2013 09:39, Stefan Hajnoczi ha scritto:
 qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.

 It seems that calling them from a thread is a little risky because there
 is no guarantee that the BH is no longer invoked after a thread calls
 these functions.

 I think that's worth a comment or do you want them to take the lock so
 they become safe?

 Taking the lock wouldn't help.  The invoking loop of aio_bh_poll runs
 lockless.  I think a comment is better.

Yes, will document it.
 qemu_bh_cancel is inherently not thread-safe, there's not much you can
 do about it.

 qemu_bh_delete is safe as long as you wait for the bottom half to stop
 before deleting the containing object.  Once we have RCU, deletion of
 QOM objects will be RCU-protected.  Hence, a simple way could be to put
 the first part of aio_bh_poll() within rcu_read_lock/unlock.

 The other thing I'm unclear on is the -idle assignment followed
 immediately by a -scheduled assignment.  Without memory barriers
 aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
 it may see an idle BH as a regular scheduled BH because -idle is still
 0.

 Right.  You need to order -idle writes before -scheduled writes, and
 add memory barriers, or alternatively use two bits in -scheduled so
 that you can assign both atomically.

I think just shift the position of smp_rmb/wmb in _schedule and _poll,
we can acheive this (callbacks will not refer to -idle)

Regards,
Pingfan

 Paolo



Re: [Qemu-devel] [PATCH v2 3/6] net: make netclient re-entrant with refcnt

2013-06-20 Thread liu ping fan
On Tue, Jun 18, 2013 at 8:41 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Thu, Jun 13, 2013 at 05:03:03PM +0800, Liu Ping Fan wrote:
 From: Liu Ping Fan pingf...@linux.vnet.ibm.com

 With refcnt, NetClientState's user can run agaist deleter.

 Please split this into two patches:

 1. net_clients lock
 2. NetClientState refcount

Ok.

 Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
 ---
  hw/core/qdev-properties-system.c | 14 
  include/net/net.h|  3 +++
  net/hub.c|  3 +++
  net/net.c| 47 
 +---
  net/slirp.c  |  3 ++-
  5 files changed, 66 insertions(+), 4 deletions(-)

 diff --git a/hw/core/qdev-properties-system.c 
 b/hw/core/qdev-properties-system.c
 index 0eada32..41cc7e6 100644
 --- a/hw/core/qdev-properties-system.c
 +++ b/hw/core/qdev-properties-system.c
 @@ -302,6 +302,7 @@ static void set_vlan(Object *obj, Visitor *v, void 
 *opaque,
  return;
  }

 +/* inc ref, released when unset property */
  hubport = net_hub_port_find(id);
  if (!hubport) {
  error_set(errp, QERR_INVALID_PARAMETER_VALUE,
 @@ -311,11 +312,24 @@ static void set_vlan(Object *obj, Visitor *v, void 
 *opaque,
  *ptr = hubport;
  }

 +static void release_vlan(Object *obj, const char *name, void *opaque)
 +{
 +DeviceState *dev = DEVICE(obj);
 +Property *prop = opaque;
 +NICPeers *peers_ptr = qdev_get_prop_ptr(dev, prop);
 +NetClientState **ptr = peers_ptr-ncs[0];
 +
 +if (*ptr) {
 +netclient_unref(*ptr);
 +}
 +}
 +
  PropertyInfo qdev_prop_vlan = {
  .name  = vlan,
  .print = print_vlan,
  .get   = get_vlan,
  .set   = set_vlan,
 +.release = release_vlan,
  };

  int qdev_prop_set_drive(DeviceState *dev, const char *name,

 What about the netdev property?  I don't see any refcount code there.

Yes, the release of netdev and vlan property should all free its
backend. Will add the code.
 @@ -1109,6 +1146,7 @@ void net_cleanup(void)
  qemu_del_net_client(nc);
  }
  }
 +qemu_mutex_destroy(net_clients_lock);

 Why is it okay to iterate over net_clients here without the lock?

 atexit(net_cleanup); So no other racers exist.

Thx  Regards,
Pingfan



Re: [Qemu-devel] [PATCH v2 2/6] net: introduce lock to protect NetClientState's peer's access

2013-06-20 Thread liu ping fan
On Thu, Jun 20, 2013 at 3:46 PM, Stefan Hajnoczi stefa...@redhat.com wrote:
 On Thu, Jun 20, 2013 at 02:30:30PM +0800, liu ping fan wrote:
 On Tue, Jun 18, 2013 at 8:25 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
  On Thu, Jun 13, 2013 at 05:03:02PM +0800, Liu Ping Fan wrote:
  + * And flush out peer's queue.
  + */
  +static void qemu_net_client_detach_flush(NetClientState *nc)
  +{
  +NetClientState *peer;
  +
  +/* reader of self's peer field , fixme? the deleters are not 
  concurrent,
  + * so this pair lock can save.
  + */
 
  Indentation, also please resolve the fixme.
 
 So, here can I take the assumption that the deleters are serialized by
 biglock, and remove the lock following this comment?

 Ah, I understand the comment now.  Is there any advantage to dropping

:), only two atomic instruction in rare path.
 the lock?  IMO it's clearer to take the lock consistently instead of
 optimizing cases we think only get called from the main loop.

Reasonable, will keep them.



Re: [Qemu-devel] [PATCH v3 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread Paolo Bonzini
Il 20/06/2013 11:12, liu ping fan ha scritto:
 Right.  You need to order -idle writes before -scheduled writes, and
 add memory barriers, or alternatively use two bits in -scheduled so
 that you can assign both atomically.

 I think just shift the position of smp_rmb/wmb in _schedule and _poll,
 we can acheive this (callbacks will not refer to -idle)

Yes, but you also need to swap -idle and -scheduled assignments
(aio_bh_poll reads scheduled before idle; qemu_bh_schedule* must write
idle before scheduled).

Paolo



Re: [Qemu-devel] [Xen-devel] [PATCH] Add Xen platform PCI device version 2.

2013-06-20 Thread Paul Durrant
 -Original Message-
 From: Tim Deegan [mailto:t...@xen.org]
 Sent: 20 June 2013 09:56
 To: Paul Durrant
 Cc: Matt Wilson; Alex Bligh; xen-de...@lists.xen.org; Ian Campbell; qemu-
 de...@nongnu.org
 Subject: Re: [Xen-devel] [Qemu-devel] [PATCH] Add Xen platform PCI device
 version 2.
 
 At 07:47 + on 20 Jun (1371714432), Paul Durrant wrote:
I agree. If this is really the only solution, we would need to have
both versions presented to the guest so that old drivers continue to
work without any intervention.
  
   I suspect that if we expose both, both sets of drivers try to run the
   same PV connections, and hilarity ensues.
  
 
  Actually I think I can make that work, and it is the conclusion I came
  to after Alex's comment.
 
 Ah, nice!  In that case, I'm a lot less worried -- we can just expose
 both versions/devices by default and there's no need for a visible
 control knob tied to driver version (except maybe for debugging).
 
 It means an 'unsupported' device appearing on other/older OSes, which is
 unfortunate, but ISTR only Windows really complains visibly about that.
 

Yes, I think only Windows complains and we should be able to post an article 
somewhere saying 'don't worry about it' :-)

  Paul



[Qemu-devel] [Bug 994378] Re: Nested-virt)L1 (kvm on kvm)guest panic with parameter “-cpu host” in qemu command line.

2013-06-20 Thread Kashyap Chamarthy
Short: I can't reproduce here with L1 guest having has host-passthrough
for CPU.

Long:
=

Version Info:
-

On Physical host:
~
$ uname -r; rpm -q libvirt-daemon-kvm qemu
3.10.0-0.rc2.git1.2.fc20.x86_64
qemu-1.4.2-3.fc19.x86_64
libvirt-daemon-kvm-1.0.5.2-1.fc19.x86_64
libguestfs-1.22.3-1.fc19.x86_64


On L1:
~~
$ uname -r; rpm -q libvirt-daemon-kvm qemu
3.10.0-0.rc3.git0.2.fc20.x86_64
libvirt-daemon-kvm-1.0.5.1-1.fc19.x86_64
qemu-1.4.2-2.fc19.x86_64
[root@dhcp47-209 ~]# 


L1 guest CLI:
-
[root@bare-metal ~]# ps -ef | grep qemu
qemu  7281 1 67 04:57 ?00:00:10 /usr/bin/qemu-system-x86_64 
-machine accel=kvm -name regular-guest -S -machine 
pc-i440fx-1.4,accel=kvm,usb=off -cpu host -m 10240 -smp 
4,sockets=4,cores=1,threads=1 -uuid 4ed9ac0b-7f72-dfcf-68b3-e6fe2ac588b2 
-nographic -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/regular-guest.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/home/test/vmimages/regular-guest.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:80:c1:34,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


L2 guest CLI:
-
[root@regular-guest ~]# ps -ef | grep -i qemu
qemu  1138 1 88 05:18 ?00:00:07 /usr/bin/qemu-system-x86_64 
-machine accel=kvm -name nguest-01 -S -machine pc-i440fx-1.4,accel=kvm,usb=off 
-m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid 
b47c5cbb-b320-ce9d-c595-4e083b0e465d -nographic -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/nguest-01.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive 
file=/home/test/vmimages/nguest-01.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:be:d5:8e,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-device usb-tablet,id=input0 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5


A search for string 'error' in logs doesn't turn up anything:
[root@nguest-01 ~]# grep -i error /var/log/boot.log 
[root@nguest-01 ~]# grep -i error /var/log/messages
[root@nguest-01 ~]# 


Yongjie, can you please re-try?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/994378

Title:
  Nested-virt)L1 (kvm on kvm)guest panic with parameter “-cpu host” in
  qemu command line.

Status in QEMU:
  New

Bug description:
  Environment:
  
  Host OS (ia32/ia32e/IA64):ia32e
  Guest OS (ia32/ia32e/IA64):ia32e
  Guest OS Type (Linux/Windows):Linux
  kvm.git Commit:19853301ef3289bda2d5264c1093e74efddaeab9
  qemu-kvm Commit:69abebf20280152da8fa7c418a819ae51e862231
  Host Kernel Version:3.4.0-rc3
  Hardware:WSM-EP, Romley-EP

  
  Bug detailed description:
  --
  (KVM on KVM) L1 guest panic when starting the L1 guest with “-cpu host” 
parameter in qemu command line.

  Note:
  1. when creating guest with “-cpu qemu64,+vmx”, L1 guest and L2 guest can boot
  up. 
  2. This should be a qemu-kvm bug. using '-cpu host' parameter, the following 
is the result.
  Kvm+ qemu-kvm =result
  19853301 + 69abebf2  = bad
  19853301 + 44755ea3  = good
  3. when booting up the guest with  the good commit of 19853301 + 44755ea3, 
you can see some
  error info, but nested virt works fine. (L1 and L2 guest can boot up.)
  “error: feature i64 not available in set
  error: bad option value [extfeature_edx = i64 xd syscall]” 

  some logs 
  [root@vt-snb9 x86_64-softmmu]# ./qemu-system-x86_64 -m 2048 -net 
nic,model=rtl8139 -net tap,script=/etc/kvm/qemu-ifup -hda /root/nested-kvm.qcow 
-cpu host
  error: feature i64 not available in set
  error: bad option value [extfeature_edx = i64 xd syscall]
  error: feature i64 not available in set
  error: bad option value [extfeature_edx = i64 xd syscall]
  error: feature i64 not available in set
  error: bad option value [extfeature_edx = i64 syscall xd]
  error: feature i64 not available in set
  error: bad option value [extfeature_edx = i64 syscall xd]
  VNC server running on `::1:5900'


  Reproduce steps:
  
  1.start up a host with kvm (commit: 19853301)
  2.rmmod 

Re: [Qemu-devel] [PATCH 6/9] vhost-scsi: new device supporting the tcm_vhost Linux kernel module

2013-06-20 Thread Asias He
On Thu, Jun 20, 2013 at 08:49:50AM +, Libaiqing wrote:
 Hi Asias,
 Thanks for your config.
 According to you config,I test booting from vhost device with upstream 
 kernel and qemu,but failed.
 
 1 installing guest from cdrom,ok.
 2 booting vhost-scsi,guest fs error occurs. 
 3 using fileio backstores,the error is same..
 4 rebooting guest,a log printed:
  (qemu) hw/scsi/virtio-scsi.c:533:virtio_scsi_handle_event: Object 
 0x7fccae7f2c88 is not an instance of type virtio-scsi-device

Paolo, I remember you fixed a similar issue?

 5 using upstream seabios,core dumped.
  
 Could you give me some advise to debug this problem ? I can provide more 
 information if need.

Can you show me qemu commit id you used? Can you verity that if using the
host kernel for guest helps? Does booting directly (without the install
and reboot process) work?

 The qemu cmd:
 [root@fedora121 x86_64-softmmu]# ./qemu-system-x86_64 -enable-kvm -name 
 fedora   -M pc -m 1024 -smp 2   -drive 
 file=/home/fedora18.iso,if=ide,media=cdrom -device 
 vhost-scsi-pci,wwpn=naa.50014057133e25dc  -monitor stdio   -vga qxl  -vnc :1
 
 The vnc output:
 Dracut-initqueue[189]:/dev/mapper/fedora-root:UNEXPECTED INCONSISTENCY;RUN 
 FSCK MANUALLY.
 Dracut-initqueue[189]: Warning: e2fsck returned with 4
 Dracut-initqueue[189]: Warning: ***An error occurred during the file system 
 check.
 
 The guest kernel log:
 Kernel: virtio-pci :00:04.0: irq 40 for MSI/MSI-X
 Kernel: virtio-pci :00:04.0: irq 41 for MSI/MSI-X
 Kernel: virtio-pci :00:04.0: irq 42 for MSI/MSI-X
 Kernel: virtio-pci :00:04.0: irq 43 for MSI/MSI-X
 Kernel: scsi2 : Virtio SCSI HBA
 Kernel: scsi 2:0:1:0: Direct-Access LIO-ORG r0
 Kernel: sd 2:0:1:0: Attached scsi generic sg1 type 0
 Kernel: sd 2:0:1:0: [sda]1258912 512-byte logical .
 Kernel: sd 2:0:1:0: [sda]write protect is off
 Kernel: sd 2:0:1:0: [sda]Mode sense :43 00 00 08
 Kernel: sd 2:0:1:0: [sda]write cache: disabled, read .
 Kernel: sda sda1 sda2
 Kernel: sd 2:0:1:0: [sda] Attached SCSI disk
 Dracut-initqueue[189]: Scanning devices sda2 for LVM
 Dracut-initqueue[189]: inactive '/dev/fedora/swap'...
 Dracut-initqueue[189]: inactive '/dev/fedora/root'...
 
 The info of host:
 [root@fedora121 x86_64-softmmu]# uname -a
 Linux fedora121 3.10.0-rc6 #1 SMP Wed Jun 19 19:34:24 CST 2013 x86_64 x86_64 
 x86_64 GNU/Linux
 [root@fedora121 x86_64-softmmu]# lsmod |grep vhost_scsi
 vhost_scsi 49456  5
 target_core_mod   282163  14 
 target_core_iblock,target_core_pscsi,iscsi_target_mod,target_core_file,vhost_scsi
 [root@fedora121 x86_64-softmmu]# targetcli
 targetcli shell version v2.1.fb26
 Copyright 2011 by RisingTide Systems LLC and others.
 For help on commands, type 'help'.
 
 / ls
 o- / 
 .
  [...]
   o- backstores 
 ..
  [...]
   | o- block 
 ..
  [Storage Objects: 0]
   | o- fileio 
 .
  [Storage Objects: 0]
   | o- pscsi 
 ..
  [Storage Objects: 0]
   | o- ramdisk 
 
  [Storage Objects: 1]
   |   o- r0 
 ...
  [(6.0GiB) activated]
   o- iscsi 
 
  [Targets: 0]
   o- loopback 
 .
  [Targets: 0]
   o- vhost 
 
  [Targets: 1]
 o- naa.50014057133e25dc 
 ..
  [TPGs: 1]
   o- tpg1 
 ...
  [naa.5001405a70ac3421]
 o- acls 
 ..
  [ACLs: 0]
 o- luns 
 ..
  [LUNs: 1]
   o- lun0 
 .
  [ramdisk/r0]
 
 Regards,
 baiqing
  -Original Message-
  From: Asias He [mailto:as...@redhat.com]
  Sent: Thursday, June 20, 2013 9:34 AM
  To: Libaiqing

Re: [Qemu-devel] [PATCH v3 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread liu ping fan
On Thu, Jun 20, 2013 at 4:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 20/06/2013 09:39, Stefan Hajnoczi ha scritto:
 qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.

 It seems that calling them from a thread is a little risky because there
 is no guarantee that the BH is no longer invoked after a thread calls
 these functions.

 I think that's worth a comment or do you want them to take the lock so
 they become safe?

 Taking the lock wouldn't help.  The invoking loop of aio_bh_poll runs
 lockless.  I think a comment is better.

 qemu_bh_cancel is inherently not thread-safe, there's not much you can
 do about it.

 qemu_bh_delete is safe as long as you wait for the bottom half to stop
 before deleting the containing object.  Once we have RCU, deletion of
 QOM objects will be RCU-protected.  Hence, a simple way could be to put
 the first part of aio_bh_poll() within rcu_read_lock/unlock.

In fact, I have some idea about this,  introduce another member -
Object for QEMUBH which will be refereed in cb, then we leave anything
to refcnt mechanism.
For qemu_bh_cancel(), I do not figure out whether it is important or
not to sync with caller.

diff --git a/async.c b/async.c
index 4b17eb7..60c35a1 100644
--- a/async.c
+++ b/async.c
@@ -61,6 +61,7 @@ int aio_bh_poll(AioContext *ctx)
 {
 QEMUBH *bh, **bhp, *next;
 int ret;
+int sched;

 {
 QEMUBH *bh, **bhp, *next;
 int ret;
+int sched;

 ctx-walking_bh++;

@@ -69,8 +70,10 @@ int aio_bh_poll(AioContext *ctx)
 /* Make sure fetching bh before accessing its members */
 smp_read_barrier_depends();
 next = bh-next;
-if (!bh-deleted  bh-scheduled) {
-bh-scheduled = 0;
+sched = 0;
+atomic_xchg(bh-scheduled, sched);
+if (!bh-deleted  sched) {
+//bh-scheduled = 0;
 if (!bh-idle)
 ret = 1;
 bh-idle = 0;
@@ -79,6 +82,9 @@ int aio_bh_poll(AioContext *ctx)
  */
 smp_rmb();
 bh-cb(bh-opaque);
+if (bh-obj) {
+object_unref(bh-obj);
+}
 }
 }

@@ -105,8 +111,12 @@ int aio_bh_poll(AioContext *ctx)

 void qemu_bh_schedule_idle(QEMUBH *bh)
 {
-if (bh-scheduled)
+int sched = 1;
+
+atomic_xchg( bh-scheduled, sched);
+if (sched) {
 return;
+}
 /* Make sure any writes that are needed by the callback are done
  * before the locations are read in the aio_bh_poll.
  */
@@ -117,25 +127,46 @@ void qemu_bh_schedule_idle(QEMUBH *bh)

 void qemu_bh_schedule(QEMUBH *bh)
 {
-if (bh-scheduled)
+int sched = 1;
+
+atomic_xchg( bh-scheduled, sched);
+if (sched) {
 return;
+}
 /* Make sure any writes that are needed by the callback are done
  * before the locations are read in the aio_bh_poll.
  */
 smp_wmb();
 bh-scheduled = 1;
+if (bh-obj) {
+object_ref(bh-obj);
+}
 bh-idle = 0;
 aio_notify(bh-ctx);
 }

 void qemu_bh_cancel(QEMUBH *bh)
 {
-bh-scheduled = 0;
+int sched = 0;
+
+atomic_xchg( bh-scheduled, sched);
+if (sched) {
+if (bh-obj) {
+object_ref(bh-obj);
+}
+}
 }

 void qemu_bh_delete(QEMUBH *bh)
 {
-bh-scheduled = 0;
+int sched = 0;
+
+atomic_xchg( bh-scheduled, sched);
+if (sched) {
+if (bh-obj) {
+object_ref(bh-obj);
+}
+}
 bh-deleted = 1;
 }

Regards,
Pingfan
 The other thing I'm unclear on is the -idle assignment followed
 immediately by a -scheduled assignment.  Without memory barriers
 aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
 it may see an idle BH as a regular scheduled BH because -idle is still
 0.

 Right.  You need to order -idle writes before -scheduled writes, and
 add memory barriers, or alternatively use two bits in -scheduled so
 that you can assign both atomically.

 Paolo



Re: [Qemu-devel] [PATCH v3 2/2] QEMUBH: make AioContext's bh re-entrant

2013-06-20 Thread Paolo Bonzini
Il 20/06/2013 11:41, liu ping fan ha scritto:
 On Thu, Jun 20, 2013 at 4:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 20/06/2013 09:39, Stefan Hajnoczi ha scritto:
 qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.

 It seems that calling them from a thread is a little risky because there
 is no guarantee that the BH is no longer invoked after a thread calls
 these functions.

 I think that's worth a comment or do you want them to take the lock so
 they become safe?

 Taking the lock wouldn't help.  The invoking loop of aio_bh_poll runs
 lockless.  I think a comment is better.

 qemu_bh_cancel is inherently not thread-safe, there's not much you can
 do about it.

 qemu_bh_delete is safe as long as you wait for the bottom half to stop
 before deleting the containing object.  Once we have RCU, deletion of
 QOM objects will be RCU-protected.  Hence, a simple way could be to put
 the first part of aio_bh_poll() within rcu_read_lock/unlock.

 In fact, I have some idea about this,  introduce another member -
 Object for QEMUBH which will be refereed in cb, then we leave anything
 to refcnt mechanism.
 For qemu_bh_cancel(), I do not figure out whether it is important or
 not to sync with caller.

This is a separate patch anyway... and a long discussion to have before
too. :)

Let's concentrate on one thing at a time.

Paolo

 diff --git a/async.c b/async.c
 index 4b17eb7..60c35a1 100644
 --- a/async.c
 +++ b/async.c
 @@ -61,6 +61,7 @@ int aio_bh_poll(AioContext *ctx)
  {
  QEMUBH *bh, **bhp, *next;
  int ret;
 +int sched;
 
  {
  QEMUBH *bh, **bhp, *next;
  int ret;
 +int sched;
 
  ctx-walking_bh++;
 
 @@ -69,8 +70,10 @@ int aio_bh_poll(AioContext *ctx)
  /* Make sure fetching bh before accessing its members */
  smp_read_barrier_depends();
  next = bh-next;
 -if (!bh-deleted  bh-scheduled) {
 -bh-scheduled = 0;
 +sched = 0;
 +atomic_xchg(bh-scheduled, sched);

This is expensive.

 +if (!bh-deleted  sched) {
 +//bh-scheduled = 0;
  if (!bh-idle)
  ret = 1;
  bh-idle = 0;
 @@ -79,6 +82,9 @@ int aio_bh_poll(AioContext *ctx)
   */
  smp_rmb();
  bh-cb(bh-opaque);
 +if (bh-obj) {
 +object_unref(bh-obj);
 +}
  }
  }
 
 @@ -105,8 +111,12 @@ int aio_bh_poll(AioContext *ctx)
 
  void qemu_bh_schedule_idle(QEMUBH *bh)
  {
 -if (bh-scheduled)
 +int sched = 1;
 +
 +atomic_xchg( bh-scheduled, sched);
 +if (sched) {
  return;
 +}
  /* Make sure any writes that are needed by the callback are done
   * before the locations are read in the aio_bh_poll.
   */
 @@ -117,25 +127,46 @@ void qemu_bh_schedule_idle(QEMUBH *bh)
 
  void qemu_bh_schedule(QEMUBH *bh)
  {
 -if (bh-scheduled)
 +int sched = 1;
 +
 +atomic_xchg( bh-scheduled, sched);
 +if (sched) {
  return;
 +}
  /* Make sure any writes that are needed by the callback are done
   * before the locations are read in the aio_bh_poll.
   */
  smp_wmb();
  bh-scheduled = 1;
 +if (bh-obj) {
 +object_ref(bh-obj);
 +}
  bh-idle = 0;
  aio_notify(bh-ctx);
  }
 
  void qemu_bh_cancel(QEMUBH *bh)
  {
 -bh-scheduled = 0;
 +int sched = 0;
 +
 +atomic_xchg( bh-scheduled, sched);
 +if (sched) {
 +if (bh-obj) {
 +object_ref(bh-obj);
 +}
 +}
  }
 
  void qemu_bh_delete(QEMUBH *bh)
  {
 -bh-scheduled = 0;
 +int sched = 0;
 +
 +atomic_xchg( bh-scheduled, sched);
 +if (sched) {
 +if (bh-obj) {
 +object_ref(bh-obj);
 +}
 +}
  bh-deleted = 1;
  }
 
 Regards,
 Pingfan
 The other thing I'm unclear on is the -idle assignment followed
 immediately by a -scheduled assignment.  Without memory barriers
 aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
 it may see an idle BH as a regular scheduled BH because -idle is still
 0.

 Right.  You need to order -idle writes before -scheduled writes, and
 add memory barriers, or alternatively use two bits in -scheduled so
 that you can assign both atomically.

 Paolo




Re: [Qemu-devel] Adding a persistent writeback cache to qemu

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 10:28:53PM +0100, Alex Bligh wrote:
 --On 11 April 2013 11:25:48 +0200 Stefan Hajnoczi
 stefa...@gmail.com wrote:
 
 I'd like to experiment with adding persistent writeback cache to qemu.
 The use case here is where non-local storage is used (e.g. rbd, ceph)
 using the qemu drivers, together with a local cache as a file on
 a much faster locally mounted device, for instance an SSD (possibly
 replicated). This would I think give a similar performance boost to
 using an rbd block device plus flashcache/dm-cache/bcache, but without
 introducing all the context switches and limitations of having to
 use real block devices. I appreciate it would need to be live migration
 aware (worst case solution: flush and turn off caching during live
 migrate), and ideally be capable of replaying a dirty writeback cache
 in the event the host crashes.
 
 Is there any support for this already? Has anyone worked on this before?
 If not, would there be any interest in it?
 
 I'm concerned about the complexity this would introduce in QEMU.
 Therefore I'm a fan of using existing solutions like the Linux block
 layer instead of reimplementing this stuff in Linux.
 
 What concrete issues are there with using rbd plus
 flashcache/dm-cache/bcache?
 
 I'm not sure I understand the context switch problem since implementing
 it in user space will still require system calls to do all the actual
 cache I/O.
 
 I failed to see your reply and got distracted from this. Apologies.
 So several months later ...

Happens to me sometimes too ;-).

 The concrete problem here is that flashcache/dm-cache/bcache don't
 work with the rbd (librbd) driver, as flashcache/dm-cache/bcache
 cache access to block devices (in the host layer), and with rbd
 (for instance) there is no access to a block device at all. block/rbd.c
 simply calls librbd which calls librados etc.
 
 So the context switches etc. I am avoiding are the ones that would
 be introduced by using kernel rbd devices rather than librbd.

I understand the limitations with kernel block devices - their
setup/teardown is an extra step outside QEMU and privileges need to be
managed.  That basically means you need to use a management tool like
libvirt to make it usable.

But I don't understand the performance angle here.  Do you have profiles
that show kernel rbd is a bottleneck due to context switching?

We use the kernel page cache for -drive file=test.img,cache=writeback
and no one has suggested reimplementing the page cache inside QEMU for
better performance.

Also, how do you want to manage QEMU page cache with multiple guests
running?  They are independent and know nothing about each other.  Their
process memory consumption will be bloated and the kernel memory
management will end up having to sort out who gets to stay in physical
memory.

You can see I'm skeptical of this and think it's premature optimization,
but if there's really a case for it with performance profiles then I
guess it would be necessary.  But we should definitely get feedback from
the Ceph folks too.

I'd like to hear from Ceph folks what their position on kernel rbd vs
librados is.  Why one do they recommend for QEMU guests and what are the
pros/cons?

CCed Sage and Josh

Stefan



Re: [Qemu-devel] [PATCH v3] vl.c: Support multiple CPU ranges on -numa option

2013-06-20 Thread Paolo Bonzini
Il 20/06/2013 11:30, Igor Mammedov ha scritto:
So, basically the format seemed easier to work with if we are 
thinking 
of using QemuOpts for -numa. Using -cpu rather than cpus probably
makes it less ambiguous as well IMO. However, it's probably not a 
good idea
if the current syntax is well established ?
  
  libvirt uses the cpus option already, so we have to keep it working.
 Sure, we can leave it as it's now for some time while a new interface is
 introduced/adopted. And than later deprecate cpus.

So, you used a new name because the new behavior of -numa
node,cpus=1-2,cpus=3-4 would be incompatible with the old.

Personally I don't think that's a problem, but I remember a long
discussion in the past.  Igor/Eduardo, do you remember the conclusions?

Paolo



Re: [Qemu-devel] [RFC PATCH 0/4] per-object libraries

2013-06-20 Thread Peter Maydell
On 20 June 2013 10:49, Paolo Bonzini pbonz...@redhat.com wrote:
 This only leaves Darwin.  I have no idea about that, and I don't have
 anymore a machine to test it.  Andreas or Peter, can you shed light?

If you have something concrete you'd like me to test I can test it.

 But still, libtool wouldn't be a particularly problematic dependency.
 We're already using it for libcacard.

...and we're already disabling the probe for libtool in configure
on MacOSX because MacOS libtool is something completely different...

thanks
-- PMM



Re: [Qemu-devel] [RFC PATCH 0/4] per-object libraries

2013-06-20 Thread Paolo Bonzini
Il 19/06/2013 22:00, Michael Tokarev ha scritto:
 19.06.2013 22:52, Paolo Bonzini wrote:
 Il 19/06/2013 20:18, Michael Tokarev ha scritto:
 Currently I expand it like this:

   $(foreach m, $(filter %.o,$1), $($(m:%.o=%.libs)))

 Probably I can change that to

   $(foreach m, $(filter %.o,$1), $($(m:%.o=./%.libs)))

 (here and in other similar cases), and it will work without changing
 anything around $(obj).

 But maybe we can argee here that this is not really OBJect, it is
 a path or dir, and name it $(d) or $(p) instead of $(obj) ?  To
 include the slash when needed. just like I did for $(obj).

 I chose $(obj) because that's what Kbuild uses.
 
 kbuild almost never requires this variable in user makefiles
 (not counting kbuild internals), -- because of recursive way
 of its working there's no need to prepend directory names like
 qemu makefiles currently do, user makefiles refer to files
 using bare (no path) names.
 
 Contrary to that, qemu does not recurse, it tries to do everything
 in one instance, so it can expand just a few magic variables,
 and for everything else we have to explicitly use that prefix.
 So that it becomes inconsistent (some vars require prefix, some
 don't), and too verbose.

I think something like this would still be messy:

common-obj-y += $(p)core.o $(p)smbus.o $(p)smbus_eeprom.o
common-obj-$(CONFIG_VERSATILE_I2C) += $(p)versatile_i2c.o
common-obj-$(CONFIG_ACPI) += $(p)smbus_ich9.o
common-obj-$(CONFIG_APM) += $(p)pm_smbus.o
common-obj-$(CONFIG_BITBANG_I2C) += $(p)bitbang_i2c.o
common-obj-$(CONFIG_EXYNOS4) += $(p)exynos4210_i2c.o
obj-$(CONFIG_OMAP) += $(p)omap_i2c.o

and this is why I preferred per-target variable values to magic variable
names.

I also disliked the duplication, but in fact you do not even have the
duplication if you use libtool and make a .la convenience library
(similar to ld -r but preserving library dependencies) for each
module, even if it is builtin.  Then you can link the .la either into a
.so module, or add it to the executable.

If you do this, the LIBS only goes into a $(obj)/foo.la target-specific
variable.

 Also, for the inevitable bikeshedding, I would prefer

cflags-$(obj)/curl.o-y
libs-$(obj)/curl.o-y
 What are all these -y suffixes for?  In existing variables and in
 this new your invention?  It's already a bit too verbose.

 It is so that you can do

 foo-$(CONFIG_XYZ) += blah

 instead of

 ifeq ($(CONFIG_XYZ),y)
 FOO += blah
 endif
 
 Yes, that sure I undertand -- obj-y, block-m etc.  I'm asking
 about those new vars yuo propose -- why you want -y sufffix
 *there*, like cflags-foo.o-y ?
 
 Do you want to be able to specify different flags for -y and -m
 builds?  Isn't it a bit too much?

I was thinking of a module that optionally uses a library.  For example
raw block I/O may optionally use the linux AIO library.  But you would do

module-raw-posix-$(CONFIG_LINUX_AIO) += linux-aio.o
libs-linux-aio.o = -laio

or something like that, not

module-raw-posix-$(CONFIG_LINUX_AIO) += linux-aio.o
libs-raw-posix-$(CONFIG_LINUX_AIO) += -laio

So it looks like there's no need for the -y here, indeed.

Paolo



[Qemu-devel] [PATCH RFC 01/15] i440fx: remove unused parameter i440fx_state of i440fx_init.

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc_piix.c|  4 +---
 hw/pci-host/piix.c   | 10 --
 include/hw/i386/pc.h |  2 +-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 97362f2..1e83c1c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -74,7 +74,6 @@ static void pc_init1(MemoryRegion *system_memory,
 ram_addr_t below_4g_mem_size, above_4g_mem_size;
 PCIBus *pci_bus;
 ISABus *isa_bus;
-PCII440FXState *i440fx_state;
 int piix3_devfn = -1;
 qemu_irq *cpu_irq;
 qemu_irq *gsi;
@@ -137,7 +136,7 @@ static void pc_init1(MemoryRegion *system_memory,
 }
 
 if (pci_enabled) {
-pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_bus, gsi,
+pci_bus = i440fx_init(piix3_devfn, isa_bus, gsi,
   system_memory, system_io, ram_size,
   below_4g_mem_size,
   0x1ULL - below_4g_mem_size,
@@ -148,7 +147,6 @@ static void pc_init1(MemoryRegion *system_memory,
   pci_memory, ram_memory);
 } else {
 pci_bus = NULL;
-i440fx_state = NULL;
 isa_bus = isa_bus_new(NULL, system_io);
 no_hpet = 1;
 }
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index f9e68c3..0176ae9 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -225,7 +225,6 @@ static int i440fx_initfn(PCIDevice *dev)
 }
 
 static PCIBus *i440fx_common_init(const char *device_name,
-  PCII440FXState **pi440fx_state,
   int *piix3_devfn,
   ISABus **isa_bus, qemu_irq *pic,
   MemoryRegion *address_space_mem,
@@ -255,8 +254,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
 qdev_init_nofail(dev);
 
 d = pci_create_simple(b, 0, device_name);
-*pi440fx_state = I440FX_PCI_DEVICE(d);
-f = *pi440fx_state;
+f = I440FX_PCI_DEVICE(d);
 f-system_memory = address_space_mem;
 f-pci_address_space = pci_address_space;
 f-ram_memory = ram_memory;
@@ -307,14 +305,14 @@ static PCIBus *i440fx_common_init(const char *device_name,
 ram_size = ram_size / 8 / 1024 / 1024;
 if (ram_size  255)
 ram_size = 255;
-(*pi440fx_state)-dev.config[0x57]=ram_size;
+f-dev.config[0x57] = ram_size;
 
 i440fx_update_memory_mappings(f);
 
 return b;
 }
 
-PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn,
+PCIBus *i440fx_init(int *piix3_devfn,
 ISABus **isa_bus, qemu_irq *pic,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
@@ -328,7 +326,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int 
*piix3_devfn,
 {
 PCIBus *b;
 
-b = i440fx_common_init(TYPE_I440FX_PCI_DEVICE, pi440fx_state,
+b = i440fx_common_init(TYPE_I440FX_PCI_DEVICE,
piix3_devfn, isa_bus, pic,
address_space_mem, address_space_io, ram_size,
pci_hole_start, pci_hole_size,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 7f04967..7c95189 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -126,7 +126,7 @@ extern int no_hpet;
 struct PCII440FXState;
 typedef struct PCII440FXState PCII440FXState;
 
-PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn,
+PCIBus *i440fx_init(int *piix_devfn,
 ISABus **isa_bus, qemu_irq *pic,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
-- 
1.8.3.1




[Qemu-devel] [PATCH RFC 02/15] i440fx: rename i440FX to i440FX-PMC

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/pci-host/piix.c   | 60 ++--
 include/hw/i386/pc.h |  6 +++---
 2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 0176ae9..fc955bd 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -82,11 +82,11 @@ typedef struct PIIX3State {
 MemoryRegion rcr_mem;
 } PIIX3State;
 
-#define TYPE_I440FX_PCI_DEVICE i440FX
-#define I440FX_PCI_DEVICE(obj) \
-OBJECT_CHECK(PCII440FXState, (obj), TYPE_I440FX_PCI_DEVICE)
+#define TYPE_I440FX_PMC_DEVICE i440FX-PMC
+#define I440FX_PMC_DEVICE(obj) \
+OBJECT_CHECK(I440FXPMCState, (obj), TYPE_I440FX_PMC_DEVICE)
 
-struct PCII440FXState {
+struct I440FXPMCState {
 PCIDevice dev;
 MemoryRegion *system_memory;
 MemoryRegion *pci_address_space;
@@ -118,7 +118,7 @@ static int pci_slot_get_pirq(PCIDevice *pci_dev, int 
pci_intx)
 return (pci_intx + slot_addend)  3;
 }
 
-static void i440fx_update_memory_mappings(PCII440FXState *d)
+static void i440fx_pmc_update_memory_mappings(I440FXPMCState *d)
 {
 int i;
 
@@ -133,7 +133,7 @@ static void i440fx_update_memory_mappings(PCII440FXState *d)
 
 static void i440fx_set_smm(int val, void *arg)
 {
-PCII440FXState *d = arg;
+I440FXPMCState *d = arg;
 
 memory_region_transaction_begin();
 smram_set_smm(d-smm_enabled, val, d-dev.config[I440FX_SMRAM],
@@ -145,25 +145,25 @@ static void i440fx_set_smm(int val, void *arg)
 static void i440fx_write_config(PCIDevice *dev,
 uint32_t address, uint32_t val, int len)
 {
-PCII440FXState *d = I440FX_PCI_DEVICE(dev);
+I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
 
 /* XXX: implement SMRAM.D_LOCK */
 pci_default_write_config(dev, address, val, len);
 if (ranges_overlap(address, len, I440FX_PAM, I440FX_PAM_SIZE) ||
 range_covers_byte(address, len, I440FX_SMRAM)) {
-i440fx_update_memory_mappings(d);
+i440fx_pmc_update_memory_mappings(d);
 }
 }
 
 static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
 {
-PCII440FXState *d = opaque;
+I440FXPMCState *d = opaque;
 int ret, i;
 
 ret = pci_device_load(d-dev, f);
 if (ret  0)
 return ret;
-i440fx_update_memory_mappings(d);
+i440fx_pmc_update_memory_mappings(d);
 qemu_get_8s(f, d-smm_enabled);
 
 if (version_id == 2) {
@@ -177,22 +177,22 @@ static int i440fx_load_old(QEMUFile* f, void *opaque, int 
version_id)
 
 static int i440fx_post_load(void *opaque, int version_id)
 {
-PCII440FXState *d = opaque;
+I440FXPMCState *d = opaque;
 
-i440fx_update_memory_mappings(d);
+i440fx_pmc_update_memory_mappings(d);
 return 0;
 }
 
-static const VMStateDescription vmstate_i440fx = {
-.name = I440FX,
+static const VMStateDescription vmstate_i440fx_pmc = {
+.name = TYPE_I440FX_PMC_DEVICE,
 .version_id = 3,
 .minimum_version_id = 3,
 .minimum_version_id_old = 1,
 .load_state_old = i440fx_load_old,
 .post_load = i440fx_post_load,
 .fields  = (VMStateField []) {
-VMSTATE_PCI_DEVICE(dev, PCII440FXState),
-VMSTATE_UINT8(smm_enabled, PCII440FXState),
+VMSTATE_PCI_DEVICE(dev, I440FXPMCState),
+VMSTATE_UINT8(smm_enabled, I440FXPMCState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -214,9 +214,9 @@ static int i440fx_pcihost_initfn(SysBusDevice *dev)
 return 0;
 }
 
-static int i440fx_initfn(PCIDevice *dev)
+static int i440fx_pmc_initfn(PCIDevice *dev)
 {
-PCII440FXState *d = I440FX_PCI_DEVICE(dev);
+I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
 
 d-dev.config[I440FX_SMRAM] = 0x02;
 
@@ -242,7 +242,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
 PCIDevice *d;
 PCIHostState *s;
 PIIX3State *piix3;
-PCII440FXState *f;
+I440FXPMCState *f;
 unsigned i;
 
 dev = qdev_create(NULL, i440FX-pcihost);
@@ -254,7 +254,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
 qdev_init_nofail(dev);
 
 d = pci_create_simple(b, 0, device_name);
-f = I440FX_PCI_DEVICE(d);
+f = I440FX_PMC_DEVICE(d);
 f-system_memory = address_space_mem;
 f-pci_address_space = pci_address_space;
 f-ram_memory = ram_memory;
@@ -307,7 +307,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
 ram_size = 255;
 f-dev.config[0x57] = ram_size;
 
-i440fx_update_memory_mappings(f);
+i440fx_pmc_update_memory_mappings(f);
 
 return b;
 }
@@ -326,7 +326,7 @@ PCIBus *i440fx_init(int *piix3_devfn,
 {
 PCIBus *b;
 
-b = i440fx_common_init(TYPE_I440FX_PCI_DEVICE,
+b = i440fx_common_init(TYPE_I440FX_PMC_DEVICE,
piix3_devfn, isa_bus, pic,
address_space_mem, address_space_io, ram_size,
pci_hole_start, pci_hole_size,
@@ -603,13 +603,13 @@ static const TypeInfo piix3_xen_info = {
 

[Qemu-devel] [PATCH RFC 03/15] i440fx: rename i440FX-pcihost to i440FX

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/pci-host/piix.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index fc955bd..9c482ec 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -38,6 +38,10 @@
  * http://download.intel.com/design/chipsets/datashts/29054901.pdf
  */
 
+#define TYPE_I440FX_DEVICE i440FX
+#define I440FX_DEVICE(obj) \
+OBJECT_CHECK(I440FXState, (obj), TYPE_I440FX_DEVICE)
+
 typedef struct I440FXState {
 PCIHostState parent_obj;
 } I440FXState;
@@ -197,7 +201,7 @@ static const VMStateDescription vmstate_i440fx_pmc = {
 }
 };
 
-static int i440fx_pcihost_initfn(SysBusDevice *dev)
+static int i440fx_initfn(SysBusDevice *dev)
 {
 PCIHostState *s = PCI_HOST_BRIDGE(dev);
 
@@ -245,7 +249,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
 I440FXPMCState *f;
 unsigned i;
 
-dev = qdev_create(NULL, i440FX-pcihost);
+dev = qdev_create(NULL, TYPE_I440FX_DEVICE);
 s = PCI_HOST_BRIDGE(dev);
 b = pci_bus_new(dev, NULL, pci_address_space,
 address_space_io, 0, TYPE_PCI_BUS);
@@ -627,21 +631,21 @@ static const TypeInfo i440fx_pmc_info = {
 .class_init= i440fx_pmc_class_init,
 };
 
-static void i440fx_pcihost_class_init(ObjectClass *klass, void *data)
+static void i440fx_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k-init = i440fx_pcihost_initfn;
+k-init = i440fx_initfn;
 dc-fw_name = pci;
 dc-no_user = 1;
 }
 
-static const TypeInfo i440fx_pcihost_info = {
-.name  = i440FX-pcihost,
+static const TypeInfo i440fx_info = {
+.name  = TYPE_I440FX_DEVICE,
 .parent= TYPE_PCI_HOST_BRIDGE,
 .instance_size = sizeof(I440FXState),
-.class_init= i440fx_pcihost_class_init,
+.class_init= i440fx_class_init,
 };
 
 static void i440fx_register_types(void)
@@ -649,7 +653,7 @@ static void i440fx_register_types(void)
 type_register_static(i440fx_pmc_info);
 type_register_static(piix3_info);
 type_register_static(piix3_xen_info);
-type_register_static(i440fx_pcihost_info);
+type_register_static(i440fx_info);
 }
 
 type_init(i440fx_register_types)
-- 
1.8.3.1




[Qemu-devel] [PATCH RFC 05/15] i440fx pmc: create pmc through comosition

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/pci-host/piix.c | 123 +
 1 file changed, 76 insertions(+), 47 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 1c5c761..218aca2 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -38,16 +38,6 @@
  * http://download.intel.com/design/chipsets/datashts/29054901.pdf
  */
 
-#define TYPE_I440FX_DEVICE i440FX
-#define I440FX_DEVICE(obj) \
-OBJECT_CHECK(I440FXState, (obj), TYPE_I440FX_DEVICE)
-
-typedef struct I440FXState {
-PCIHostState parent_obj;
-MemoryRegion *address_space_io;
-MemoryRegion *pci_address_space;
-} I440FXState;
-
 #define PIIX_NUM_PIC_IRQS   16  /* i8259 * 2 */
 #define PIIX_NUM_PIRQS  4ULL/* PIRQ[A-D] */
 #define XEN_PIIX_NUM_PIRQS  128ULL
@@ -102,8 +92,25 @@ struct I440FXPMCState {
 PAMMemoryRegion pam_regions[13];
 MemoryRegion smram_region;
 uint8_t smm_enabled;
+ram_addr_t ram_size;
+hwaddr pci_hole_start;
+hwaddr pci_hole_size;
+hwaddr pci_hole64_start;
+hwaddr pci_hole64_size;
 };
 
+#define TYPE_I440FX_DEVICE i440FX
+#define I440FX_DEVICE(obj) \
+OBJECT_CHECK(I440FXState, (obj), TYPE_I440FX_DEVICE)
+
+typedef struct I440FXState {
+PCIHostState parent_obj;
+MemoryRegion *address_space_io;
+MemoryRegion *pci_address_space;
+
+PIIX3State piix3;
+I440FXPMCState pmc;
+} I440FXState;
 
 #define I440FX_PAM  0x59
 #define I440FX_PAM_SIZE 7
@@ -221,16 +228,63 @@ static int i440fx_realize(SysBusDevice *dev)
 sysbus_add_io(dev, 0xcfc, s-data_mem);
 sysbus_init_ioports(s-busdev, 0xcfc, 4);
 
+qdev_set_parent_bus(DEVICE(f-pmc), BUS(s-bus));
+qdev_init_nofail(DEVICE(f-pmc));
+
 return 0;
 }
 
 static void i440fx_initfn(Object *obj)
 {
+I440FXState *f = I440FX_DEVICE(obj);
+
+object_initialize(f-pmc, TYPE_I440FX_PMC_DEVICE);
+object_property_add_child(obj, pmc, OBJECT(f-pmc), NULL);
+qdev_prop_set_uint32(DEVICE(f-pmc), addr, PCI_DEVFN(0, 0));
 }
 
 static int i440fx_pmc_initfn(PCIDevice *dev)
 {
 I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
+ram_addr_t ram_size;
+int i;
+
+g_assert(d-system_memory != NULL);
+g_assert(d-pci_address_space != NULL);
+g_assert(d-ram_memory != NULL);
+
+memory_region_init_alias(d-pci_hole, pci-hole, d-pci_address_space,
+ d-pci_hole_start, d-pci_hole_size);
+memory_region_add_subregion(d-system_memory, d-pci_hole_start,
+d-pci_hole);
+memory_region_init_alias(d-pci_hole_64bit, pci-hole64,
+ d-pci_address_space,
+ d-pci_hole64_start, d-pci_hole64_size);
+if (d-pci_hole64_size) {
+memory_region_add_subregion(d-system_memory, d-pci_hole64_start,
+d-pci_hole_64bit);
+}
+memory_region_init_alias(d-smram_region, smram-region,
+ d-pci_address_space, 0xa, 0x2);
+memory_region_add_subregion_overlap(d-system_memory, 0xa,
+d-smram_region, 1);
+memory_region_set_enabled(d-smram_region, false);
+
+init_pam(d-ram_memory, d-system_memory, d-pci_address_space,
+ d-pam_regions[0], PAM_BIOS_BASE, PAM_BIOS_SIZE);
+for (i = 0; i  12; ++i) {
+init_pam(d-ram_memory, d-system_memory, d-pci_address_space,
+ d-pam_regions[i+1], PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE,
+ PAM_EXPAN_SIZE);
+}
+
+ram_size = d-ram_size / 8 / 1024 / 1024;
+if (ram_size  255) {
+ram_size = 255;
+}
+d-dev.config[0x57] = ram_size;
+
+i440fx_pmc_update_memory_mappings(d);
 
 d-dev.config[I440FX_SMRAM] = 0x02;
 
@@ -251,12 +305,10 @@ static PCIBus *i440fx_common_init(const char *device_name,
   MemoryRegion *pci_address_space,
   MemoryRegion *ram_memory)
 {
-PCIDevice *d;
 PCIHostState *s;
 PIIX3State *piix3;
 I440FXPMCState *f;
 I440FXState *i440fx;
-unsigned i;
 
 i440fx = I440FX_DEVICE(object_new(TYPE_I440FX_DEVICE));
 s = PCI_HOST_BRIDGE(i440fx);
@@ -264,38 +316,22 @@ static PCIBus *i440fx_common_init(const char *device_name,
 i440fx-address_space_io = address_space_io;
 i440fx-pci_address_space = pci_address_space;
 
-object_property_add_child(qdev_get_machine(), i440fx,
-  OBJECT(i440fx), NULL);
-qdev_set_parent_bus(DEVICE(i440fx), sysbus_get_default());
-qdev_init_nofail(DEVICE(i440fx));
+/* FIXME these should be derived */
+i440fx-pmc.pci_hole_start = pci_hole_start;
+i440fx-pmc.pci_hole_size = pci_hole_size;
+i440fx-pmc.pci_hole64_start = pci_hole64_start;
+i440fx-pmc.pci_hole64_size = pci_hole64_size;
 
-d = pci_create_simple(s-bus, 0, device_name);
-f = I440FX_PMC_DEVICE(d);
+f = i440fx-pmc;

[Qemu-devel] [PATCH RFC 00/15] pc refactor about memory controller

2013-06-20 Thread Hu Tao
This series introduces MemoryController, and refactor pc code
about i440fx pmc and q35 mch, and introduces ISAPc to resolve
breakage of isapc.

The memory hotplug patchset will base on this series. Comments
are welcome!

Hu Tao (15):
  i440fx: remove unused parameter i440fx_state of i440fx_init.
  i440fx: rename i440FX to i440FX-PMC
  i440fx: rename i440FX-pcihost to i440FX
  i440fx: prepare for composition
  i440fx pmc: create pmc through comosition
  i440fx-pmc: calculate PCI memory hole directly
  i440fx-pmc: create pci address space
  q35-mch: create pci address space
  i440fx-pmc: move ram initialization into i440fx-pmc
  q35-mch: move ram initialization into q35-mch
  introduce ISAPc
  i440fx pmc: inherit from MemoryController
  q35 mch: inherit from MemoryController
  move bios loading to MemoryController and ISAPc 

 hw/i386/pc.c  | 214 +++-
 hw/i386/pc_piix.c |  52 --
 hw/i386/pc_q35.c  |  31 ++
 hw/isa/Makefile.objs  |   2 +-
 hw/isa/isa_pc.c   |  53 ++
 hw/pci-host/piix.c| 244 +++---
 hw/pci-host/q35.c | 132 +++--
 include/hw/i386/pc.h  |  66 ++---
 include/hw/isa/isa_pc.h   |  27 +
 include/hw/pci-host/q35.h |  16 +--
 10 files changed, 465 insertions(+), 372 deletions(-)
 create mode 100644 hw/isa/isa_pc.c
 create mode 100644 include/hw/isa/isa_pc.h

-- 
1.8.3.1




[Qemu-devel] [PATCH RFC 09/15] i440fx-pmc: move ram initialization into i440fx-pmc

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc.c | 24 +---
 hw/i386/pc_piix.c|  5 ++---
 hw/pci-host/piix.c   | 39 +++
 include/hw/i386/pc.h |  3 +--
 4 files changed, 35 insertions(+), 36 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 5e8f143..909307e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1033,33 +1033,11 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
MemoryRegion **ram_memory)
 {
 int linux_boot, i;
-MemoryRegion *ram, *option_rom_mr;
-MemoryRegion *ram_below_4g, *ram_above_4g;
+MemoryRegion *option_rom_mr;
 FWCfgState *fw_cfg;
 
 linux_boot = (kernel_filename != NULL);
 
-/* Allocate RAM.  We allocate it as a single memory region and use
- * aliases to address portions of it, mostly for backwards compatibility
- * with older qemus that used qemu_ram_alloc().
- */
-ram = g_malloc(sizeof(*ram));
-memory_region_init_ram(ram, pc.ram,
-   below_4g_mem_size + above_4g_mem_size);
-vmstate_register_ram_global(ram);
-*ram_memory = ram;
-ram_below_4g = g_malloc(sizeof(*ram_below_4g));
-memory_region_init_alias(ram_below_4g, ram-below-4g, ram,
- 0, below_4g_mem_size);
-memory_region_add_subregion(system_memory, 0, ram_below_4g);
-if (above_4g_mem_size  0) {
-ram_above_4g = g_malloc(sizeof(*ram_above_4g));
-memory_region_init_alias(ram_above_4g, ram-above-4g, ram,
- below_4g_mem_size, above_4g_mem_size);
-memory_region_add_subregion(system_memory, 0x1ULL,
-ram_above_4g);
-}
-
 
 /* Initialize PC system firmware */
 pc_system_firmware_init(rom_memory);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1ca705c..4614c07 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -84,7 +84,6 @@ static void pc_init1(MemoryRegion *system_memory,
 BusState *idebus[MAX_IDE_BUS];
 ISADevice *rtc_state;
 ISADevice *floppy;
-MemoryRegion *ram_memory;
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 DeviceState *icc_bridge;
@@ -121,7 +120,7 @@ static void pc_init1(MemoryRegion *system_memory,
 if (pci_enabled) {
 pci_bus = i440fx_init(piix3_devfn, isa_bus, gsi,
   system_memory, system_io, ram_size,
-  pci_memory, ram_memory);
+  pci_memory);
 } else {
 pci_bus = NULL;
 isa_bus = isa_bus_new(NULL, system_io);
@@ -140,7 +139,7 @@ static void pc_init1(MemoryRegion *system_memory,
 fw_cfg = pc_memory_init(system_memory,
 kernel_filename, kernel_cmdline, 
initrd_filename,
 below_4g_mem_size, above_4g_mem_size,
-rom_memory, ram_memory);
+rom_memory, NULL);
 }
 
 if (kvm_irqchip_in_kernel()) {
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 604a66e..5318ddc 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -96,6 +96,10 @@ struct I440FXPMCState {
 MemoryRegion smram_region;
 uint8_t smm_enabled;
 ram_addr_t ram_size;
+MemoryRegion ram;
+MemoryRegion ram_below_4g;
+MemoryRegion ram_above_4g;
+
 };
 
 #define TYPE_I440FX_DEVICE i440FX
@@ -250,18 +254,39 @@ static int i440fx_pmc_initfn(PCIDevice *dev)
 {
 I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
 ram_addr_t ram_size;
+hwaddr below_4g_mem_size, above_4g_mem_size;
 hwaddr pci_hole_start, pci_hole_size;
 hwaddr pci_hole64_start, pci_hole64_size;
 int i;
 
 g_assert(d-system_memory != NULL);
-g_assert(d-ram_memory != NULL);
 
 if(d-ram_size  I440FX_PMC_PCI_HOLE) {
-pci_hole_start = I440FX_PMC_PCI_HOLE;
+below_4g_mem_size = I440FX_PMC_PCI_HOLE;
+above_4g_mem_size = d-ram_size - I440FX_PMC_PCI_HOLE;
 } else {
-pci_hole_start = d-ram_size;
+below_4g_mem_size = d-ram_size;
+above_4g_mem_size = 0;
 }
+
+/* Allocate RAM.  We allocate it as a single memory region and use
+ * aliases to address portions of it, mostly for backwards compatibility
+ * with older qemus that used qemu_ram_alloc().
+ */
+memory_region_init_ram(d-ram, pc.ram,
+   below_4g_mem_size + above_4g_mem_size);
+vmstate_register_ram_global(d-ram);
+memory_region_init_alias(d-ram_below_4g, ram-below-4g, d-ram,
+ 0, below_4g_mem_size);
+memory_region_add_subregion(d-system_memory, 0, d-ram_below_4g);
+if (above_4g_mem_size  0) {
+memory_region_init_alias(d-ram_above_4g, ram-above-4g, d-ram,
+ below_4g_mem_size, above_4g_mem_size);
+memory_region_add_subregion(d-system_memory, 

[Qemu-devel] [PATCH RFC 06/15] i440fx-pmc: calculate PCI memory hole directly

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc_piix.c|  6 --
 hw/pci-host/piix.c   | 49 -
 include/hw/i386/pc.h |  4 
 3 files changed, 24 insertions(+), 35 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 1e83c1c..3934754 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -138,12 +138,6 @@ static void pc_init1(MemoryRegion *system_memory,
 if (pci_enabled) {
 pci_bus = i440fx_init(piix3_devfn, isa_bus, gsi,
   system_memory, system_io, ram_size,
-  below_4g_mem_size,
-  0x1ULL - below_4g_mem_size,
-  0x1ULL + above_4g_mem_size,
-  (sizeof(hwaddr) == 4
-   ? 0
-   : ((uint64_t)1  62)),
   pci_memory, ram_memory);
 } else {
 pci_bus = NULL;
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 218aca2..7b58d56 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -43,6 +43,9 @@
 #define XEN_PIIX_NUM_PIRQS  128ULL
 #define PIIX_PIRQC  0x60
 
+#define I440FX_PMC_PCI_HOLE   0xE000ULL
+#define I440FX_PMC_PCI_HOLE_END   0x1ULL
+
 /*
  * Reset Control Register: PCI-accessible ISA-Compatible Register at address
  * 0xcf9, provided by the PCI/ISA bridge (PIIX3 PCI function 0, 8086:7000).
@@ -93,10 +96,6 @@ struct I440FXPMCState {
 MemoryRegion smram_region;
 uint8_t smm_enabled;
 ram_addr_t ram_size;
-hwaddr pci_hole_start;
-hwaddr pci_hole_size;
-hwaddr pci_hole64_start;
-hwaddr pci_hole64_size;
 };
 
 #define TYPE_I440FX_DEVICE i440FX
@@ -247,21 +246,37 @@ static int i440fx_pmc_initfn(PCIDevice *dev)
 {
 I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
 ram_addr_t ram_size;
+hwaddr pci_hole_start, pci_hole_size;
+hwaddr pci_hole64_start, pci_hole64_size;
 int i;
 
 g_assert(d-system_memory != NULL);
 g_assert(d-pci_address_space != NULL);
 g_assert(d-ram_memory != NULL);
 
+if(d-ram_size  I440FX_PMC_PCI_HOLE) {
+pci_hole_start = I440FX_PMC_PCI_HOLE;
+} else {
+pci_hole_start = d-ram_size;
+}
+pci_hole_size = I440FX_PMC_PCI_HOLE_END - pci_hole_start;
+
+pci_hole64_start = I440FX_PMC_PCI_HOLE_END + d-ram_size - pci_hole_start;
+if (sizeof(hwaddr) == 4) {
+pci_hole64_size = 0;
+} else {
+pci_hole64_size = (1ULL  62);
+}
+
 memory_region_init_alias(d-pci_hole, pci-hole, d-pci_address_space,
- d-pci_hole_start, d-pci_hole_size);
-memory_region_add_subregion(d-system_memory, d-pci_hole_start,
+ pci_hole_start, pci_hole_size);
+memory_region_add_subregion(d-system_memory, pci_hole_start,
 d-pci_hole);
 memory_region_init_alias(d-pci_hole_64bit, pci-hole64,
  d-pci_address_space,
- d-pci_hole64_start, d-pci_hole64_size);
-if (d-pci_hole64_size) {
-memory_region_add_subregion(d-system_memory, d-pci_hole64_start,
+ pci_hole64_start, pci_hole64_size);
+if (pci_hole64_size) {
+memory_region_add_subregion(d-system_memory, pci_hole64_start,
 d-pci_hole_64bit);
 }
 memory_region_init_alias(d-smram_region, smram-region,
@@ -298,10 +313,6 @@ static PCIBus *i440fx_common_init(const char *device_name,
   MemoryRegion *address_space_mem,
   MemoryRegion *address_space_io,
   ram_addr_t ram_size,
-  hwaddr pci_hole_start,
-  hwaddr pci_hole_size,
-  hwaddr pci_hole64_start,
-  hwaddr pci_hole64_size,
   MemoryRegion *pci_address_space,
   MemoryRegion *ram_memory)
 {
@@ -316,12 +327,6 @@ static PCIBus *i440fx_common_init(const char *device_name,
 i440fx-address_space_io = address_space_io;
 i440fx-pci_address_space = pci_address_space;
 
-/* FIXME these should be derived */
-i440fx-pmc.pci_hole_start = pci_hole_start;
-i440fx-pmc.pci_hole_size = pci_hole_size;
-i440fx-pmc.pci_hole64_start = pci_hole64_start;
-i440fx-pmc.pci_hole64_size = pci_hole64_size;
-
 f = i440fx-pmc;
 f-ram_size = ram_size;
 f-system_memory = address_space_mem;
@@ -362,10 +367,6 @@ PCIBus *i440fx_init(int *piix3_devfn,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 ram_addr_t ram_size,
-hwaddr pci_hole_start,
-hwaddr pci_hole_size,
-

[Qemu-devel] [PATCH RFC 08/15] q35-mch: create pci address space

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc_q35.c  | 13 -
 hw/pci-host/q35.c | 12 +++-
 include/hw/pci-host/q35.h |  1 +
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index bb0ce6a..109bb64 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -64,7 +64,6 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 BusState *idebus[MAX_SATA_PORTS];
 ISADevice *rtc_state;
 ISADevice *floppy;
-MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 MemoryRegion *ram_memory;
 GSIState *gsi_state;
@@ -87,6 +86,9 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 
 kvmclock_create();
 
+/* create pci host bus */
+q35_host = Q35_HOST_DEVICE(qdev_create(NULL, TYPE_Q35_HOST_DEVICE));
+
 if (ram_size = 0xb000) {
 above_4g_mem_size = ram_size - 0xb000;
 below_4g_mem_size = 0xb000;
@@ -97,11 +99,8 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 
 /* pci enabled */
 if (pci_enabled) {
-pci_memory = g_new(MemoryRegion, 1);
-memory_region_init(pci_memory, pci, INT64_MAX);
-rom_memory = pci_memory;
+rom_memory = q35_host-pci_address_space;
 } else {
-pci_memory = NULL;
 rom_memory = get_system_memory();
 }
 
@@ -122,11 +121,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
 }
 
-/* create pci host bus */
-q35_host = Q35_HOST_DEVICE(qdev_create(NULL, TYPE_Q35_HOST_DEVICE));
-
 q35_host-mch.ram_memory = ram_memory;
-q35_host-mch.pci_address_space = pci_memory;
 q35_host-mch.system_memory = get_system_memory();
 q35_host-mch.address_space_io = get_system_io();
 q35_host-mch.below_4g_mem_size = below_4g_mem_size;
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 24df6b5..3ab5ed8 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -36,8 +36,7 @@
 
 static int q35_host_init(SysBusDevice *dev)
 {
-PCIBus *b;
-PCIHostState *pci = FROM_SYSBUS(PCIHostState, dev);
+PCIHostState *pci = PCI_HOST_BRIDGE(dev);
 Q35PCIHost *s = Q35_HOST_DEVICE(dev-qdev);
 
 memory_region_init_io(pci-conf_mem, pci_host_conf_le_ops, pci,
@@ -53,11 +52,12 @@ static int q35_host_init(SysBusDevice *dev)
 if (pcie_host_init(s-host)  0) {
 return -1;
 }
-b = pci_bus_new(s-host.pci.busdev.qdev, pcie.0,
+
+s-mch.pci_address_space = s-pci_address_space;
+pci-bus = pci_bus_new(DEVICE(s), pcie.0,
 s-mch.pci_address_space, s-mch.address_space_io,
 0, TYPE_PCIE_BUS);
-s-host.pci.bus = b;
-qdev_set_parent_bus(DEVICE(s-mch), BUS(b));
+qdev_set_parent_bus(DEVICE(s-mch), BUS(pci-bus));
 qdev_init_nofail(DEVICE(s-mch));
 
 return 0;
@@ -87,6 +87,8 @@ static void q35_host_initfn(Object *obj)
 object_property_add_child(OBJECT(s), mch, OBJECT(s-mch), NULL);
 qdev_prop_set_uint32(DEVICE(s-mch), addr, PCI_DEVFN(0, 0));
 qdev_prop_set_bit(DEVICE(s-mch), multifunction, false);
+
+memory_region_init(s-pci_address_space, pci, INT64_MAX);
 }
 
 static const TypeInfo q35_host_info = {
diff --git a/include/hw/pci-host/q35.h b/include/hw/pci-host/q35.h
index e182c82..1c02420 100644
--- a/include/hw/pci-host/q35.h
+++ b/include/hw/pci-host/q35.h
@@ -60,6 +60,7 @@ typedef struct MCHPCIState {
 typedef struct Q35PCIHost {
 PCIExpressHost host;
 MCHPCIState mch;
+MemoryRegion pci_address_space;
 } Q35PCIHost;
 
 #define Q35_MASK(bit, ms_bit, ls_bit) \
-- 
1.8.3.1




[Qemu-devel] [PATCH RFC 04/15] i440fx: prepare for composition

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/pci-host/piix.c | 49 +++--
 1 file changed, 31 insertions(+), 18 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 9c482ec..1c5c761 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -44,6 +44,8 @@
 
 typedef struct I440FXState {
 PCIHostState parent_obj;
+MemoryRegion *address_space_io;
+MemoryRegion *pci_address_space;
 } I440FXState;
 
 #define PIIX_NUM_PIC_IRQS   16  /* i8259 * 2 */
@@ -201,9 +203,13 @@ static const VMStateDescription vmstate_i440fx_pmc = {
 }
 };
 
-static int i440fx_initfn(SysBusDevice *dev)
+static int i440fx_realize(SysBusDevice *dev)
 {
 PCIHostState *s = PCI_HOST_BRIDGE(dev);
+I440FXState *f = I440FX_DEVICE(dev);
+
+s-bus = pci_bus_new(DEVICE(f), NULL, f-pci_address_space,
+ f-address_space_io, 0, TYPE_PCI_BUS);
 
 memory_region_init_io(s-conf_mem, pci_host_conf_le_ops, s,
   pci-conf-idx, 4);
@@ -218,6 +224,10 @@ static int i440fx_initfn(SysBusDevice *dev)
 return 0;
 }
 
+static void i440fx_initfn(Object *obj)
+{
+}
+
 static int i440fx_pmc_initfn(PCIDevice *dev)
 {
 I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
@@ -241,23 +251,25 @@ static PCIBus *i440fx_common_init(const char *device_name,
   MemoryRegion *pci_address_space,
   MemoryRegion *ram_memory)
 {
-DeviceState *dev;
-PCIBus *b;
 PCIDevice *d;
 PCIHostState *s;
 PIIX3State *piix3;
 I440FXPMCState *f;
+I440FXState *i440fx;
 unsigned i;
 
-dev = qdev_create(NULL, TYPE_I440FX_DEVICE);
-s = PCI_HOST_BRIDGE(dev);
-b = pci_bus_new(dev, NULL, pci_address_space,
-address_space_io, 0, TYPE_PCI_BUS);
-s-bus = b;
-object_property_add_child(qdev_get_machine(), i440fx, OBJECT(dev), NULL);
-qdev_init_nofail(dev);
+i440fx = I440FX_DEVICE(object_new(TYPE_I440FX_DEVICE));
+s = PCI_HOST_BRIDGE(i440fx);
+
+i440fx-address_space_io = address_space_io;
+i440fx-pci_address_space = pci_address_space;
 
-d = pci_create_simple(b, 0, device_name);
+object_property_add_child(qdev_get_machine(), i440fx,
+  OBJECT(i440fx), NULL);
+qdev_set_parent_bus(DEVICE(i440fx), sysbus_get_default());
+qdev_init_nofail(DEVICE(i440fx));
+
+d = pci_create_simple(s-bus, 0, device_name);
 f = I440FX_PMC_DEVICE(d);
 f-system_memory = address_space_mem;
 f-pci_address_space = pci_address_space;
@@ -291,15 +303,15 @@ static PCIBus *i440fx_common_init(const char *device_name,
  * These additional routes can be discovered through ACPI. */
 if (xen_enabled()) {
 piix3 = DO_UPCAST(PIIX3State, dev,
-pci_create_simple_multifunction(b, -1, true, PIIX3-xen));
-pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
+pci_create_simple_multifunction(s-bus, -1, true, 
PIIX3-xen));
+pci_bus_irqs(s-bus, xen_piix3_set_irq, xen_pci_slot_get_pirq,
 piix3, XEN_PIIX_NUM_PIRQS);
 } else {
 piix3 = DO_UPCAST(PIIX3State, dev,
-pci_create_simple_multifunction(b, -1, true, PIIX3));
-pci_bus_irqs(b, piix3_set_irq, pci_slot_get_pirq, piix3,
+pci_create_simple_multifunction(s-bus, -1, true, PIIX3));
+pci_bus_irqs(s-bus, piix3_set_irq, pci_slot_get_pirq, piix3,
 PIIX_NUM_PIRQS);
-pci_bus_set_route_irq_fn(b, piix3_route_intx_pin_to_irq);
+pci_bus_set_route_irq_fn(s-bus, piix3_route_intx_pin_to_irq);
 }
 piix3-pic = pic;
 *isa_bus = ISA_BUS(qdev_get_child_bus(DEVICE(piix3), isa.0));
@@ -313,7 +325,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
 
 i440fx_pmc_update_memory_mappings(f);
 
-return b;
+return s-bus;
 }
 
 PCIBus *i440fx_init(int *piix3_devfn,
@@ -636,7 +648,7 @@ static void i440fx_class_init(ObjectClass *klass, void 
*data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k-init = i440fx_initfn;
+k-init = i440fx_realize;
 dc-fw_name = pci;
 dc-no_user = 1;
 }
@@ -645,6 +657,7 @@ static const TypeInfo i440fx_info = {
 .name  = TYPE_I440FX_DEVICE,
 .parent= TYPE_PCI_HOST_BRIDGE,
 .instance_size = sizeof(I440FXState),
+.instance_init = i440fx_initfn,
 .class_init= i440fx_class_init,
 };
 
-- 
1.8.3.1




[Qemu-devel] [PATCH RFC 12/15] introduce memory controller

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc.c | 156 +++
 hw/pci-host/piix.c   |   1 -
 include/hw/i386/pc.h |  43 ++
 3 files changed, 199 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 398b201..c28baa2 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1256,3 +1256,159 @@ void ioapic_init_gsi(GSIState *gsi_state, const char 
*parent_name)
 gsi_state-ioapic_irq[i] = qdev_get_gpio_in(dev, i);
 }
 }
+
+void mc_update_pam(MemoryController *d)
+{
+PCIDevice *pd = PCI_DEVICE(d);
+MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(d);
+int i;
+
+memory_region_transaction_begin();
+for (i = 0; i  13; i++) {
+pam_update(d-pam_regions[i], i,
+   pd-config[c-pam0 + ((i + 1) / 2)]);
+}
+memory_region_transaction_commit();
+}
+
+void mc_update_smram(MemoryController *d)
+{
+PCIDevice *pd = PCI_DEVICE(d);
+MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(d);
+
+memory_region_transaction_begin();
+smram_update(d-smram_region, pd-config[c-smram],
+ d-smm_enabled);
+memory_region_transaction_commit();
+}
+
+void mc_update(MemoryController *d)
+{
+mc_update_pam(d);
+mc_update_smram(d);
+}
+
+void mc_set_smm(int val, void *arg)
+{
+MemoryController *d = arg;
+PCIDevice *pd = PCI_DEVICE(d);
+MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(d);
+
+memory_region_transaction_begin();
+smram_set_smm(d-smm_enabled, val, pd-config[c-smram],
+  d-smram_region);
+memory_region_transaction_commit();
+}
+
+static int memory_controller_init(PCIDevice *dev)
+{
+MemoryController *m = MEMORY_CONTROLLER(dev);
+MemoryControllerClass *c = MEMORY_CONTROLLER_GET_CLASS(dev);
+ram_addr_t ram_size;
+hwaddr below_4g_mem_size, above_4g_mem_size;
+hwaddr pci_hole_start, pci_hole_size;
+hwaddr pci_hole64_start, pci_hole64_size;
+int i;
+
+g_assert(m-system_memory != NULL);
+
+if(m-ram_size  c-pci_hole_start) {
+below_4g_mem_size = c-pci_hole_start;
+above_4g_mem_size = m-ram_size - c-pci_hole_start;
+} else {
+below_4g_mem_size = m-ram_size;
+above_4g_mem_size = 0;
+}
+
+/* Allocate RAM.  We allocate it as a single memory region and use
+ * aliases to address portions of it, mostly for backwards compatibility
+ * with older qemus that used qemu_ram_alloc().
+ */
+memory_region_init_ram(m-ram, pc.ram, m-ram_size);
+vmstate_register_ram_global(m-ram);
+memory_region_init_alias(m-ram_below_4g, ram-below-4g, m-ram,
+ 0, below_4g_mem_size);
+memory_region_add_subregion(m-system_memory, 0, m-ram_below_4g);
+if (above_4g_mem_size  0) {
+memory_region_init_alias(m-ram_above_4g, ram-above-4g, m-ram,
+ below_4g_mem_size, above_4g_mem_size);
+memory_region_add_subregion(m-system_memory, c-pci_hole_end,
+m-ram_above_4g);
+}
+
+pci_hole_start = below_4g_mem_size;
+pci_hole_size = c-pci_hole_end - pci_hole_start;
+
+pci_hole64_start = c-pci_hole_end + m-ram_size - pci_hole_start;
+if (sizeof(hwaddr) == 4) {
+pci_hole64_size = 0;
+} else {
+pci_hole64_size = (1ULL  62);
+}
+
+memory_region_init_alias(m-pci_hole, pci-hole, m-pci_address_space,
+ pci_hole_start, pci_hole_size);
+memory_region_add_subregion(m-system_memory, pci_hole_start,
+m-pci_hole);
+memory_region_init_alias(m-pci_hole_64bit, pci-hole64,
+ m-pci_address_space,
+ pci_hole64_start, pci_hole64_size);
+if (pci_hole64_size) {
+memory_region_add_subregion(m-system_memory, pci_hole64_start,
+m-pci_hole_64bit);
+}
+memory_region_init_alias(m-smram_region, smram-region,
+ m-pci_address_space, 0xa, 0x2);
+memory_region_add_subregion_overlap(m-system_memory, 0xa,
+m-smram_region, 1);
+memory_region_set_enabled(m-smram_region, false);
+
+init_pam(m-ram_memory, m-system_memory, m-pci_address_space,
+ m-pam_regions[0], PAM_BIOS_BASE, PAM_BIOS_SIZE);
+for (i = 0; i  12; ++i) {
+init_pam(m-ram_memory, m-system_memory, m-pci_address_space,
+ m-pam_regions[i+1], PAM_EXPAN_BASE + i * PAM_EXPAN_SIZE,
+ PAM_EXPAN_SIZE);
+}
+
+ram_size = m-ram_size / 8 / 1024 / 1024;
+if (ram_size  255) {
+ram_size = 255;
+}
+dev-config[0x57] = ram_size;
+
+dev-config[0x72] = 0x02;
+
+cpu_smm_register(c-set_smm, m);
+return 0;
+}
+
+static void memory_controller_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = 

[Qemu-devel] [PATCH RFC 10/15] q35-mch: move ram initialization into q35-mch

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc.c  |  3 +--
 hw/i386/pc_piix.c |  2 +-
 hw/i386/pc_q35.c  |  7 ++-
 hw/pci-host/q35.c | 36 +++-
 include/hw/i386/pc.h  |  3 +--
 include/hw/pci-host/q35.h |  9 +++--
 6 files changed, 43 insertions(+), 17 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 909307e..398b201 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1029,8 +1029,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
const char *initrd_filename,
ram_addr_t below_4g_mem_size,
ram_addr_t above_4g_mem_size,
-   MemoryRegion *rom_memory,
-   MemoryRegion **ram_memory)
+   MemoryRegion *rom_memory)
 {
 int linux_boot, i;
 MemoryRegion *option_rom_mr;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 4614c07..77d7ef0 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -139,7 +139,7 @@ static void pc_init1(MemoryRegion *system_memory,
 fw_cfg = pc_memory_init(system_memory,
 kernel_filename, kernel_cmdline, 
initrd_filename,
 below_4g_mem_size, above_4g_mem_size,
-rom_memory, NULL);
+rom_memory);
 }
 
 if (kvm_irqchip_in_kernel()) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 109bb64..d7d9703 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -65,7 +65,6 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 ISADevice *rtc_state;
 ISADevice *floppy;
 MemoryRegion *rom_memory;
-MemoryRegion *ram_memory;
 GSIState *gsi_state;
 ISABus *isa_bus;
 int pci_enabled = 1;
@@ -108,7 +107,7 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 if (!xen_enabled()) {
 pc_memory_init(get_system_memory(), kernel_filename, kernel_cmdline,
initrd_filename, below_4g_mem_size, above_4g_mem_size,
-   rom_memory, ram_memory);
+   rom_memory);
 }
 
 /* irq lines */
@@ -121,11 +120,9 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
 }
 
-q35_host-mch.ram_memory = ram_memory;
+q35_host-mch.ram_size = ram_size;
 q35_host-mch.system_memory = get_system_memory();
 q35_host-mch.address_space_io = get_system_io();
-q35_host-mch.below_4g_mem_size = below_4g_mem_size;
-q35_host-mch.above_4g_mem_size = above_4g_mem_size;
 /* pci */
 qdev_init_nofail(DEVICE(q35_host));
 host_bus = q35_host-host.pci.bus;
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 3ab5ed8..5f99431 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -244,24 +244,50 @@ static int mch_init(PCIDevice *d)
 {
 int i;
 hwaddr pci_hole64_size;
+hwaddr below_4g_mem_size, above_4g_mem_size;
 MCHPCIState *mch = MCH_PCI_DEVICE(d);
 
+if(mch-ram_size  MCH_PCI_HOLE) {
+below_4g_mem_size = MCH_PCI_HOLE;
+above_4g_mem_size = mch-ram_size - MCH_PCI_HOLE;
+} else {
+below_4g_mem_size = mch-ram_size;
+above_4g_mem_size = 0;
+}
+
+/* Allocate RAM.  We allocate it as a single memory region and use
+ * aliases to address portions of it, mostly for backwards compatibility
+ * with older qemus that used qemu_ram_alloc().
+ */
+memory_region_init_ram(mch-ram, pc.ram,
+   below_4g_mem_size + above_4g_mem_size);
+vmstate_register_ram_global(mch-ram);
+memory_region_init_alias(mch-ram_below_4g, ram-below-4g, mch-ram,
+ 0, below_4g_mem_size);
+memory_region_add_subregion(mch-system_memory, 0, mch-ram_below_4g);
+if (above_4g_mem_size  0) {
+memory_region_init_alias(mch-ram_above_4g, ram-above-4g, mch-ram,
+ below_4g_mem_size, above_4g_mem_size);
+memory_region_add_subregion(mch-system_memory, MCH_PCI_HOLE_END,
+mch-ram_above_4g);
+}
+
 /* setup pci memory regions */
 memory_region_init_alias(mch-pci_hole, pci-hole,
  mch-pci_address_space,
- mch-below_4g_mem_size,
- 0x1ULL - mch-below_4g_mem_size);
-memory_region_add_subregion(mch-system_memory, mch-below_4g_mem_size,
+ below_4g_mem_size,
+ 0x1ULL - below_4g_mem_size);
+memory_region_add_subregion(mch-system_memory, below_4g_mem_size,
 mch-pci_hole);
 pci_hole64_size = (sizeof(hwaddr) == 4 ? 0 :
((uint64_t)1  62));
 memory_region_init_alias(mch-pci_hole_64bit, pci-hole64,
  

[Qemu-devel] [PATCH RFC 15/15] move bios loading to MemoryController and ISAPc

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc.c| 35 ++-
 hw/i386/pc_piix.c   | 13 ++---
 hw/i386/pc_q35.c| 13 ++---
 hw/isa/isa_pc.c | 11 +++
 include/hw/i386/pc.h|  7 +++
 include/hw/isa/isa_pc.h |  1 +
 6 files changed, 33 insertions(+), 47 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c28baa2..5bb4989 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1023,36 +1023,19 @@ void pc_acpi_init(const char *default_dsdt)
 }
 }
 
-FWCfgState *pc_memory_init(MemoryRegion *system_memory,
-   const char *kernel_filename,
+FWCfgState *pc_memory_init(const char *kernel_filename,
const char *kernel_cmdline,
const char *initrd_filename,
ram_addr_t below_4g_mem_size,
-   ram_addr_t above_4g_mem_size,
-   MemoryRegion *rom_memory)
+   ram_addr_t above_4g_mem_size)
 {
-int linux_boot, i;
-MemoryRegion *option_rom_mr;
+int i;
 FWCfgState *fw_cfg;
 
-linux_boot = (kernel_filename != NULL);
-
-
-/* Initialize PC system firmware */
-pc_system_firmware_init(rom_memory);
-
-option_rom_mr = g_malloc(sizeof(*option_rom_mr));
-memory_region_init_ram(option_rom_mr, pc.rom, PC_ROM_SIZE);
-vmstate_register_ram_global(option_rom_mr);
-memory_region_add_subregion_overlap(rom_memory,
-PC_ROM_MIN_VGA,
-option_rom_mr,
-1);
-
 fw_cfg = bochs_bios_init();
 rom_set_fw(fw_cfg);
 
-if (linux_boot) {
+if (kernel_filename) {
 load_linux(fw_cfg, kernel_filename, initrd_filename, kernel_cmdline, 
below_4g_mem_size);
 }
 
@@ -1380,6 +1363,16 @@ static int memory_controller_init(PCIDevice *dev)
 dev-config[0x72] = 0x02;
 
 cpu_smm_register(c-set_smm, m);
+
+pc_system_firmware_init(m-pci_address_space);
+
+memory_region_init_ram(m-option_roms, pc.rom, PC_ROM_SIZE);
+vmstate_register_ram_global(m-option_roms);
+memory_region_add_subregion_overlap(m-pci_address_space,
+PC_ROM_MIN_VGA,
+m-option_roms,
+1);
+
 return 0;
 }
 
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 2e84cc2..fb056df 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -86,7 +86,6 @@ static void pc_init1(MemoryRegion *system_memory,
 ISADevice *rtc_state;
 ISADevice *floppy;
 MemoryRegion *pci_memory;
-MemoryRegion *rom_memory;
 DeviceState *icc_bridge;
 FWCfgState *fw_cfg = NULL;
 ISAPc *isapc = NULL;
@@ -135,18 +134,10 @@ static void pc_init1(MemoryRegion *system_memory,
 }
 isa_bus_irqs(isa_bus, gsi);
 
-if (pci_enabled) {
-rom_memory = pci_memory;
-} else {
-rom_memory = system_memory;
-}
-
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
-fw_cfg = pc_memory_init(system_memory,
-kernel_filename, kernel_cmdline, 
initrd_filename,
-below_4g_mem_size, above_4g_mem_size,
-rom_memory);
+fw_cfg = pc_memory_init(kernel_filename, kernel_cmdline, 
initrd_filename,
+below_4g_mem_size, above_4g_mem_size);
 }
 
 if (kvm_irqchip_in_kernel()) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index fde83b4..5fe14bb 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -64,7 +64,6 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 BusState *idebus[MAX_SATA_PORTS];
 ISADevice *rtc_state;
 ISADevice *floppy;
-MemoryRegion *rom_memory;
 GSIState *gsi_state;
 ISABus *isa_bus;
 int pci_enabled = 1;
@@ -96,18 +95,10 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 below_4g_mem_size = ram_size;
 }
 
-/* pci enabled */
-if (pci_enabled) {
-rom_memory = q35_host-pci_address_space;
-} else {
-rom_memory = get_system_memory();
-}
-
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
-pc_memory_init(get_system_memory(), kernel_filename, kernel_cmdline,
-   initrd_filename, below_4g_mem_size, above_4g_mem_size,
-   rom_memory);
+pc_memory_init(kernel_filename, kernel_cmdline,
+   initrd_filename, below_4g_mem_size, above_4g_mem_size);
 }
 
 /* irq lines */
diff --git a/hw/isa/isa_pc.c b/hw/isa/isa_pc.c
index f73cddb..ae9f2c8 100644
--- a/hw/isa/isa_pc.c
+++ b/hw/isa/isa_pc.c
@@ -1,4 +1,6 @@
+#include hw/i386/pc.h
 #include hw/isa/isa_pc.h
+#include hw/loader.h
 
 static void isa_pc_realize(DeviceState *dev, Error **errp)
 {
@@ -17,6 +19,15 @@ 

[Qemu-devel] [PATCH RFC 11/15] introduce ISAPc

2013-06-20 Thread Hu Tao
Previous refactor breaks isapc. This patch introduces ISAPc and move
isapc initializing code to ISAPc.

Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc_piix.c   | 15 +++
 hw/isa/Makefile.objs|  2 +-
 hw/isa/isa_pc.c | 42 ++
 include/hw/isa/isa_pc.h | 26 ++
 4 files changed, 80 insertions(+), 5 deletions(-)
 create mode 100644 hw/isa/isa_pc.c
 create mode 100644 include/hw/isa/isa_pc.h

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 77d7ef0..2e84cc2 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -49,6 +49,7 @@
 #ifdef CONFIG_XEN
 #  include xen/hvm/hvm_info_table.h
 #endif
+#include hw/isa/isa_pc.h
 
 #define MAX_IDE_BUS 2
 
@@ -72,7 +73,7 @@ static void pc_init1(MemoryRegion *system_memory,
 {
 int i;
 ram_addr_t below_4g_mem_size, above_4g_mem_size;
-PCIBus *pci_bus;
+PCIBus *pci_bus = NULL;
 ISABus *isa_bus;
 int piix3_devfn = -1;
 qemu_irq *cpu_irq;
@@ -88,6 +89,7 @@ static void pc_init1(MemoryRegion *system_memory,
 MemoryRegion *rom_memory;
 DeviceState *icc_bridge;
 FWCfgState *fw_cfg = NULL;
+ISAPc *isapc = NULL;
 
 icc_bridge = qdev_create(NULL, TYPE_ICC_BRIDGE);
 object_property_add_child(qdev_get_machine(), icc-bridge,
@@ -122,8 +124,13 @@ static void pc_init1(MemoryRegion *system_memory,
   system_memory, system_io, ram_size,
   pci_memory);
 } else {
-pci_bus = NULL;
-isa_bus = isa_bus_new(NULL, system_io);
+isapc = ISA_PC(object_new(TYPE_ISA_PC));
+isapc-ram_size = ram_size;
+isapc-address_space_mem = system_memory;
+isapc-address_space_io = system_io;
+qdev_init_nofail(DEVICE(isapc));
+
+isa_bus = isapc-bus;
 no_hpet = 1;
 }
 isa_bus_irqs(isa_bus, gsi);
@@ -161,7 +168,7 @@ static void pc_init1(MemoryRegion *system_memory,
 
 pc_register_ferr_irq(gsi[13]);
 
-pc_vga_init(isa_bus, pci_enabled ? pci_bus : NULL);
+pc_vga_init(isa_bus, pci_bus);
 if (xen_enabled()) {
 pci_create_simple(pci_bus, -1, xen-platform);
 }
diff --git a/hw/isa/Makefile.objs b/hw/isa/Makefile.objs
index 193746a..99e6a4f 100644
--- a/hw/isa/Makefile.objs
+++ b/hw/isa/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += isa-bus.o
+common-obj-y += isa-bus.o isa_pc.o
 common-obj-$(CONFIG_APM) += apm.o
 common-obj-$(CONFIG_I82378) += i82378.o
 common-obj-$(CONFIG_ISA_MMIO) += isa_mmio.o
diff --git a/hw/isa/isa_pc.c b/hw/isa/isa_pc.c
new file mode 100644
index 000..f73cddb
--- /dev/null
+++ b/hw/isa/isa_pc.c
@@ -0,0 +1,42 @@
+#include hw/isa/isa_pc.h
+
+static void isa_pc_realize(DeviceState *dev, Error **errp)
+{
+ISAPc *isapc = ISA_PC(dev);
+
+g_assert(isapc-address_space_mem != NULL);
+g_assert(isapc-address_space_io != NULL);
+g_assert(isapc-ram_size  0);
+
+isapc-bus = isa_bus_new(NULL, isapc-address_space_io);
+
+/* Allocate RAM.  We allocate it as a single memory region and use
+ * aliases to address portions of it, mostly for backwards compatibility
+ * with older qemus that used qemu_ram_alloc().
+ */
+memory_region_init_ram(isapc-ram, pc.ram, isapc-ram_size);
+vmstate_register_ram_global(isapc-ram);
+memory_region_add_subregion(isapc-address_space_mem, 0, isapc-ram);
+}
+
+static void isa_pc_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc-realize = isa_pc_realize;
+dc-no_user = 1;
+}
+
+static const TypeInfo isa_pc_type_info = {
+.name = TYPE_ISA_PC,
+.parent = TYPE_DEVICE,
+.instance_size = sizeof(ISAPc),
+.class_init = isa_pc_class_init,
+};
+
+static void isa_pc_register_types(void)
+{
+type_register_static(isa_pc_type_info);
+}
+
+type_init(isa_pc_register_types)
diff --git a/include/hw/isa/isa_pc.h b/include/hw/isa/isa_pc.h
new file mode 100644
index 000..91a0701
--- /dev/null
+++ b/include/hw/isa/isa_pc.h
@@ -0,0 +1,26 @@
+#ifndef ISA_PC_H
+#define ISA_PC_H
+
+#include qom/object.h
+#include exec/memory.h
+#include hw/isa/isa.h
+
+#define TYPE_ISA_PC isa-pc
+#define ISA_PC(obj) OBJECT_CHECK(ISAPc, (obj), TYPE_ISA_PC)
+
+typedef struct ISAPc ISAPc;
+
+struct ISAPc {
+/* private */
+Object parent_obj;
+/* public */
+
+ISABus *bus;
+
+MemoryRegion *address_space_mem;
+MemoryRegion *address_space_io;
+MemoryRegion ram;
+ram_addr_t ram_size;
+};
+
+#endif
-- 
1.8.3.1




[Qemu-devel] [PATCH RFC 14/15] q35 mch: inherit from MemoryController

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc_q35.c  |   4 +-
 hw/pci-host/q35.c | 150 +-
 include/hw/pci-host/q35.h |  14 +
 3 files changed, 32 insertions(+), 136 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index d7d9703..fde83b4 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -120,8 +120,8 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 gsi = qemu_allocate_irqs(gsi_handler, gsi_state, GSI_NUM_PINS);
 }
 
-q35_host-mch.ram_size = ram_size;
-q35_host-mch.system_memory = get_system_memory();
+q35_host-mch.dev.ram_size = ram_size;
+q35_host-mch.dev.system_memory = get_system_memory();
 q35_host-mch.address_space_io = get_system_io();
 /* pci */
 qdev_init_nofail(DEVICE(q35_host));
diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
index 5f99431..0296550 100644
--- a/hw/pci-host/q35.c
+++ b/hw/pci-host/q35.c
@@ -53,10 +53,10 @@ static int q35_host_init(SysBusDevice *dev)
 return -1;
 }
 
-s-mch.pci_address_space = s-pci_address_space;
-pci-bus = pci_bus_new(DEVICE(s), pcie.0,
-s-mch.pci_address_space, s-mch.address_space_io,
-0, TYPE_PCIE_BUS);
+s-mch.dev.pci_address_space = s-pci_address_space;
+pci-bus = pci_bus_new(DEVICE(dev), pcie.0,
+   s-mch.dev.pci_address_space, 
s-mch.address_space_io,
+   0, TYPE_PCIE_BUS);
 qdev_set_parent_bus(DEVICE(s-mch), BUS(pci-bus));
 qdev_init_nofail(DEVICE(s-mch));
 
@@ -106,8 +106,8 @@ static const TypeInfo q35_host_info = {
 /* PCIe MMCFG */
 static void mch_update_pciexbar(MCHPCIState *mch)
 {
-PCIDevice *pci_dev = mch-d;
-BusState *bus = qdev_get_parent_bus(pci_dev-qdev);
+PCIDevice *pci_dev = PCI_DEVICE(mch);
+BusState *bus = qdev_get_parent_bus(DEVICE(pci_dev));
 DeviceState *qdev = bus-parent;
 Q35PCIHost *s = Q35_HOST_DEVICE(qdev);
 
@@ -144,49 +144,18 @@ static void mch_update_pciexbar(MCHPCIState *mch)
 pcie_host_mmcfg_update(s-host, enable, addr, length);
 }
 
-/* PAM */
-static void mch_update_pam(MCHPCIState *mch)
-{
-int i;
-
-memory_region_transaction_begin();
-for (i = 0; i  13; i++) {
-pam_update(mch-pam_regions[i], i,
-   mch-d.config[MCH_HOST_BRIDGE_PAM0 + ((i + 1) / 2)]);
-}
-memory_region_transaction_commit();
-}
-
-/* SMRAM */
-static void mch_update_smram(MCHPCIState *mch)
-{
-memory_region_transaction_begin();
-smram_update(mch-smram_region, mch-d.config[MCH_HOST_BRDIGE_SMRAM],
-mch-smm_enabled);
-memory_region_transaction_commit();
-}
-
-static void mch_set_smm(int smm, void *arg)
-{
-MCHPCIState *mch = arg;
-
-memory_region_transaction_begin();
-smram_set_smm(mch-smm_enabled, smm, mch-d.config[MCH_HOST_BRDIGE_SMRAM],
-mch-smram_region);
-memory_region_transaction_commit();
-}
-
 static void mch_write_config(PCIDevice *d,
-  uint32_t address, uint32_t val, int len)
+ uint32_t address, uint32_t val, int len)
 {
 MCHPCIState *mch = MCH_PCI_DEVICE(d);
+MemoryController *m = MEMORY_CONTROLLER(d);
 
 /* XXX: implement SMRAM.D_LOCK */
 pci_default_write_config(d, address, val, len);
 
 if (ranges_overlap(address, len, MCH_HOST_BRIDGE_PAM0,
MCH_HOST_BRIDGE_PAM_SIZE)) {
-mch_update_pam(mch);
+mc_update_pam(m);
 }
 
 if (ranges_overlap(address, len, MCH_HOST_BRIDGE_PCIEXBAR,
@@ -196,21 +165,23 @@ static void mch_write_config(PCIDevice *d,
 
 if (ranges_overlap(address, len, MCH_HOST_BRDIGE_SMRAM,
MCH_HOST_BRDIGE_SMRAM_SIZE)) {
-mch_update_smram(mch);
+mc_update_smram(m);
 }
 }
 
-static void mch_update(MCHPCIState *mch)
+static void mch_update(MemoryController *d)
 {
+MCHPCIState *mch = MCH_PCI_DEVICE(d);
+
 mch_update_pciexbar(mch);
-mch_update_pam(mch);
-mch_update_smram(mch);
+mc_update_pam(d);
+mc_update_smram(d);
 }
 
 static int mch_post_load(void *opaque, int version_id)
 {
-MCHPCIState *mch = opaque;
-mch_update(mch);
+MemoryController *m = opaque;
+mch_update(m);
 return 0;
 }
 
@@ -221,8 +192,8 @@ static const VMStateDescription vmstate_mch = {
 .minimum_version_id_old = 1,
 .post_load = mch_post_load,
 .fields = (VMStateField []) {
-VMSTATE_PCI_DEVICE(d, MCHPCIState),
-VMSTATE_UINT8(smm_enabled, MCHPCIState),
+VMSTATE_PCI_DEVICE(dev.dev, MCHPCIState),
+VMSTATE_UINT8(dev.smm_enabled, MCHPCIState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -230,102 +201,39 @@ static const VMStateDescription vmstate_mch = {
 static void mch_reset(DeviceState *qdev)
 {
 PCIDevice *d = PCI_DEVICE(qdev);
-MCHPCIState *mch = MCH_PCI_DEVICE(d);
+MemoryController *m = MEMORY_CONTROLLER(d);
 
  

[Qemu-devel] [PATCH RFC 13/15] i440fx pmc: inherit from MemoryController

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/pci-host/piix.c | 159 -
 1 file changed, 22 insertions(+), 137 deletions(-)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 44ebe0f..71a9b8b 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -86,19 +86,7 @@ typedef struct PIIX3State {
 OBJECT_CHECK(I440FXPMCState, (obj), TYPE_I440FX_PMC_DEVICE)
 
 struct I440FXPMCState {
-PCIDevice dev;
-MemoryRegion *system_memory;
-MemoryRegion *pci_address_space;
-MemoryRegion *ram_memory;
-MemoryRegion pci_hole;
-MemoryRegion pci_hole_64bit;
-PAMMemoryRegion pam_regions[13];
-MemoryRegion smram_region;
-uint8_t smm_enabled;
-ram_addr_t ram_size;
-MemoryRegion ram;
-MemoryRegion ram_below_4g;
-MemoryRegion ram_above_4g;
+MemoryController dev;
 };
 
 #define TYPE_I440FX_DEVICE i440FX
@@ -133,52 +121,32 @@ static int pci_slot_get_pirq(PCIDevice *pci_dev, int 
pci_intx)
 return (pci_intx + slot_addend)  3;
 }
 
-static void i440fx_pmc_update_memory_mappings(I440FXPMCState *d)
-{
-int i;
-
-memory_region_transaction_begin();
-for (i = 0; i  13; i++) {
-pam_update(d-pam_regions[i], i,
-   d-dev.config[I440FX_PAM + ((i + 1) / 2)]);
-}
-smram_update(d-smram_region, d-dev.config[I440FX_SMRAM], 
d-smm_enabled);
-memory_region_transaction_commit();
-}
-
-static void i440fx_set_smm(int val, void *arg)
+static void i440fx_pmc_reset(DeviceState *d)
 {
-I440FXPMCState *d = arg;
-
-memory_region_transaction_begin();
-smram_set_smm(d-smm_enabled, val, d-dev.config[I440FX_SMRAM],
-  d-smram_region);
-memory_region_transaction_commit();
+mc_update(MEMORY_CONTROLLER(d));
 }
 
-
 static void i440fx_write_config(PCIDevice *dev,
 uint32_t address, uint32_t val, int len)
 {
-I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
-
 /* XXX: implement SMRAM.D_LOCK */
 pci_default_write_config(dev, address, val, len);
 if (ranges_overlap(address, len, I440FX_PAM, I440FX_PAM_SIZE) ||
 range_covers_byte(address, len, I440FX_SMRAM)) {
-i440fx_pmc_update_memory_mappings(d);
+mc_update(MEMORY_CONTROLLER(dev));
 }
 }
 
 static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
 {
-I440FXPMCState *d = opaque;
+MemoryController *d = opaque;
 int ret, i;
 
 ret = pci_device_load(d-dev, f);
 if (ret  0)
 return ret;
-i440fx_pmc_update_memory_mappings(d);
+
+mc_update(MEMORY_CONTROLLER(d));
 qemu_get_8s(f, d-smm_enabled);
 
 if (version_id == 2) {
@@ -192,9 +160,9 @@ static int i440fx_load_old(QEMUFile* f, void *opaque, int 
version_id)
 
 static int i440fx_post_load(void *opaque, int version_id)
 {
-I440FXPMCState *d = opaque;
+MemoryController *d = opaque;
 
-i440fx_pmc_update_memory_mappings(d);
+mc_update(d);
 return 0;
 }
 
@@ -206,8 +174,8 @@ static const VMStateDescription vmstate_i440fx_pmc = {
 .load_state_old = i440fx_load_old,
 .post_load = i440fx_post_load,
 .fields  = (VMStateField []) {
-VMSTATE_PCI_DEVICE(dev, I440FXPMCState),
-VMSTATE_UINT8(smm_enabled, I440FXPMCState),
+VMSTATE_PCI_DEVICE(dev.dev, I440FXPMCState),
+VMSTATE_UINT8(dev.smm_enabled, I440FXPMCState),
 VMSTATE_END_OF_LIST()
 }
 };
@@ -230,7 +198,7 @@ static int i440fx_realize(SysBusDevice *dev)
 sysbus_add_io(dev, 0xcfc, s-data_mem);
 sysbus_init_ioports(s-busdev, 0xcfc, 4);
 
-f-pmc.pci_address_space = f-pci_address_space;
+f-pmc.dev.pci_address_space = f-pci_address_space;
 
 qdev_set_parent_bus(DEVICE(f-pmc), BUS(s-bus));
 qdev_init_nofail(DEVICE(f-pmc));
@@ -249,91 +217,6 @@ static void i440fx_initfn(Object *obj)
 memory_region_init(f-pci_address_space, pci, INT64_MAX);
 }
 
-static int i440fx_pmc_initfn(PCIDevice *dev)
-{
-I440FXPMCState *d = I440FX_PMC_DEVICE(dev);
-ram_addr_t ram_size;
-hwaddr below_4g_mem_size, above_4g_mem_size;
-hwaddr pci_hole_start, pci_hole_size;
-hwaddr pci_hole64_start, pci_hole64_size;
-int i;
-
-g_assert(d-system_memory != NULL);
-
-if(d-ram_size  I440FX_PMC_PCI_HOLE) {
-below_4g_mem_size = I440FX_PMC_PCI_HOLE;
-above_4g_mem_size = d-ram_size - I440FX_PMC_PCI_HOLE;
-} else {
-below_4g_mem_size = d-ram_size;
-above_4g_mem_size = 0;
-}
-
-/* Allocate RAM.  We allocate it as a single memory region and use
- * aliases to address portions of it, mostly for backwards compatibility
- * with older qemus that used qemu_ram_alloc().
- */
-memory_region_init_ram(d-ram, pc.ram,
-   below_4g_mem_size + above_4g_mem_size);
-vmstate_register_ram_global(d-ram);
-memory_region_init_alias(d-ram_below_4g, ram-below-4g, d-ram,
- 0, below_4g_mem_size);
-

Re: [Qemu-devel] [PATCH v3] vl.c: Support multiple CPU ranges on -numa option

2013-06-20 Thread Igor Mammedov
On Wed, 19 Jun 2013 10:26:42 -0300
Eduardo Habkost ehabk...@redhat.com wrote:

 On Wed, Jun 19, 2013 at 01:42:52PM +0200, Igor Mammedov wrote:
  On Tue, 18 Jun 2013 16:09:49 -0400
  Bandan Das b...@redhat.com wrote:
  
   
   This allows us to use the cpu property multiple times
   to specify multiple cpu (ranges) to the -numa option :
   
   -numa node,cpu=1,cpu=2,cpu=4
   or
   -numa node,cpu=1-3,cpu=5
   
   Signed-off-by: Bandan Das b...@redhat.com
   ---
v3: Convert to using QemuOpts
Use -cpu rather than -cpus which probably probably makes it more 
   meaningful for non-range arguments
   
   Sorry for reviving this up :)
   
   This is a follow up to earlier proposals sent by Eduardo.
   
   References :
   1. http://lists.gnu.org/archive/html/qemu-devel/2013-02/msg03832.html
   2. https://lists.gnu.org/archive/html/qemu-devel/2013-02/msg03857.html
   
   So, basically the format seemed easier to work with if we are thinking 
   of using QemuOpts for -numa. Using -cpu rather than cpus probably
   makes it less ambiguous as well IMO. However, it's probably not a good 
   idea
   if the current syntax is well established ?
 
 libvirt uses the cpus option already, so we have to keep it working.
Sure, we can leave it as it's now for some time while a new interface is
introduced/adopted. And than later deprecate cpus.

 
  In context of x86, allowing to specify CPU threads using cpu_index isn't 
  correct,
  since node calculated from APIC ID and node it gets from ACPI table could 
  differ.
  
  It could be better for CLI interface to accept socket number and build 
  always
  correct NUMA mapping internally using APIC IDs from CPUs, as it's done in 
  real
  hardware.
  
  In addition it would allow to deprecate use of cpu_index on CLI interface, 
  and
  simplify following re-factoring to use socket/[core/]thread as means to 
  address/
  specify specific CPUs there and later in monitor/qmp interface as well.
 
 What about simply accepting a QOM object path? Today we could only
 accept CPU thread objects (because there are no socket/core objects
 yet), but the day we introduce CPU socket objects, we can change the
 code to accept them without changing the syntax.
It doesn't matter if it's socket=N or QOM path, the idea is not to allow
individual CPU threads there to avoid misconfiguration, but use socket entities 
in some form in interface part. Sockets could be dummy containers for initial
implementation so not to delay sanitizing NUMA code.





[Qemu-devel] [PATCH RFC 07/15] i440fx-pmc: create pci address space

2013-06-20 Thread Hu Tao
Signed-off-by: Hu Tao hu...@cn.fujitsu.com
---
 hw/i386/pc_piix.c| 33 +++--
 hw/pci-host/piix.c   | 16 +---
 include/hw/i386/pc.h |  2 +-
 3 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 3934754..1ca705c 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -109,23 +109,6 @@ static void pc_init1(MemoryRegion *system_memory,
 below_4g_mem_size = ram_size;
 }
 
-if (pci_enabled) {
-pci_memory = g_new(MemoryRegion, 1);
-memory_region_init(pci_memory, pci, INT64_MAX);
-rom_memory = pci_memory;
-} else {
-pci_memory = NULL;
-rom_memory = system_memory;
-}
-
-/* allocate ram and load rom/bios */
-if (!xen_enabled()) {
-fw_cfg = pc_memory_init(system_memory,
-   kernel_filename, kernel_cmdline, initrd_filename,
-   below_4g_mem_size, above_4g_mem_size,
-   rom_memory, ram_memory);
-}
-
 gsi_state = g_malloc0(sizeof(*gsi_state));
 if (kvm_irqchip_in_kernel()) {
 kvm_pc_setup_irq_routing(pci_enabled);
@@ -138,7 +121,7 @@ static void pc_init1(MemoryRegion *system_memory,
 if (pci_enabled) {
 pci_bus = i440fx_init(piix3_devfn, isa_bus, gsi,
   system_memory, system_io, ram_size,
-  pci_memory, ram_memory);
+  pci_memory, ram_memory);
 } else {
 pci_bus = NULL;
 isa_bus = isa_bus_new(NULL, system_io);
@@ -146,6 +129,20 @@ static void pc_init1(MemoryRegion *system_memory,
 }
 isa_bus_irqs(isa_bus, gsi);
 
+if (pci_enabled) {
+rom_memory = pci_memory;
+} else {
+rom_memory = system_memory;
+}
+
+/* allocate ram and load rom/bios */
+if (!xen_enabled()) {
+fw_cfg = pc_memory_init(system_memory,
+kernel_filename, kernel_cmdline, 
initrd_filename,
+below_4g_mem_size, above_4g_mem_size,
+rom_memory, ram_memory);
+}
+
 if (kvm_irqchip_in_kernel()) {
 i8259 = kvm_i8259_init(isa_bus);
 } else if (xen_enabled()) {
diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 7b58d56..604a66e 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -105,7 +105,7 @@ struct I440FXPMCState {
 typedef struct I440FXState {
 PCIHostState parent_obj;
 MemoryRegion *address_space_io;
-MemoryRegion *pci_address_space;
+MemoryRegion pci_address_space;
 
 PIIX3State piix3;
 I440FXPMCState pmc;
@@ -214,7 +214,7 @@ static int i440fx_realize(SysBusDevice *dev)
 PCIHostState *s = PCI_HOST_BRIDGE(dev);
 I440FXState *f = I440FX_DEVICE(dev);
 
-s-bus = pci_bus_new(DEVICE(f), NULL, f-pci_address_space,
+s-bus = pci_bus_new(DEVICE(f), NULL, f-pci_address_space,
  f-address_space_io, 0, TYPE_PCI_BUS);
 
 memory_region_init_io(s-conf_mem, pci_host_conf_le_ops, s,
@@ -227,6 +227,8 @@ static int i440fx_realize(SysBusDevice *dev)
 sysbus_add_io(dev, 0xcfc, s-data_mem);
 sysbus_init_ioports(s-busdev, 0xcfc, 4);
 
+f-pmc.pci_address_space = f-pci_address_space;
+
 qdev_set_parent_bus(DEVICE(f-pmc), BUS(s-bus));
 qdev_init_nofail(DEVICE(f-pmc));
 
@@ -240,6 +242,8 @@ static void i440fx_initfn(Object *obj)
 object_initialize(f-pmc, TYPE_I440FX_PMC_DEVICE);
 object_property_add_child(obj, pmc, OBJECT(f-pmc), NULL);
 qdev_prop_set_uint32(DEVICE(f-pmc), addr, PCI_DEVFN(0, 0));
+
+memory_region_init(f-pci_address_space, pci, INT64_MAX);
 }
 
 static int i440fx_pmc_initfn(PCIDevice *dev)
@@ -251,7 +255,6 @@ static int i440fx_pmc_initfn(PCIDevice *dev)
 int i;
 
 g_assert(d-system_memory != NULL);
-g_assert(d-pci_address_space != NULL);
 g_assert(d-ram_memory != NULL);
 
 if(d-ram_size  I440FX_PMC_PCI_HOLE) {
@@ -313,7 +316,7 @@ static PCIBus *i440fx_common_init(const char *device_name,
   MemoryRegion *address_space_mem,
   MemoryRegion *address_space_io,
   ram_addr_t ram_size,
-  MemoryRegion *pci_address_space,
+  MemoryRegion **pci_address_space,
   MemoryRegion *ram_memory)
 {
 PCIHostState *s;
@@ -325,12 +328,10 @@ static PCIBus *i440fx_common_init(const char *device_name,
 s = PCI_HOST_BRIDGE(i440fx);
 
 i440fx-address_space_io = address_space_io;
-i440fx-pci_address_space = pci_address_space;
 
 f = i440fx-pmc;
 f-ram_size = ram_size;
 f-system_memory = address_space_mem;
-f-pci_address_space = pci_address_space;
 f-ram_memory = ram_memory;
 
 object_property_add_child(qdev_get_machine(), i440fx,
@@ -358,6 +359,7 @@ static PCIBus *i440fx_common_init(const char 

[Qemu-devel] [PATCH 0/1] ARM aarch64 TCG tlb fast lookup

2013-06-20 Thread Jani Kokkonen

This patch implements the TCG tlb fast lookup in 
tcg_out_qemu_ld/st for the aarch64 TCG target.  
Supports also CONFIG_QEMU_LDST_OPTIMIZATION.

Tested running on a x86-64 physical machine running Foundation v8,
running a linux 3.2.0 minimal host system based on linaro v8
image build 0.8.4423 for user space.

Tested guests: arm v5, PPC64, sparc, i386 linux test images.
Also tested on x86-64/linux built with buildroot.

Jani Kokkonen (1):
  tcg/aarch64: Implement tlb lookup fast path

 configure|2 +-
 include/exec/exec-all.h  |   16 +++-
 tcg/aarch64/tcg-target.c |  197 ++
 3 files changed, 162 insertions(+), 53 deletions(-)

-- 
1.7.9.5




[Qemu-devel] [PATCH 1/1] tcg/aarch64: Implement tlb lookup fast path

2013-06-20 Thread Jani Kokkonen
Supports CONFIG_QEMU_LDST_OPTIMIZATION

Signed-off-by: Jani Kokkonen jani.kokko...@huawei.com
---
 configure|2 +-
 include/exec/exec-all.h  |   16 +++-
 tcg/aarch64/tcg-target.c |  197 ++
 3 files changed, 162 insertions(+), 53 deletions(-)

diff --git a/configure b/configure
index bed0104..6a36682 100755
--- a/configure
+++ b/configure
@@ -3599,7 +3599,7 @@ echo libs_softmmu=$libs_softmmu  $config_host_mak
 echo ARCH=$ARCH  $config_host_mak
 
 case $cpu in
-  arm|i386|x86_64|ppc)
+  arm|i386|x86_64|ppc|aarch64)
 # The TCG interpreter currently does not support ld/st optimization.
 if test $tcg_interpreter = no ; then
 echo CONFIG_QEMU_LDST_OPTIMIZATION=y  $config_host_mak
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5c31863..39c28d1 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -19,9 +19,7 @@
 
 #ifndef _EXEC_ALL_H_
 #define _EXEC_ALL_H_
-
 #include qemu-common.h
-
 /* allow to see translation results - the slowdown should be negligible, so we 
leave it */
 #define DEBUG_DISAS
 
@@ -358,6 +356,20 @@ static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
not the start of the next opcode  */
 return ra;
 }
+#elif defined(__aarch64__)
+#  define GETRA()   ((uintptr_t)__builtin_return_address(0))
+#  define GETPC_LDST()  tcg_getpc_ldst(GETRA())
+static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
+{
+int32_t b;
+ra += 4;/* skip one instruction */
+b = *(int32_t *)ra; /* load the branch insn */
+b = (b  6)  (6 - 2);/* extract the displacement */
+ra += b;/* apply the displacement  */
+ra -= 4;/* return a pointer into the current opcode,
+   not the start of the next opcode  */
+return ra;
+}
 # else
 #  error CONFIG_QEMU_LDST_OPTIMIZATION needs GETPC_LDST() implementation!
 # endif
diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 8bb195e..03da082 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -706,6 +706,41 @@ static inline void tcg_out_uxt(TCGContext *s, int s_bits,
 tcg_out_ubfm(s, 0, rd, rn, 0, bits);
 }
 
+static inline void tcg_out_addi(TCGContext *s, int ext,
+TCGReg rd, TCGReg rn, unsigned int aimm)
+{
+/* add immediate aimm unsigned 12bit value (with LSL 0 or 12) */
+/* using ADD 0x1100 | (ext) | (aimm  10) | (rn  5) | rd */
+unsigned int base = ext ? 0x9100 : 0x1100;
+
+if (aimm = 0xfff) {
+aimm = 10;
+} else {
+/* we can only shift left by 12, on assert we cannot represent */
+assert(!(aimm  0xfff));
+assert(aimm = 0xfff000);
+base |= 1  22; /* apply LSL 12 */
+aimm = 2;
+}
+
+tcg_out32(s, base | aimm | (rn  5) | rd);
+}
+
+static inline void tcg_out_subi(TCGContext *s, int ext,
+TCGReg rd, TCGReg rn, unsigned int aimm)
+{
+/* sub immediate aimm unsigned 12bit value (we use LSL 0 - no shift) */
+/* using SUB 0x5100 | (ext) | (aimm  10) | (rn  5) | rd */
+unsigned int base = ext ? 0xd100 : 0x5100;
+assert(aimm = 0xfff);
+tcg_out32(s, base | (aimm  10) | (rn  5) | rd);
+}
+
+static inline void tcg_out_nop(TCGContext *s)
+{
+tcg_out32(s, 0xd503201f);
+}
+
 #ifdef CONFIG_SOFTMMU
 #include exec/softmmu_defs.h
 
@@ -727,7 +762,106 @@ static const void * const qemu_st_helpers[4] = {
 helper_stq_mmu,
 };
 
-#else /* !CONFIG_SOFTMMU */
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+reloc_pc19(lb-label_ptr[0], (tcg_target_long)s-code_ptr);
+tcg_out_movr(s, 1, TCG_REG_X0, TCG_AREG0);
+tcg_out_movr(s, (TARGET_LONG_BITS == 64), TCG_REG_X1, lb-addrlo_reg);
+tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb-mem_index);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
+ (tcg_target_long)qemu_ld_helpers[lb-opc  3]);
+tcg_out_callr(s, TCG_REG_TMP);
+if (lb-opc  0x04) {
+tcg_out_sxt(s, 1, lb-opc  3, lb-datalo_reg, TCG_REG_X0);
+} else {
+tcg_out_movr(s, 1, lb-datalo_reg, TCG_REG_X0);
+}
+
+tcg_out_goto(s, (tcg_target_long)lb-raddr);
+}
+
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+reloc_pc19(lb-label_ptr[0], (tcg_target_long)s-code_ptr);
+
+tcg_out_movr(s, 1, TCG_REG_X0, TCG_AREG0);
+tcg_out_movr(s, (TARGET_LONG_BITS == 64), TCG_REG_X1, lb-addrlo_reg);
+tcg_out_movr(s, 1, TCG_REG_X2, lb-datalo_reg);
+tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb-mem_index);
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP,
+ (tcg_target_long)qemu_st_helpers[lb-opc  3]);
+tcg_out_callr(s, TCG_REG_TMP);
+
+tcg_out_nop(s);
+tcg_out_goto(s, (tcg_target_long)lb-raddr);
+}
+
+void tcg_out_tb_finalize(TCGContext *s)
+{
+int i;
+  

Re: [Qemu-devel] [PATCH v3 1/2] pvpanic: initialization cleanup

2013-06-20 Thread Stefano Stabellini
On Wed, 19 Jun 2013, Michael S. Tsirkin wrote:
 -int pvpanic_init(ISABus *bus)
 +void pvpanic_init(ISABus *bus)
  {
 -isa_create_simple(bus, TYPE_ISA_PVPANIC_DEVICE);
 -return 0;
 +ISADevice *dev;
 +FWCfgState *fw_cfg = fw_cfg_find();
 +if (!fw_cfg) {
 +return;
 +}
 +dev = isa_create_simple (bus, TYPE_ISA_PVPANIC_DEVICE);
 +pvpanic_fw_cfg(dev, fw_cfg);
  }

looks OK to me



Re: [Qemu-devel] [PATCH 1/2] PPC: Add dump_mmu() for 6xx

2013-06-20 Thread Alexander Graf

On 18.06.2013, at 18:04, Fabien Chouteau wrote:

 On 06/18/2013 05:31 PM, Alexander Graf wrote:
 +for (type = 0; type  2; type++)
 
 You need braces on these. Please run your patch through checkpatch.pl :).
 
 I did ;)
 
 - ./scripts/checkpatch.pl 0001-PPC-Add-dump_mmu-for-6xx.patch
 total: 0 errors, 0 warnings, 51 lines checked
 
 0001-PPC-Add-dump_mmu-for-6xx.patch has no obvious style problems and is 
 ready for submission.

Meh - broken script :). According to the CODING_STYLE convention all of the 
above need to be cluttered with braces ;).

 
 
 +for (way = 0; way  env-nb_ways; way++)
 +for (entry = env-nb_tlb * type + env-tlb_per_way * way;
 + entry  (env-nb_tlb * type + env-tlb_per_way * (way + 
 1));
 + entry++) {
 +
 +tlb = env-tlb.tlb6[entry];
 +cpu_fprintf(f, TLB %02d/%02d %s way:%d %s [
 +TARGET_FMT_lx   TARGET_FMT_lx ]\n,
 +entry % env-nb_tlb, env-nb_tlb,
 +type ? code : data, way,
 +pte_is_valid(tlb-pte0) ? valid : inval,
 +tlb-EPN, tlb-EPN + TARGET_PAGE_SIZE);
 +}
 
 I thought 6xx and 74xx also support HTAB and SRs? Shouldn't we dump those as 
 well?
 
 
 I don't know what that is, can you send me an example of what the printf line 
 should be?

SRs are similar to the SLB that book3s_64 print out. Just that there are a 
fixed smaller number of them (16). Basically you'd dump the env-sr array, 
similar to how the debug functions in get_segment_6xx_tlb() dump it.

For the HTAB I think SDR1 should be enough, so you don't need to do too much 
here. If you like, you can just dump the decoded fields env-htab_base and 
env-htab_mask. Dumping the whole HTAB would just explode the output.

However, you also should definitely dump all (valid) BATs. Check out 
get_bat_6xx_tlb() for debug code that dumps BATs.


Alex




Re: [Qemu-devel] [PATCH v3] vl.c: Support multiple CPU ranges on -numa option

2013-06-20 Thread Michael Tokarev
20.06.2013 13:52, Paolo Bonzini wrote:
 Il 20/06/2013 11:30, Igor Mammedov ha scritto:
 libvirt uses the cpus option already, so we have to keep it working.
 Sure, we can leave it as it's now for some time while a new interface is
 introduced/adopted. And than later deprecate cpus.
 
 So, you used a new name because the new behavior of -numa
 node,cpus=1-2,cpus=3-4 would be incompatible with the old.

BTW, as I tried to touch exactly the same place yesterday (trying
to convert it to QemuOpts) -- what does this node mean?

For example, with

  -device [type=]devicetype,foo=bar,xzy=abc

this creates a new device for each invocation of option.  But
what does this `-numa node' mean?  Can there be anything else
besides node?  Why it is needed/used for?

This -numa option is the last one which uses the old option
parsing mechanism (there's also some smbios-related thing
but it's simple to convert, I almost got it ready yesterday),
but it is rather non-standard.

Thanks,

/mjt



Re: [Qemu-devel] [PATCH v5 03/11] block: add basic backup support to block driver

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 01:14:17PM +0200, Paolo Bonzini wrote:
 Il 19/06/2013 12:50, Kevin Wolf ha scritto:
   +/* Publish progress */
   +job-sectors_read += n;
   +job-common.offset += n * BDRV_SECTOR_SIZE;
  This is interesting, because the function is not only called by the
  background job, but also by write notifiers. So 'offset' in a literal
  sense doesn't make too much sense because we're not operating purely
  sequential.
  
  The QAPI documentation describes 'offset' like this:
  
  # @offset: the current progress value
  
  If we take it as just that, I think we could actually consider this code
  correct, because it's a useful measure for the progress (each sector is
  copied only once, either by the job or by a notifier), even though it
  really has nothing to do with an offset into the image.
 
 Yes, this is similar to what we do for mirroring.  I think it is a feature.

Yes, we explicitly defined offset to be an opaque progress value for
this reason.

I will still add a comment since the name offset does make the
drive-backup code suggest the wrong thing.

Stefan



Re: [Qemu-devel] [PATCH v5 03/11] block: add basic backup support to block driver

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 01:19:34PM +0200, Paolo Bonzini wrote:
 Il 19/06/2013 12:50, Kevin Wolf ha scritto:
   +
   +DPRINTF(%s enter %s C% PRId64  % PRId64  %d\n,
   +__func__, bdrv_get_device_name(bs), start, sector_num, 
   nb_sectors);
  Maybe put the first %s and __func__ directly into the DPRINTF macro?
  
 
 Or just use tracepoints.  backup_do_cow could definitely be one, and it
 would subsume another DPRINTF (backup_run loop).
 
 hbitmap_get and block_job_completed are two other useful tracepoint that
 is not present.
 
 All that's left then are the DPRINTF for failed readv and writev, which
 could also be useful in generic code (bdrv_co_*_done).
 
 Can be done as a follow-up, of course.

I need to respin anyway.  The only reason for DPRINTF() is that I
originally wanted to change as little as possible from the orginal
patch.  But we've gone so far we might as well do this too :).

Stefan



[Qemu-devel] [PATCH] Graphics: Switch to 800x600x32 as default mode

2013-06-20 Thread Alexander Graf
We have stayed at 800x600x15 as default graphics mode for the last 9 years.
If there ever was a reason to be there, surely nobody remembers it.

However, recently non-Linux PPC guests started to show bad effects on 15 bit
color mode. They do work just fine with 32 bits however.

So let's switch to 32 bit color as the default graphic mode.

Reported-by: Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch_init.c b/arch_init.c
index a8b91ee..8ba3d07 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -65,7 +65,7 @@ int graphic_depth = 8;
 #else
 int graphic_width = 800;
 int graphic_height = 600;
-int graphic_depth = 15;
+int graphic_depth = 32;
 #endif
 
 
-- 
1.8.1.4




Re: [Qemu-devel] [PATCH v5 03/11] block: add basic backup support to block driver

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 12:50:16PM +0200, Kevin Wolf wrote:
 Am 30.05.2013 um 14:34 hat Stefan Hajnoczi geschrieben:
  +DPRINTF(%s enter %s C% PRId64  % PRId64  %d\n,
  +__func__, bdrv_get_device_name(bs), start, sector_num, 
  nb_sectors);
 
 Maybe put the first %s and __func__ directly into the DPRINTF macro?

Will use trace events instead.

  +wait_for_overlapping_requests(job, start, end);
  +cow_request_begin(cow_request, job, start, end);
  +
  +total_sectors = bdrv_getlength(bs);
  +if (total_sectors  0) {
  +if (error_is_read) {
  +*error_is_read = true;
  +}
  +ret = total_sectors;
  +goto out;
  +}
  +total_sectors /= BDRV_SECTOR_SIZE;
 
 Why is this needed? There are two callers of the function: One is the
 write notifier, which has already a sector number that is checked
 against the image size. The other one is background job that gets the
 size once when it starts. I don't think there's a reason to call the
 block driver each time and potentially do an expensive request.
 
 At first I thought it has something to do with resizing images, but it's
 forbidden while a block job is running (otherwise the job's main loop
 exit condition would be wrong, too), so that doesn't make it necessary.

Thanks, I will eliminate the call.

  +/* Publish progress */
  +job-sectors_read += n;
  +job-common.offset += n * BDRV_SECTOR_SIZE;
 
 This is interesting, because the function is not only called by the
 background job, but also by write notifiers. So 'offset' in a literal
 sense doesn't make too much sense because we're not operating purely
 sequential.
 
 The QAPI documentation describes 'offset' like this:
 
 # @offset: the current progress value
 
 If we take it as just that, I think we could actually consider this code
 correct, because it's a useful measure for the progress (each sector is
 copied only once, either by the job or by a notifier), even though it
 really has nothing to do with an offset into the image.
 
 Maybe a comment would be appropriate.

Will add the comment.

  +static BlockJobType backup_job_type = {
  +.instance_size = sizeof(BackupBlockJob),
  +.job_type = backup,
  +.set_speed = backup_set_speed,
  +.iostatus_reset = backup_iostatus_reset,
  +};
 
 Align = on the same column? Should probably be const, too.

Thanks for pointing the const out.  I'll throw in alignment as a bonus,
there is a team of ASCII artists here who do amazing whitespace work ;).

  +static void coroutine_fn backup_run(void *opaque)
  +{
  +BackupBlockJob *job = opaque;
  +BlockDriverState *bs = job-common.bs;
  +BlockDriverState *target = job-target;
  +BlockdevOnError on_target_error = job-on_target_error;
  +NotifierWithReturn before_write = {
  +.notify = backup_before_write_notify,
  +};
  +int64_t start, end;
  +int ret = 0;
  +
  +QLIST_INIT(job-inflight_reqs);
  +qemu_co_rwlock_init(job-flush_rwlock);
  +
  +start = 0;
  +end = DIV_ROUND_UP(bdrv_getlength(bs) / BDRV_SECTOR_SIZE,
 
 bdrv_getlength() can fail.

Will fix.

  +   BACKUP_SECTORS_PER_CLUSTER);
  +
  +job-bitmap = hbitmap_alloc(end, 0);
  +
  +bdrv_set_on_error(target, on_target_error, on_target_error);
  +bdrv_iostatus_enable(target);
 
 Mirroring also has:
 
 bdrv_set_enable_write_cache(s-target, true);
 
 Wouldn't it make sense for backup as well?

I guess so.

  +/* we need to yield so that qemu_aio_flush() returns.
  + * (without, VM does not reboot)
  + */
  +if (job-common.speed) {
  +uint64_t delay_ns = ratelimit_calculate_delay(
  +job-limit, job-sectors_read);
  +job-sectors_read = 0;
  +block_job_sleep_ns(job-common, rt_clock, delay_ns);
 
 Here's the other implication of counting the progress of copies
 initiated by write notifiers: The more copies they trigger, the less
 additional copies are made by the background job.
 
 I'm willing to count this as a feature rather than a bug.

Yes, the guest does not get throttled by the block job.

  +} else {
  +block_job_sleep_ns(job-common, rt_clock, 0);
  +}
  +
  +if (block_job_is_cancelled(job-common)) {
  +break;
  +}
  +
  +DPRINTF(backup_run loop C% PRId64 \n, start);
  +
  +ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER, 1,
  +error_is_read);
 
 You're taking advantage of the fact that backup_do_cow() rounds this one
 sector up to 64k, which is a much more reasonable size. But why not
 specify BACKUP_SECTORS_PER_CLUSTER as nb_sectors when this is what you
 really assume?

Sounds good though I need to double-check if we run into issues when
hitting the end of the disk (if not aligned to
BACKUP_SECTORS_PER_CLUSTER).



Re: [Qemu-devel] [PATCH] Graphics: Switch to 800x600x32 as default mode

2013-06-20 Thread Peter Maydell
On 20 June 2013 13:09, Alexander Graf ag...@suse.de wrote:
 We have stayed at 800x600x15 as default graphics mode for the last 9 years.
 If there ever was a reason to be there, surely nobody remembers it.

 However, recently non-Linux PPC guests started to show bad effects on 15 bit
 color mode. They do work just fine with 32 bits however.

Have we identified what the actual problem with these guests is?
32 bit might be a more sensible default but it masks a bug
doesn't seem like a very solid justification for the change...

thanks
-- PMM



Re: [Qemu-devel] [PATCH v5 06/11] block: add drive-backup QMP command

2013-06-20 Thread Stefan Hajnoczi
On Wed, Jun 19, 2013 at 01:07:42PM +0200, Kevin Wolf wrote:
 Am 30.05.2013 um 14:34 hat Stefan Hajnoczi geschrieben:
  @drive-backup
  
  Start a point-in-time copy of a block device to a new destination.  The
  status of ongoing drive-backup operations can be checked with
  query-block-jobs where the BlockJobInfo.type field has the value 'backup'.
  The operation can be stopped before it has completed using the
  block-job-cancel command.
  
  @device: the name of the device which should be copied.
  
  @target: the target of the new image. If the file exists, or if it
   is a device, the existing file/device will be used as the new
   destination.  If it does not exist, a new file will be created.
  
  @format: #optional the format of the new destination, default is to
   probe if @mode is 'existing', else the format of the source
  
  @mode: #optional whether and how QEMU should create a new image, default is
 'absolute-paths'.
  
  @speed: #optional the maximum speed, in bytes per second
  
  @on-source-error: #optional the action to take on an error on the source,
default 'report'.  'stop' and 'enospc' can only be used
if the block device supports io-status (see BlockInfo).
  
  @on-target-error: #optional the action to take on an error on the target,
default 'report' (no limitations, since this applies to
a different block device than @device).
  
  Note that @on-source-error and @on-target-error only affect background I/O.
  If an error occurs during a guest write request, the device's rerror/werror
  actions will be used.
  
  Returns: nothing on success
   If @device is not a valid block device, DeviceNotFound
  
  Since 1.6
  
  Signed-off-by: Stefan Hajnoczi stefa...@redhat.com
 
 Reviewed-by: Kevin Wolf kw...@redhat.com
 
 But what about HMP, the series doesn't seem to have a separate patch to
 add a command there?

I can send it as a follow-up if you want.  The test cases use QMP and so
do management tools - I didn't feel the need for HMP.

Stefan



Re: [Qemu-devel] [PATCH v2] s390: Implement dump-guest-memory support for target s390x

2013-06-20 Thread Alexander Graf

On 12.06.2013, at 10:09, Jens Freimann wrote:

 From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 
 With this patch dump-guest-memory on s390 produces an ELF formatted,
 crash-readable dump. 
 In order to implement this, the arch-specific part of dump-guest-memory
 was added:
 target-s390x/arch_dump.c contains the whole set of function for writing
 Elf note sections of all types for s390x.
 
 Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
 Signed-off-by: Jens Freimann jf...@linux.vnet.ibm.com

 
 

[...]

 diff --git a/target-s390x/cpu-qom.h b/target-s390x/cpu-qom.h
 index 34d45c2..4dfc8f1 100644
 --- a/target-s390x/cpu-qom.h
 +++ b/target-s390x/cpu-qom.h
 @@ -72,5 +72,14 @@ static inline S390CPU *s390_env_get_cpu(CPUS390XState *env)
 #define ENV_OFFSET offsetof(S390CPU, env)
 
 void s390_cpu_do_interrupt(CPUState *cpu);
 +int s390_cpu_write_elf64_note(WriteCoreDumpFunction f, CPUState *cs,
 + int cpuid, void *opaque);
 +
 +static inline int s390_cpu_write_elf64_qemunote(WriteCoreDumpFunction f,
 +CPUState *env, void *opaque)
 +{
 +return 0;
 +}

This one looks odd to me. cpu-qom.h shouldn't really have a static inline to 
not write notes :). Either put the function into arch_dump.c as well and only 
export the header here or drop the whole thing and make sure that 
write_elf64_qemunote == NULL still works.

The rest looks good, without any guarantees that the format is actually 
correctly implemented.


Alex

 +
 
 #endif
 diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
 index 23fe51f..9376a6c 100644
 --- a/target-s390x/cpu.c
 +++ b/target-s390x/cpu.c
 @@ -171,6 +171,10 @@ static void s390_cpu_class_init(ObjectClass *oc, void 
 *data)
 
 cc-do_interrupt = s390_cpu_do_interrupt;
 dc-vmsd = vmstate_s390_cpu;
 +#ifndef CONFIG_USER_ONLY
 +cc-write_elf64_note = s390_cpu_write_elf64_note;
 +cc-write_elf64_qemunote = s390_cpu_write_elf64_qemunote;
 +#endif
 }
 
 static const TypeInfo s390_cpu_type_info = {
 -- 
 1.8.1.6
 
 




Re: [Qemu-devel] [PATCH] Graphics: Switch to 800x600x32 as default mode

2013-06-20 Thread Alexander Graf

On 20.06.2013, at 14:19, Peter Maydell wrote:

 On 20 June 2013 13:09, Alexander Graf ag...@suse.de wrote:
 We have stayed at 800x600x15 as default graphics mode for the last 9 years.
 If there ever was a reason to be there, surely nobody remembers it.
 
 However, recently non-Linux PPC guests started to show bad effects on 15 bit
 color mode. They do work just fine with 32 bits however.
 
 Have we identified what the actual problem with these guests is?
 32 bit might be a more sensible default but it masks a bug
 doesn't seem like a very solid justification for the change...

Since Linux works just fine, I'd say it falls under the category of QEMU 
diverges from typical test systems.

Mark, have you explored the breakages any deeper?


Alex




Re: [Qemu-devel] [PATCH] Add Xen platform PCI device version 2.

2013-06-20 Thread Anthony Liguori
Paul Durrant paul.durr...@citrix.com writes:

 The XenServer 6.1+ Citrix Windows PV bus driver binds to a new version
 of the Xen platform device (since the newer driver set cannot co-exist
 with previous drivers which bind to the existing xen-platform type of
 device). This patch introduces a new xen-platform-2 device type with
 the appropriate device_id and revision.

 Signed-off-by: Paul Durrant paul.durr...@citrix.com
 ---
  hw/xen/xen_platform.c|   75 
 ++
  include/hw/pci/pci_ids.h |1 +
  2 files changed, 63 insertions(+), 13 deletions(-)

 diff --git a/hw/xen/xen_platform.c b/hw/xen/xen_platform.c
 index b6c6793..6edb850 100644
 --- a/hw/xen/xen_platform.c
 +++ b/hw/xen/xen_platform.c
 @@ -48,6 +48,20 @@
  
  #define PFFLAG_ROM_LOCK 1 /* Sets whether ROM memory area is RW or RO */
  
 +typedef struct {
 +const char *name;
 +const char *desc;
 +uint16_t device_id;
 +uint8_t revision;
 +uint16_t subsystem_vendor_id;
 +uint16_t subsystem_id;
 +} PCIXenPlatformDeviceInfo;
 +
 +typedef struct PCIXenPlatformDeviceClass {
 +PCIDeviceClass  parent_class;
 +PCIXenPlatformDeviceInfoinfo;
 +} PCIXenPlatformDeviceClass;
 +
  typedef struct PCIXenPlatformState {
  PCIDevice  pci_dev;
  MemoryRegion fixed_io;
 @@ -372,8 +386,13 @@ static const VMStateDescription vmstate_xen_platform = {
  static int xen_platform_initfn(PCIDevice *dev)
  {
  PCIXenPlatformState *d = DO_UPCAST(PCIXenPlatformState, pci_dev, dev);
 +PCIDeviceClass *k = PCI_DEVICE_GET_CLASS(dev);
 +__attribute__((unused)) PCIXenPlatformDeviceClass *u;
  uint8_t *pci_conf;
  
 +u = container_of(k, PCIXenPlatformDeviceClass, parent_class);
 +DPRINTF(initializing %s\n, u-info.name);
 +
  pci_conf = d-pci_dev.config;
  
  pci_set_word(pci_conf + PCI_COMMAND, PCI_COMMAND_IO | 
 PCI_COMMAND_MEMORY);
 @@ -402,33 +421,63 @@ static void platform_reset(DeviceState *dev)
  platform_fixed_ioport_reset(s);
  }
  
 +static PCIXenPlatformDeviceInfo platform_devices[] = {
 +{
 +.name = xen-platform,
 +.desc = XEN platform pci device (version 1),
 +.device_id = PCI_DEVICE_ID_XEN_PLATFORM,
 +.revision = 1,
 +.subsystem_vendor_id = PCI_VENDOR_ID_XEN,
 +.subsystem_id = PCI_DEVICE_ID_XEN_PLATFORM,
 +}, {
 +.name = xen-platform-2,
 +.desc = XEN platform pci device (version 2),
 +.device_id = PCI_DEVICE_ID_XEN_PLATFORM_V2,
 +.revision = 2,
 +.subsystem_vendor_id = PCI_VENDOR_ID_XEN,
 +.subsystem_id = PCI_DEVICE_ID_XEN_PLATFORM_V2,
 +}
 +};
 +
  static void xen_platform_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 +PCIXenPlatformDeviceInfo *info = data;
 +PCIXenPlatformDeviceClass *u;
 +
 +u = container_of(k, PCIXenPlatformDeviceClass, parent_class);
  
  k-init = xen_platform_initfn;
  k-vendor_id = PCI_VENDOR_ID_XEN;
 -k-device_id = PCI_DEVICE_ID_XEN_PLATFORM;
 +k-device_id = info-device_id;
  k-class_id = PCI_CLASS_OTHERS  8 | 0x80;
 -k-subsystem_vendor_id = PCI_VENDOR_ID_XEN;
 -k-subsystem_id = PCI_DEVICE_ID_XEN_PLATFORM;
 -k-revision = 1;
 -dc-desc = XEN platform pci device;
 +k-subsystem_vendor_id = info-subsystem_vendor_id;
 +k-subsystem_id = info-subsystem_id;
 +k-revision = info-revision;
 +dc-desc = info-desc;
  dc-reset = platform_reset;
  dc-vmsd = vmstate_xen_platform;
 +u-info = *info;
  }
  
 -static const TypeInfo xen_platform_info = {
 -.name  = xen-platform,
 -.parent= TYPE_PCI_DEVICE,
 -.instance_size = sizeof(PCIXenPlatformState),
 -.class_init= xen_platform_class_init,
 -};
 -
  static void xen_platform_register_types(void)
  {
 -type_register_static(xen_platform_info);
 +TypeInfo type_info = {
 +.parent= TYPE_PCI_DEVICE,
 +.instance_size = sizeof(PCIXenPlatformState),
 +.class_size= sizeof(PCIXenPlatformDeviceClass),
 +.class_init= xen_platform_class_init,
 +};
 +int i;
 +for (i = 0; i  ARRAY_SIZE(platform_devices); i++) {
 +PCIXenPlatformDeviceInfo *info = platform_devices[i];
 +
 +type_info.name = info-name;
 +type_info.class_data = info;
 +
 +type_register(type_info);
 +}

I can't tell if this is an RFC or meant a complete patch.  But the
approach you're taking is overly complex.  v2 of the device can just
derive from v1 and in the class_init function change the PCI
information.

Also, if you are going to be adding logic for v2, you should use QOM
cast macros, not container_of.

I don't understand why two devices are required here and the thread
doesn't really answer that either.  Is there a spec for the Xen platform
devices?  Take a look at docs/specs for some examples in the tree.


Re: [Qemu-devel] [RFC PATCH 0/4] per-object libraries

2013-06-20 Thread Paolo Bonzini
Il 20/06/2013 12:06, Peter Maydell ha scritto:
  This only leaves Darwin.  I have no idea about that, and I don't have
  anymore a machine to test it.  Andreas or Peter, can you shed light?
 If you have something concrete you'd like me to test I can test it.

You can modify the configure probe to detect GNU libtool (Mac OS X calls
it glibtool, IIRC), and attach the log of make libcacard.la V=1.

Paolo



Re: [Qemu-devel] [PATCH] Add Xen platform PCI device version 2.

2013-06-20 Thread Paul Durrant
 -Original Message-
 I don't understand why two devices are required here and the thread
 doesn't really answer that either.  Is there a spec for the Xen platform
 devices?  Take a look at docs/specs for some examples in the tree.
 
 It certainly helps to have one for discussions like this.
 

Anthony,

  I'm going to take a different approach so please disregard this patch now.

  Paul



[Qemu-devel] KVM call agenda for 2013-06-25

2013-06-20 Thread Michael S. Tsirkin
Please, send any topic that you are interested in covering.

Thanks, MST

-- 
MST



Re: [Qemu-devel] [RFC PATCH 0/4] per-object libraries

2013-06-20 Thread Peter Maydell
On 20 June 2013 13:39, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 20/06/2013 12:06, Peter Maydell ha scritto:
  This only leaves Darwin.  I have no idea about that, and I don't have
  anymore a machine to test it.  Andreas or Peter, can you shed light?
 If you have something concrete you'd like me to test I can test it.

 You can modify the configure probe to detect GNU libtool (Mac OS X calls
 it glibtool, IIRC), and attach the log of make libcacard.la V=1.

Unfortunately MacOSX doesn't have the 'nss' prereq so
CONFIG_SMARTCARD_NSS isn't set and make just says
make: *** No rule to make target `libcacard.la'.  Stop.

-- PMM



Re: [Qemu-devel] [RFC PATCH v6 3/3] Force auto-convegence of live migration

2013-06-20 Thread Paolo Bonzini
Il 14/06/2013 15:58, Chegu Vinod ha scritto:
 If a user chooses to turn on the auto-converge migration capability
 these changes detect the lack of convergence and throttle down the
 guest. i.e. force the VCPUs out of the guest for some duration
 and let the migration thread catchup and help converge.

Hi Vinod,

pretty much the same comments I sent you yesterday on the obsolete
version of the patch still apply.

 Verified the convergence using the following :
  - Java Warehouse workload running on a 20VCPU/256G guest(~80% busy)
  - OLTP like workload running on a 80VCPU/512G guest (~80% busy)
 
 Sample results with Java warehouse workload : (migrate speed set to 20Gb and
 migrate downtime set to 4seconds).
 
  (qemu) info migrate
  capabilities: xbzrle: off auto-converge: off  
  Migration status: active
  total time: 1487503 milliseconds
  expected downtime: 519 milliseconds
  transferred ram: 383749347 kbytes
  remaining ram: 2753372 kbytes
  total ram: 268444224 kbytes
  duplicate: 65461532 pages
  skipped: 64901568 pages
  normal: 95750218 pages
  normal bytes: 383000872 kbytes
  dirty pages rate: 67551 pages
 
  ---
 
  (qemu) info migrate
  capabilities: xbzrle: off auto-converge: on   
  Migration status: completed
  total time: 241161 milliseconds
  downtime: 6373 milliseconds
  transferred ram: 28235307 kbytes
  remaining ram: 0 kbytes
  total ram: 268444224 kbytes
  duplicate: 64946416 pages
  skipped: 64903523 pages
  normal: 7044971 pages
  normal bytes: 28179884 kbytes
 
 Signed-off-by: Chegu Vinod chegu_vi...@hp.com
 ---
  arch_init.c |   85 
 +++
  1 files changed, 85 insertions(+), 0 deletions(-)
 
 diff --git a/arch_init.c b/arch_init.c
 index 5d32ecf..69c6c8c 100644
 --- a/arch_init.c
 +++ b/arch_init.c
 @@ -104,6 +104,8 @@ int graphic_depth = 15;
  #endif
  
  const uint32_t arch_type = QEMU_ARCH;
 +static bool mig_throttle_on;
 +static void throttle_down_guest_to_converge(void);
  
  /***/
  /* ram save/restore */
 @@ -378,8 +380,15 @@ static void migration_bitmap_sync(void)
  uint64_t num_dirty_pages_init = migration_dirty_pages;
  MigrationState *s = migrate_get_current();
  static int64_t start_time;
 +static int64_t bytes_xfer_prev;
  static int64_t num_dirty_pages_period;
  int64_t end_time;
 +int64_t bytes_xfer_now;
 +static int dirty_rate_high_cnt;
 +
 +if (!bytes_xfer_prev) {
 +bytes_xfer_prev = ram_bytes_transferred();
 +}
  
  if (!start_time) {
  start_time = qemu_get_clock_ms(rt_clock);
 @@ -404,6 +413,23 @@ static void migration_bitmap_sync(void)
  
  /* more than 1 second = 1000 millisecons */
  if (end_time  start_time + 1000) {
 +if (migrate_auto_converge()) {
 +/* The following detection logic can be refined later. For now:
 +   Check to see if the dirtied bytes is 50% more than the approx.
 +   amount of bytes that just got transferred since the last time 
 we
 +   were in this routine. If that happens N times (for now N==4)
 +   we turn on the throttle down logic */
 +bytes_xfer_now = ram_bytes_transferred();
 +if (s-dirty_pages_rate 
 +((num_dirty_pages_period*TARGET_PAGE_SIZE) 
 +((bytes_xfer_now - bytes_xfer_prev)/2))) {
 +if (dirty_rate_high_cnt++  4) {

Too many parentheses, and please remove the nested if.

 +DPRINTF(Unable to converge. Throtting down guest\n);

Please use tracepoint instead.

 +mig_throttle_on = true;

Need to reset dirty_rate_high_cnt here, and both
dirty_rate_high_cnt/mig_throttle_on if you see !migrate_auto_converge().
 This ensures that throttling does not kick in automatically if you
disable and re-enable the feature.  It also lets you remove a bunch of
migrate_auto_converge() checks.

You also need to reset dirty_rate_high_cnt/mig_throttle_on in the setup
phase of migration.

 +}
 + }
 + bytes_xfer_prev = bytes_xfer_now;
 +}
  s-dirty_pages_rate = num_dirty_pages_period * 1000
  / (end_time - start_time);
  s-dirty_bytes_rate = s-dirty_pages_rate * TARGET_PAGE_SIZE;
 @@ -628,6 +654,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
  }
  total_sent += bytes_sent;
  acct_info.iterations++;
 +throttle_down_guest_to_converge();

You can use a shorter name, like check_cpu_throttling().

  /* we want to check in the 1st loop, just in case it was the 1st time
 and we had to sync the dirty bitmap.
 qemu_get_clock_ns() is a bit expensive, so we only check each some
 @@ -1098,3 +1125,61 @@ TargetInfo *qmp_query_target(Error **errp)
  
  return info;
  }
 +
 +static bool throttling_needed(void)
 +{
 +if (!migrate_auto_converge()) {

Re: [Qemu-devel] [RFC 06/13] qemu-thread: add TLS wrappers

2013-06-20 Thread Stefan Hajnoczi
On Thu, Jun 20, 2013 at 10:50:32AM +0200, Paolo Bonzini wrote:
 Il 20/06/2013 09:26, Fam Zheng ha scritto:
  On Fri, 06/14 11:48, Stefan Hajnoczi wrote:
  From: Paolo Bonzini pbonz...@redhat.com
 
  Fast TLS is not available on some platforms, but it is always nice to
  use it.  This wrapper implementation falls back to pthread_get/setspecific
  on POSIX systems that lack __thread, but uses the dynamic linker's TLS
  support on Linux and Windows.
 
  The user shall call alloc_foo() in every thread that needs to access the
  variable---exactly once and before any access.  foo is the name of the
  variable as passed to DECLARE_TLS and DEFINE_TLS.  Then, get_foo() will
  return the address of the variable.  It is guaranteed to remain the same
  across the lifetime of a thread, so you can cache it.
  
  Would tls_alloc_foo() and tls_get_foo() be easier to read and less
  possible for name conflict?
 
 Fine by me.

Nice, idea.  Will fix in the next version.

Stefan



Re: [Qemu-devel] [PATCH v3] vl.c: Support multiple CPU ranges on -numa option

2013-06-20 Thread Paolo Bonzini
Il 20/06/2013 13:34, Michael Tokarev ha scritto:
 20.06.2013 13:52, Paolo Bonzini wrote:
 Il 20/06/2013 11:30, Igor Mammedov ha scritto:
 libvirt uses the cpus option already, so we have to keep it working.
 Sure, we can leave it as it's now for some time while a new interface is
 introduced/adopted. And than later deprecate cpus.

 So, you used a new name because the new behavior of -numa
 node,cpus=1-2,cpus=3-4 would be incompatible with the old.
 
 BTW, as I tried to touch exactly the same place yesterday (trying
 to convert it to QemuOpts) -- what does this node mean?
 
 For example, with
 
   -device [type=]devicetype,foo=bar,xzy=abc
 
 this creates a new device for each invocation of option.  But
 what does this `-numa node' mean?  Can there be anything else
 besides node?  Why it is needed/used for?

Nothing, I think it's just that somebody took inspiration from -device. :)

 This -numa option is the last one which uses the old option
 parsing mechanism (there's also some smbios-related thing
 but it's simple to convert, I almost got it ready yesterday),

That would be awesome.

Paolo



Re: [Qemu-devel] [PATCH 1/2] PPC: Add dump_mmu() for 6xx

2013-06-20 Thread Fabien Chouteau
On 06/20/2013 01:16 PM, Alexander Graf wrote:
 
 On 18.06.2013, at 18:04, Fabien Chouteau wrote:
 
 On 06/18/2013 05:31 PM, Alexander Graf wrote:
 +for (type = 0; type  2; type++)

 You need braces on these. Please run your patch through checkpatch.pl :).

 I did ;)

 - ./scripts/checkpatch.pl 0001-PPC-Add-dump_mmu-for-6xx.patch
 total: 0 errors, 0 warnings, 51 lines checked

 0001-PPC-Add-dump_mmu-for-6xx.patch has no obvious style problems and is 
 ready for submission.
 
 Meh - broken script :). According to the CODING_STYLE convention all of the 
 above need to be cluttered with braces ;).

Will do.

 +for (way = 0; way  env-nb_ways; way++)
 +for (entry = env-nb_tlb * type + env-tlb_per_way * way;
 + entry  (env-nb_tlb * type + env-tlb_per_way * (way + 
 1));
 + entry++) {
 +
 +tlb = env-tlb.tlb6[entry];
 +cpu_fprintf(f, TLB %02d/%02d %s way:%d %s [
 +TARGET_FMT_lx   TARGET_FMT_lx ]\n,
 +entry % env-nb_tlb, env-nb_tlb,
 +type ? code : data, way,
 +pte_is_valid(tlb-pte0) ? valid : inval,
 +tlb-EPN, tlb-EPN + TARGET_PAGE_SIZE);
 +}

 I thought 6xx and 74xx also support HTAB and SRs? Shouldn't we dump those 
 as well?


 I don't know what that is, can you send me an example of what the printf 
 line should be?
 
 SRs are similar to the SLB that book3s_64 print out. Just that there are a 
 fixed smaller number of them (16). Basically you'd dump the env-sr array, 
 similar to how the debug functions in get_segment_6xx_tlb() dump it.
 
 For the HTAB I think SDR1 should be enough, so you don't need to do too much 
 here. If you like, you can just dump the decoded fields env-htab_base and 
 env-htab_mask. Dumping the whole HTAB would just explode the output.
 
 However, you also should definitely dump all (valid) BATs. Check out 
 get_bat_6xx_tlb() for debug code that dumps BATs.
 

Ok I'll have a look at that, and BATs should definitely be printed out.




-- 
Fabien Chouteau



[Qemu-devel] [PULL 00/21] pci,net,misc enhancements

2013-06-20 Thread Michael S. Tsirkin
From: Michael S. Tsirkin m...@redhat.com

The following changes since commit 90a2541b763b31d2b551b07e24aae3de5266d31b:

  target-i386: fix over 80 chars warnings (2013-06-15 17:50:38 +)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony

for you to fetch changes up to f96c30047009f8a9c3cecf68104d8d99f989f54d:

  pci: Fold host_buses list into PCIHostState functionality (2013-06-19 
18:35:05 +0300)


pci,net,misc enhancements

This includes some pci and net-related enhancements:

Better support for systems with multiple PCI root buses
A new management interface for access to rx filter in NICs
KVM Speedup for MSI updates on kvm
FW cfg interface for more robust pci programming in BIOS
Minor fixes/cleanups for fw cfg and cross-version migration -
because of dependencies with other patches

Signed-off-by: Michael S. Tsirkin m...@redhat.com


Amos Kong (1):
  net: add support of mac-programming over macvtap in QEMU side

Andrew Jones (1):
  e1000: cleanup process_tx_desc

David Gibson (10):
  pci: Cleanup configuration for pci-hotplug.c
  pci: Move pci_read_devaddr to pci-hotplug-old.c
  pci: Abolish pci_find_root_bus()
  pci: Use helper to find device's root bus in pci_find_domain()
  pci: Replace pci_find_domain() with more general pci_root_bus_path()
  pci: Add root bus argument to pci_get_bus_devfn()
  pci: Add root bus parameter to pci_nic_init()
  pci: Simpler implementation of primary PCI bus
  pci: Remove domain from PCIHostBus
  pci: Fold host_buses list into PCIHostState functionality

Michael S. Tsirkin (9):
  range: add Range structure
  pci: store PCI hole ranges in guestinfo structure
  pc: pass PCI hole ranges to Guests
  pc_piix: cleanup init compat handling
  kvm: zero-initialize KVM_SET_GSI_ROUTING input
  kvm: skip system call when msi route is unchanged
  MAINTAINERS: s/Marcelo/Paolo/
  pvpanic: initialization cleanup
  pvpanic: fix fwcfg for big endian hosts

 MAINTAINERS |   2 +-
 QMP/qmp-events.txt  |  17 
 default-configs/i386-softmmu.mak|   3 +-
 default-configs/ppc64-softmmu.mak   |   2 -
 default-configs/x86_64-softmmu.mak  |   3 +-
 hmp-commands.hx |   4 +-
 hw/alpha/dp264.c|   2 +-
 hw/arm/realview.c   |   6 +-
 hw/arm/versatilepb.c|   2 +-
 hw/i386/pc.c|  74 ++-
 hw/i386/pc_piix.c   |  40 +---
 hw/i386/pc_q35.c|  18 +++-
 hw/mips/mips_fulong2e.c |   6 +-
 hw/mips/mips_malta.c|   6 +-
 hw/misc/pvpanic.c   |  31 ---
 hw/net/e1000.c  |  18 ++--
 hw/net/virtio-net.c | 111 ++
 hw/pci-host/piix.c  |   9 ++
 hw/pci-host/q35.c   |  17 
 hw/pci/Makefile.objs|   2 +-
 hw/pci/{pci-hotplug.c = pci-hotplug-old.c} |  75 ---
 hw/pci/pci.c| 137 ++--
 hw/pci/pci_host.c   |   1 +
 hw/pci/pcie_aer.c   |   9 +-
 hw/ppc/e500.c   |   2 +-
 hw/ppc/mac_newworld.c   |   2 +-
 hw/ppc/mac_oldworld.c   |   2 +-
 hw/ppc/ppc440_bamboo.c  |   2 +-
 hw/ppc/prep.c   |   2 +-
 hw/ppc/spapr.c  |   2 +-
 hw/ppc/spapr_pci.c  |  10 ++
 hw/sh4/r2d.c|   5 +-
 hw/sparc64/sun4u.c  |   2 +-
 include/hw/i386/pc.h|  22 -
 include/hw/pci-host/q35.h   |   2 +
 include/hw/pci/pci.h|  17 ++--
 include/hw/pci/pci_host.h   |  12 +++
 include/monitor/monitor.h   |   1 +
 include/net/net.h   |   3 +
 include/qemu/range.h|  16 
 include/qemu/typedefs.h |   1 +
 kvm-all.c   |  23 +++--
 monitor.c   |   1 +
 net/net.c   |  47 ++
 qapi-schema.json|  75 +++
 qmp-commands.hx |  63 +
 46 files changed, 733 insertions(+), 174 deletions(-)
 rename hw/pci/{pci-hotplug.c = pci-hotplug-old.c} (78%)




[Qemu-devel] [PULL 01/21] range: add Range structure

2013-06-20 Thread Michael S. Tsirkin
Sometimes we need to pass ranges around, add a
handy structure for this purpose.

Note: memory.c defines its own concept of AddrRange structure for
working with 128 addresses.  It's necessary there for doing range math.
This is not needed for most users: struct Range is
much simpler, and is only used for passing the range around.

Cc: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/qemu/range.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/include/qemu/range.h b/include/qemu/range.h
index 3502372..b76cc0d 100644
--- a/include/qemu/range.h
+++ b/include/qemu/range.h
@@ -1,6 +1,22 @@
 #ifndef QEMU_RANGE_H
 #define QEMU_RANGE_H
 
+#include inttypes.h
+
+/*
+ * Operations on 64 bit address ranges.
+ * Notes:
+ *   - ranges must not wrap around 0, but can include the last byte ~0x0LL.
+ *   - this can not represent a full 0 to ~0x0LL range.
+ */
+
+/* A structure representing a range of addresses. */
+struct Range {
+uint64_t begin; /* First byte of the range, or 0 if empty. */
+uint64_t end;   /* 1 + the last byte. 0 if range empty or ends at ~0x0LL. 
*/
+};
+typedef struct Range Range;
+
 /* Get last byte of a range from offset + length.
  * Undefined for ranges that wrap around 0. */
 static inline uint64_t range_get_last(uint64_t offset, uint64_t len)
-- 
MST




[Qemu-devel] [PULL 03/21] pc: pass PCI hole ranges to Guests

2013-06-20 Thread Michael S. Tsirkin
Guest currently has to jump through lots of hoops to guess the PCI hole
ranges.  It's fragile, and makes us change BIOS each time we add a new
chipset.  Let's report the window in a ROM file, to make BIOS do exactly
what QEMU intends.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc.c | 26 ++
 hw/i386/pc_piix.c| 16 +++-
 hw/i386/pc_q35.c | 12 ++--
 include/hw/i386/pc.h |  1 +
 4 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index c685ee8..e1ed760 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -990,6 +990,31 @@ void pc_cpus_init(const char *cpu_model, DeviceState 
*icc_bridge)
 }
 }
 
+/* pci-info ROM file. Little endian format */
+typedef struct PcRomPciInfo {
+uint64_t w32_min;
+uint64_t w32_max;
+uint64_t w64_min;
+uint64_t w64_max;
+} PcRomPciInfo;
+
+static void pc_fw_cfg_guest_info(PcGuestInfo *guest_info)
+{
+PcRomPciInfo *info;
+if (!guest_info-has_pci_info) {
+return;
+}
+
+info = g_malloc(sizeof *info);
+info-w32_min = cpu_to_le64(guest_info-pci_info.w32.begin);
+info-w32_max = cpu_to_le64(guest_info-pci_info.w32.end);
+info-w64_min = cpu_to_le64(guest_info-pci_info.w64.begin);
+info-w64_max = cpu_to_le64(guest_info-pci_info.w64.end);
+/* Pass PCI hole info to guest via a side channel.
+ * Required so guest PCI enumeration does the right thing. */
+fw_cfg_add_file(guest_info-fw_cfg, etc/pci-info, info, sizeof *info);
+}
+
 typedef struct PcGuestInfoState {
 PcGuestInfo info;
 Notifier machine_done;
@@ -1001,6 +1026,7 @@ void pc_guest_info_machine_done(Notifier *notifier, void 
*data)
 PcGuestInfoState *guest_info_state = container_of(notifier,
   PcGuestInfoState,
   machine_done);
+pc_fw_cfg_guest_info(guest_info_state-info);
 }
 
 PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size,
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index a70a0d9..d87da95 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -57,6 +57,7 @@ static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
 
 static bool has_pvpanic = true;
+static bool has_pci_info = true;
 
 /* PC hardware initialisation */
 static void pc_init1(MemoryRegion *system_memory,
@@ -121,6 +122,7 @@ static void pc_init1(MemoryRegion *system_memory,
 }
 
 guest_info = pc_guest_info_init(below_4g_mem_size, above_4g_mem_size);
+guest_info-has_pci_info = has_pci_info;
 
 /* Set PCI window size the way seabios has always done it. */
 /* Power of 2 so bios can cover it with a single MTRR */
@@ -258,8 +260,15 @@ static void pc_init_pci(QEMUMachineInitArgs *args)
  initrd_filename, cpu_model, 1, 1);
 }
 
+static void pc_init_pci_1_5(QEMUMachineInitArgs *args)
+{
+has_pci_info = false;
+pc_init_pci(args);
+}
+
 static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
 {
+has_pci_info = false;
 has_pvpanic = false;
 x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
 pc_init_pci(args);
@@ -267,6 +276,7 @@ static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
 
 static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
 {
+has_pci_info = false;
 enable_compat_apic_id_mode();
 has_pvpanic = false;
 pc_init_pci(args);
@@ -275,6 +285,7 @@ static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
 /* PC machine init function for pc-1.1 to pc-1.2 */
 static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 {
+has_pci_info = false;
 disable_kvm_pv_eoi();
 enable_compat_apic_id_mode();
 has_pvpanic = false;
@@ -284,6 +295,7 @@ static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 /* PC machine init function for pc-0.14 to pc-1.0 */
 static void pc_init_pci_1_0(QEMUMachineInitArgs *args)
 {
+has_pci_info = false;
 disable_kvm_pv_eoi();
 enable_compat_apic_id_mode();
 has_pvpanic = false;
@@ -300,6 +312,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs 
*args)
 const char *initrd_filename = args-initrd_filename;
 const char *boot_device = args-boot_device;
 has_pvpanic = false;
+has_pci_info = false;
 disable_kvm_pv_eoi();
 enable_compat_apic_id_mode();
 pc_init1(get_system_memory(),
@@ -318,6 +331,7 @@ static void pc_init_isa(QEMUMachineInitArgs *args)
 const char *initrd_filename = args-initrd_filename;
 const char *boot_device = args-boot_device;
 has_pvpanic = false;
+has_pci_info = false;
 if (cpu_model == NULL)
 cpu_model = 486;
 disable_kvm_pv_eoi();
@@ -353,7 +367,7 @@ static QEMUMachine pc_i440fx_machine_v1_6 = {
 static QEMUMachine pc_i440fx_machine_v1_5 = {
 .name = pc-i440fx-1.5,
 .desc = Standard PC (i440FX + PIIX, 1996),
-.init = pc_init_pci,
+.init = pc_init_pci_1_5,
 

[Qemu-devel] [PULL 05/21] e1000: cleanup process_tx_desc

2013-06-20 Thread Michael S. Tsirkin
From: Andrew Jones drjo...@redhat.com

Coverity complains about two overruns in process_tx_desc(). The
complaints are false positives, but we might as well eliminate
them. The problem is that hdr is defined as an unsigned int,
but then used to offset an array of size 65536, and another of
size 256 bytes. hdr will actually never be greater than 255
though, as it's assigned only once and to the value of
tp-hdr_len, which is an uint8_t. This patch simply gets rid of
hdr, replacing it with tp-hdr_len, which makes it consistent
with all other tp member use in the function.

v2:
 - also cleanup coding style issues in the touched lines

Signed-off-by: Andrew Jones drjo...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/net/e1000.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index e6f46f0..620f947 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -556,7 +556,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
 uint32_t txd_lower = le32_to_cpu(dp-lower.data);
 uint32_t dtype = txd_lower  (E1000_TXD_CMD_DEXT | E1000_TXD_DTYP_D);
 unsigned int split_size = txd_lower  0x, bytes, sz, op;
-unsigned int msh = 0xf, hdr = 0;
+unsigned int msh = 0xf;
 uint64_t addr;
 struct e1000_context_desc *xp = (struct e1000_context_desc *)dp;
 struct e1000_tx *tp = s-tx;
@@ -603,8 +603,7 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
 
 addr = le64_to_cpu(dp-buffer_addr);
 if (tp-tse  tp-cptse) {
-hdr = tp-hdr_len;
-msh = hdr + tp-mss;
+msh = tp-hdr_len + tp-mss;
 do {
 bytes = split_size;
 if (tp-size + bytes  msh)
@@ -612,14 +611,16 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
 
 bytes = MIN(sizeof(tp-data) - tp-size, bytes);
 pci_dma_read(s-dev, addr, tp-data + tp-size, bytes);
-if ((sz = tp-size + bytes) = hdr  tp-size  hdr)
-memmove(tp-header, tp-data, hdr);
+sz = tp-size + bytes;
+if (sz = tp-hdr_len  tp-size  tp-hdr_len) {
+memmove(tp-header, tp-data, tp-hdr_len);
+}
 tp-size = sz;
 addr += bytes;
 if (sz == msh) {
 xmit_seg(s);
-memmove(tp-data, tp-header, hdr);
-tp-size = hdr;
+memmove(tp-data, tp-header, tp-hdr_len);
+tp-size = tp-hdr_len;
 }
 } while (split_size -= bytes);
 } else if (!tp-tse  tp-cptse) {
@@ -633,8 +634,9 @@ process_tx_desc(E1000State *s, struct e1000_tx_desc *dp)
 
 if (!(txd_lower  E1000_TXD_CMD_EOP))
 return;
-if (!(tp-tse  tp-cptse  tp-size  hdr))
+if (!(tp-tse  tp-cptse  tp-size  tp-hdr_len)) {
 xmit_seg(s);
+}
 tp-tso_frames = 0;
 tp-sum_needed = 0;
 tp-vlan_needed = 0;
-- 
MST




[Qemu-devel] [PULL 06/21] kvm: zero-initialize KVM_SET_GSI_ROUTING input

2013-06-20 Thread Michael S. Tsirkin
kvm_add_routing_entry makes an attempt to
zero-initialize any new routing entry.
However, it fails to initialize padding
within the u field of the structure
kvm_irq_routing_entry.

Other functions like kvm_irqchip_update_msi_route
also fail to initialize the padding field in
kvm_irq_routing_entry.

While mostly harmless, this would prevent us from
reusing these fields for something useful in
the future.

It's better to just make sure all input is initialized.

Once it is, we can also drop complex field by field assignment and just
do the simple *a = *b to update a route entry.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 kvm-all.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 405480e..f119ce1 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1006,11 +1006,8 @@ static void kvm_add_routing_entry(KVMState *s,
 }
 n = s-irq_routes-nr++;
 new = s-irq_routes-entries[n];
-memset(new, 0, sizeof(*new));
-new-gsi = entry-gsi;
-new-type = entry-type;
-new-flags = entry-flags;
-new-u = entry-u;
+
+*new = *entry;
 
 set_gsi(s, entry-gsi);
 
@@ -1029,9 +1026,7 @@ static int kvm_update_routing_entry(KVMState *s,
 continue;
 }
 
-entry-type = new_entry-type;
-entry-flags = new_entry-flags;
-entry-u = new_entry-u;
+*entry = *new_entry;
 
 kvm_irqchip_commit_routes(s);
 
@@ -1043,7 +1038,7 @@ static int kvm_update_routing_entry(KVMState *s,
 
 void kvm_irqchip_add_irq_route(KVMState *s, int irq, int irqchip, int pin)
 {
-struct kvm_irq_routing_entry e;
+struct kvm_irq_routing_entry e = {};
 
 assert(pin  s-gsi_count);
 
@@ -1156,7 +1151,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 return virq;
 }
 
-route = g_malloc(sizeof(KVMMSIRoute));
+route = g_malloc0(sizeof(KVMMSIRoute));
 route-kroute.gsi = virq;
 route-kroute.type = KVM_IRQ_ROUTING_MSI;
 route-kroute.flags = 0;
@@ -1177,7 +1172,7 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg)
 
 int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 {
-struct kvm_irq_routing_entry kroute;
+struct kvm_irq_routing_entry kroute = {};
 int virq;
 
 if (!kvm_gsi_routing_enabled()) {
@@ -1203,7 +1198,7 @@ int kvm_irqchip_add_msi_route(KVMState *s, MSIMessage msg)
 
 int kvm_irqchip_update_msi_route(KVMState *s, int virq, MSIMessage msg)
 {
-struct kvm_irq_routing_entry kroute;
+struct kvm_irq_routing_entry kroute = {};
 
 if (!kvm_irqchip_in_kernel()) {
 return -ENOSYS;
-- 
MST




[Qemu-devel] [PULL 04/21] pc_piix: cleanup init compat handling

2013-06-20 Thread Michael S. Tsirkin
Make sure 1.4 calls 1.5, 1.3 calls 1.4 etc.
This way it's enough to add enough new compat hook
in a single place in piix.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc_piix.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index d87da95..234424a 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -268,38 +268,28 @@ static void pc_init_pci_1_5(QEMUMachineInitArgs *args)
 
 static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
 {
-has_pci_info = false;
 has_pvpanic = false;
 x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
-pc_init_pci(args);
+pc_init_pci_1_5(args);
 }
 
 static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
 {
-has_pci_info = false;
 enable_compat_apic_id_mode();
-has_pvpanic = false;
-pc_init_pci(args);
+pc_init_pci_1_4(args);
 }
 
 /* PC machine init function for pc-1.1 to pc-1.2 */
 static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 {
-has_pci_info = false;
 disable_kvm_pv_eoi();
-enable_compat_apic_id_mode();
-has_pvpanic = false;
-pc_init_pci(args);
+pc_init_pci_1_3(args);
 }
 
 /* PC machine init function for pc-0.14 to pc-1.0 */
 static void pc_init_pci_1_0(QEMUMachineInitArgs *args)
 {
-has_pci_info = false;
-disable_kvm_pv_eoi();
-enable_compat_apic_id_mode();
-has_pvpanic = false;
-pc_init_pci(args);
+pc_init_pci_1_2(args);
 }
 
 /* PC init function for pc-0.10 to pc-0.13, and reused by xenfv */
-- 
MST




  1   2   3   >