Re: [Qemu-devel] [PATCH] RFC: powerpc: add PVR compatibility check

2013-11-04 Thread Alexey Kardashevskiy
On 11/04/2013 06:47 PM, Alexander Graf wrote:
 
 On 04.11.2013, at 04:36, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 If QEMU is started with KVM enabled and -cpu specified, and the CPU is not
 from the family which the host is running on, an error should be displayed
 so this the patch does.

 Cc: Andreas Färber afaer...@suse.de
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---

 Is that correct to assume that the closest abstract class is a CPU family?
 It is most likely true but I want to double check :)
 
 I don't think this is correct. KVM on POWER7 is compatible to run a 750
 vcpu for example.


Are you talking about PR KVM or HV KVM now? How does it work? What are the
PCR's v2.05/v.206 bits in this case? They must be set to something, no?

I understood this as with KVM we have to create CPU of the family which the
host CPU belongs to and if we want to support some lower version, then we
use compatibility mode. Hm.


 Is there any nicer way of doing what the patch does?
 

 The only instance that knows whether it's compatible or not is the kvm
 kernel module. Currently the only way we can check compatibility is
 through the pvr value that user space pushes into the kernel.

HV KVM does not virtualize PVR and the userspace can only try PCR which
defines very few compatibility modes and KVM can fail on setting wrong modes.

 I see two ways we can enhance this. We could add checks to kvm's HV
 mode to make sure the guest vcpu type is compatible. Since the list
 of supported PVRs is quite small this might even be feasible.

Since the list is small and we know all possible combinations - why to
bother about this in the kernel?

 The other thing that would be nice would be to transfer a full blob of
 capabilities into kvm that we can match for, similar to how cpuid on x86
 works. That way we can then match host features with guest features and
 can check compatibility on a much more fine grained level.

We have such a blob, it is called client-architecture-support :) But it
is PAPR, i.e. proprietary :( And again, there is nothing (yet?) which we
cannot process in QEMU and can in KVM.


 The big benefit of the second approach is that when someone crazy enough
 comes in to implement e500mc on book3s kvm for example, he could do that
 simply by setting a few different capability bits. It would also make
 paired single selection more obvious for example. And we could limit
 Altivec access to only CPUs that have it rather than exposing it for all
 as we do today.

I am confused. How do existing guest kernels know if Altivec is supported
or not? I thought this is nailed to PVR and you cannot expose standalone
features.

Adding Paul to cc:.


-- 
Alexey



Re: [Qemu-devel] [PATCH] RFC: powerpc: add PVR compatibility check

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 09:58, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 On 11/04/2013 06:47 PM, Alexander Graf wrote:
 
 On 04.11.2013, at 04:36, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 If QEMU is started with KVM enabled and -cpu specified, and the CPU is not
 from the family which the host is running on, an error should be displayed
 so this the patch does.
 
 Cc: Andreas Färber afaer...@suse.de
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 
 Is that correct to assume that the closest abstract class is a CPU family?
 It is most likely true but I want to double check :)
 
 I don't think this is correct. KVM on POWER7 is compatible to run a 750
 vcpu for example.
 
 
 Are you talking about PR KVM or HV KVM now?

We are talking about QEMU here which means we always have to take the whole 
picture into account. The 750-on-POWER7 case only works with PR KVM.

 How does it work? What are the
 PCR's v2.05/v.206 bits in this case? They must be set to something, no?
 
 I understood this as with KVM we have to create CPU of the family which the
 host CPU belongs to and if we want to support some lower version, then we
 use compatibility mode. Hm.

That's HV KVM specific. There is no reason a user couldn't use QEMU on PR KVM.

 
 
 Is there any nicer way of doing what the patch does?
 
 
 The only instance that knows whether it's compatible or not is the kvm
 kernel module. Currently the only way we can check compatibility is
 through the pvr value that user space pushes into the kernel.
 
 HV KVM does not virtualize PVR and the userspace can only try PCR which
 defines very few compatibility modes and KVM can fail on setting wrong modes.

HV KVM should simply fail when vcpu pvr != host pvr.

 
 I see two ways we can enhance this. We could add checks to kvm's HV
 mode to make sure the guest vcpu type is compatible. Since the list
 of supported PVRs is quite small this might even be feasible.
 
 Since the list is small and we know all possible combinations - why to
 bother about this in the kernel?

Yeah, we really need to check that guest vpcu == host vcpu for HV KVM.

 
 The other thing that would be nice would be to transfer a full blob of
 capabilities into kvm that we can match for, similar to how cpuid on x86
 works. That way we can then match host features with guest features and
 can check compatibility on a much more fine grained level.
 
 We have such a blob, it is called client-architecture-support :) But it
 is PAPR, i.e. proprietary :( And again, there is nothing (yet?) which we
 cannot process in QEMU and can in KVM.

Please start to think outside of the HV KVM box.

 
 
 The big benefit of the second approach is that when someone crazy enough
 comes in to implement e500mc on book3s kvm for example, he could do that
 simply by setting a few different capability bits. It would also make
 paired single selection more obvious for example. And we could limit
 Altivec access to only CPUs that have it rather than exposing it for all
 as we do today.
 
 I am confused. How do existing guest kernels know if Altivec is supported
 or not? I thought this is nailed to PVR and you cannot expose standalone
 features.

Yes, today the only way we tell the kernel whether a guest vcpu supports 
Altivec is through PVR. That was a bad design decision. I think it would make 
more sense to give kvm a full list of features it can then base on rather than 
only the PVR. We could then check those features against host features, in the 
emulator and in external feature (altivec, spe, fpu, etc) enablement.


Alex




Re: [Qemu-devel] [PATCH] RFC: powerpc: add PVR compatibility check

2013-11-04 Thread Alexey Kardashevskiy
On 11/04/2013 08:05 PM, Alexander Graf wrote:
 
 On 04.11.2013, at 09:58, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 11/04/2013 06:47 PM, Alexander Graf wrote:

 On 04.11.2013, at 04:36, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 If QEMU is started with KVM enabled and -cpu specified, and the CPU is not
 from the family which the host is running on, an error should be displayed
 so this the patch does.

 Cc: Andreas Färber afaer...@suse.de
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru

 ---

 Is that correct to assume that the closest abstract class is a CPU family?
 It is most likely true but I want to double check :)

 I don't think this is correct. KVM on POWER7 is compatible to run a 750
 vcpu for example.


 Are you talking about PR KVM or HV KVM now?
 
 We are talking about QEMU here which means we always have to take the
 whole picture into account. The 750-on-POWER7 case only works with PR
 KVM.

 How does it work? What are the
 PCR's v2.05/v.206 bits in this case? They must be set to something, no?

 I understood this as with KVM we have to create CPU of the family which the
 host CPU belongs to and if we want to support some lower version, then we
 use compatibility mode. Hm.
 
 That's HV KVM specific. There is no reason a user couldn't use QEMU on PR KVM.
 


 Is there any nicer way of doing what the patch does?


 The only instance that knows whether it's compatible or not is the kvm
 kernel module. Currently the only way we can check compatibility is
 through the pvr value that user space pushes into the kernel.

 HV KVM does not virtualize PVR and the userspace can only try PCR which
 defines very few compatibility modes and KVM can fail on setting wrong modes.
 
 HV KVM should simply fail when vcpu pvr != host pvr.


More precisely, pvrmask != host_pvrmask. That is what I really wanted
here but I do not know how to distinguish PR and HV KVM in
target-ppc/translate_init.c.


 I see two ways we can enhance this. We could add checks to kvm's HV
 mode to make sure the guest vcpu type is compatible. Since the list
 of supported PVRs is quite small this might even be feasible.

 Since the list is small and we know all possible combinations - why to
 bother about this in the kernel?
 
 Yeah, we really need to check that guest vpcu == host vcpu for HV KVM.
 

 The other thing that would be nice would be to transfer a full blob of
 capabilities into kvm that we can match for, similar to how cpuid on x86
 works. That way we can then match host features with guest features and
 can check compatibility on a much more fine grained level.

 We have such a blob, it is called client-architecture-support :) But it
 is PAPR, i.e. proprietary :( And again, there is nothing (yet?) which we
 cannot process in QEMU and can in KVM.
 
 Please start to think outside of the HV KVM box.


I am trying. I looked through PowerISA and did not find any mechanism to
tell the guest whether the host supports Altivec or not. So I assumed that
PR KVM can only do it by setting a PVR of a CPU which does or does not
support Altivec. Yes, my patch does not take PR KVM into account, this is
why it is an RFC patch and I am asking all these stupid questions as I do
not really understand where to insert such a check in QEMU.

After all, now it seems right to do this check in KVM to avoid having PR
vs. HV cases in QEMU.



 The big benefit of the second approach is that when someone crazy enough
 comes in to implement e500mc on book3s kvm for example, he could do that
 simply by setting a few different capability bits. It would also make
 paired single selection more obvious for example. And we could limit
 Altivec access to only CPUs that have it rather than exposing it for all
 as we do today.

 I am confused. How do existing guest kernels know if Altivec is supported
 or not? I thought this is nailed to PVR and you cannot expose standalone
 features.
 

 Yes, today the only way we tell the kernel whether a guest vcpu supports
 Altivec is through PVR. That was a bad design decision. I think it would
 make more sense to give kvm a full list of features it can then base on
 rather than only the PVR. We could then check those features against
 host features, in the emulator and in external feature (altivec, spe,
 fpu, etc) enablement.

What will KVM do with those bits? How exactly will it tell the _guest_ that
it does or does not support Altivec? I mean, in addition to setting a
correct PVR? May be there is some good specification (besides PowerISA)
which I am missing?


-- 
Alexey



Re: [Qemu-devel] pvpanic plans?

2013-11-04 Thread Christian Borntraeger
On 31/10/13 15:30, Michael S. Tsirkin wrote:
 On Thu, Oct 24, 2013 at 10:39:03AM +0800, Hu Tao wrote:
 Hi All,

 I know it's been a long time since this thread. But qemu 1.7 is
 releasing, do you have any consensus on this?

 Thanks.
 
 
 I think the biggest issue is the new PANICKED state.

I thought the problem was that the new device broke windows and all
the following hazzle.

 Guests already have simple ways to halt the CPU,
 and actually do.  I think a new state was a mistake.
 So how about the following? Does it break anything?
 (Untested).

Please note that on s390 we also do the panic state (on a disabled wait)


target-s390x/kvm.c
...

monitor_protocol_event(QEVENT_GUEST_PANICKED, data);
qobject_decref(data);
vm_stop(RUN_STATE_GUEST_PANICKED);
...

Currently it is possible to restart libvirt, e.g. after an update and then it 
will
be able to fetch the full state of a guest via QMP. It will then also be able to
detect that this guest panicked some time ago.
I think one issue when removing the PANICKED state is that libvirt can then no 
longer detect that state, correct?

Christian


 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
 index 226e298..2055afc 100644
 --- a/hw/misc/pvpanic.c
 +++ b/hw/misc/pvpanic.c
 @@ -51,7 +51,6 @@ static void handle_event(int event)
 
  if (event  PVPANIC_PANICKED) {
  panicked_mon_event(pause);
 -vm_stop(RUN_STATE_GUEST_PANICKED);
  return;
  }
  }
 
 




Re: [Qemu-devel] [PATCH] RFC: powerpc: add PVR compatibility check

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 10:24, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 On 11/04/2013 08:05 PM, Alexander Graf wrote:
 
 On 04.11.2013, at 09:58, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 On 11/04/2013 06:47 PM, Alexander Graf wrote:
 
 On 04.11.2013, at 04:36, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 If QEMU is started with KVM enabled and -cpu specified, and the CPU is not
 from the family which the host is running on, an error should be displayed
 so this the patch does.
 
 Cc: Andreas Färber afaer...@suse.de
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 
 ---
 
 Is that correct to assume that the closest abstract class is a CPU family?
 It is most likely true but I want to double check :)
 
 I don't think this is correct. KVM on POWER7 is compatible to run a 750
 vcpu for example.
 
 
 Are you talking about PR KVM or HV KVM now?
 
 We are talking about QEMU here which means we always have to take the
 whole picture into account. The 750-on-POWER7 case only works with PR
 KVM.
 
 How does it work? What are the
 PCR's v2.05/v.206 bits in this case? They must be set to something, no?
 
 I understood this as with KVM we have to create CPU of the family which the
 host CPU belongs to and if we want to support some lower version, then we
 use compatibility mode. Hm.
 
 That's HV KVM specific. There is no reason a user couldn't use QEMU on PR 
 KVM.
 
 
 
 Is there any nicer way of doing what the patch does?
 
 
 The only instance that knows whether it's compatible or not is the kvm
 kernel module. Currently the only way we can check compatibility is
 through the pvr value that user space pushes into the kernel.
 
 HV KVM does not virtualize PVR and the userspace can only try PCR which
 defines very few compatibility modes and KVM can fail on setting wrong 
 modes.
 
 HV KVM should simply fail when vcpu pvr != host pvr.
 
 
 More precisely, pvrmask != host_pvrmask.

Are you sure? After all, we can not fake a different PVR to the guest in the 
first place, no? So if we declare -cpu POWER7_v10 on a POWER7_v20 system as 
supported the result the guest sees would be wrong, as it would see a v20 CPU, 
not a v10 CPU.

 That is what I really wanted
 here but I do not know how to distinguish PR and HV KVM in
 target-ppc/translate_init.c.

You shouldn't. Only the kernel should really care about the difference.

 
 
 I see two ways we can enhance this. We could add checks to kvm's HV
 mode to make sure the guest vcpu type is compatible. Since the list
 of supported PVRs is quite small this might even be feasible.
 
 Since the list is small and we know all possible combinations - why to
 bother about this in the kernel?
 
 Yeah, we really need to check that guest vpcu == host vcpu for HV KVM.
 
 
 The other thing that would be nice would be to transfer a full blob of
 capabilities into kvm that we can match for, similar to how cpuid on x86
 works. That way we can then match host features with guest features and
 can check compatibility on a much more fine grained level.
 
 We have such a blob, it is called client-architecture-support :) But it
 is PAPR, i.e. proprietary :( And again, there is nothing (yet?) which we
 cannot process in QEMU and can in KVM.
 
 Please start to think outside of the HV KVM box.
 
 
 I am trying. I looked through PowerISA and did not find any mechanism to
 tell the guest whether the host supports Altivec or not. So I assumed that
 PR KVM can only do it by setting a PVR of a CPU which does or does not
 support Altivec. Yes, my patch does not take PR KVM into account, this is
 why it is an RFC patch and I am asking all these stupid questions as I do
 not really understand where to insert such a check in QEMU.
 
 After all, now it seems right to do this check in KVM to avoid having PR
 vs. HV cases in QEMU.

Yup :).

 
 
 
 The big benefit of the second approach is that when someone crazy enough
 comes in to implement e500mc on book3s kvm for example, he could do that
 simply by setting a few different capability bits. It would also make
 paired single selection more obvious for example. And we could limit
 Altivec access to only CPUs that have it rather than exposing it for all
 as we do today.
 
 I am confused. How do existing guest kernels know if Altivec is supported
 or not? I thought this is nailed to PVR and you cannot expose standalone
 features.
 
 
 Yes, today the only way we tell the kernel whether a guest vcpu supports
 Altivec is through PVR. That was a bad design decision. I think it would
 make more sense to give kvm a full list of features it can then base on
 rather than only the PVR. We could then check those features against
 host features, in the emulator and in external feature (altivec, spe,
 fpu, etc) enablement.
 
 What will KVM do with those bits? How exactly will it tell the _guest_ that
 it does or does not support Altivec? I mean, in addition to setting a
 correct PVR? May be there is some good specification (besides PowerISA)
 which I am missing?

Ok, 

Re: [Qemu-devel] [RESEND][PATCH] migration: drop MADVISE_DONT_NEED for incoming zero pages

2013-11-04 Thread Peter Lieven

On 24.10.2013 12:07, Juan Quintela wrote:

Peter Lieven p...@kamp.de wrote:

The madvise for zeroed out pages was introduced when every transferred
zero page was memset to zero and thus allocated. Since commit
211ea740 we check for zeroness of a target page before we memset
it to zero. Additionally we memmap target memory so it is essentially
zero initialized (except for e.g. option roms and bios which are loaded
into target memory although they shouldn't).

It was reported recently that this madvise causes a performance degradation
in some situations. As the madvise should only be called rarely and if it's 
called
it is likely on a busy page (it was non-zero and changed to zero during 
migration)
drop it completely.

Reviewed-by: Juan Quintela quint...@redhat.com

I take it.  I am on KVM Forum/LinuxCon this week.  Will send when back
at home.

Ping

Peter



[Qemu-devel] [PATCH v2] block: per caller dirty bitmap

2013-11-04 Thread Fam Zheng
Previously a BlockDriverState has only one dirty bitmap, so only one
caller (e.g. a block job) can keep track of writing. This changes the
dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the
lifecycle is managed with these new functions:

bdrv_create_dirty_bitmap
bdrv_release_dirty_bitmap

Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap.

In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument
is added to these functions, since each caller has its own dirty bitmap:

bdrv_get_dirty
bdrv_dirty_iter_init
bdrv_get_dirty_count

bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will
internally walk the list of all dirty bitmaps and set them one by one.

Signed-off-by: Fam Zheng f...@redhat.com

---
v2: Added BdrvDirtyBitmap [Paolo]

Signed-off-by: Fam Zheng f...@redhat.com
---
 block-migration.c | 22 +
 block.c   | 81 ---
 block/mirror.c| 23 --
 block/qapi.c  |  8 -
 include/block/block.h | 11 ---
 include/block/block_int.h |  2 +-
 6 files changed, 85 insertions(+), 62 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index daf9ec1..8f4e826 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -58,6 +58,7 @@ typedef struct BlkMigDevState {
 /* Protected by block migration lock.  */
 unsigned long *aio_bitmap;
 int64_t completed_sectors;
+BdrvDirtyBitmap *dirty_bitmap;
 } BlkMigDevState;
 
 typedef struct BlkMigBlock {
@@ -309,12 +310,21 @@ static int mig_save_device_bulk(QEMUFile *f, 
BlkMigDevState *bmds)
 
 /* Called with iothread lock taken.  */
 
-static void set_dirty_tracking(int enable)
+static void set_dirty_tracking(void)
 {
 BlkMigDevState *bmds;
 
 QSIMPLEQ_FOREACH(bmds, block_mig_state.bmds_list, entry) {
-bdrv_set_dirty_tracking(bmds-bs, enable ? BLOCK_SIZE : 0);
+bmds-dirty_bitmap = bdrv_create_dirty_bitmap(bmds-bs, BLOCK_SIZE);
+}
+}
+
+static void unset_dirty_tracking(void)
+{
+BlkMigDevState *bmds;
+
+QSIMPLEQ_FOREACH(bmds, block_mig_state.bmds_list, entry) {
+bdrv_release_dirty_bitmap(bmds-bs, bmds-dirty_bitmap);
 }
 }
 
@@ -432,7 +442,7 @@ static int mig_save_device_dirty(QEMUFile *f, 
BlkMigDevState *bmds,
 } else {
 blk_mig_unlock();
 }
-if (bdrv_get_dirty(bmds-bs, sector)) {
+if (bdrv_get_dirty(bmds-bs, bmds-dirty_bitmap, sector)) {
 
 if (total_sectors - sector  BDRV_SECTORS_PER_DIRTY_CHUNK) {
 nr_sectors = total_sectors - sector;
@@ -554,7 +564,7 @@ static int64_t get_remaining_dirty(void)
 int64_t dirty = 0;
 
 QSIMPLEQ_FOREACH(bmds, block_mig_state.bmds_list, entry) {
-dirty += bdrv_get_dirty_count(bmds-bs);
+dirty += bdrv_get_dirty_count(bmds-bs, bmds-dirty_bitmap);
 }
 
 return dirty  BDRV_SECTOR_BITS;
@@ -569,7 +579,7 @@ static void blk_mig_cleanup(void)
 
 bdrv_drain_all();
 
-set_dirty_tracking(0);
+unset_dirty_tracking();
 
 blk_mig_lock();
 while ((bmds = QSIMPLEQ_FIRST(block_mig_state.bmds_list)) != NULL) {
@@ -604,7 +614,7 @@ static int block_save_setup(QEMUFile *f, void *opaque)
 init_blk_migration(f);
 
 /* start track dirty blocks */
-set_dirty_tracking(1);
+set_dirty_tracking();
 qemu_mutex_unlock_iothread();
 
 ret = flush_blks(f);
diff --git a/block.c b/block.c
index 58efb5b..ef7a55f 100644
--- a/block.c
+++ b/block.c
@@ -49,6 +49,11 @@
 #include windows.h
 #endif
 
+typedef struct BdrvDirtyBitmap {
+HBitmap *bitmap;
+QLIST_ENTRY(BdrvDirtyBitmap) list;
+} BdrvDirtyBitmap;
+
 #define NOT_DONE 0x7fff /* used while emulated sync operation in progress 
*/
 
 typedef enum {
@@ -323,6 +328,7 @@ BlockDriverState *bdrv_new(const char *device_name)
 BlockDriverState *bs;
 
 bs = g_malloc0(sizeof(BlockDriverState));
+QLIST_INIT(bs-dirty_bitmaps);
 pstrcpy(bs-device_name, sizeof(bs-device_name), device_name);
 if (device_name[0] != '\0') {
 QTAILQ_INSERT_TAIL(bdrv_states, bs, list);
@@ -1615,7 +1621,7 @@ static void bdrv_move_feature_fields(BlockDriverState 
*bs_dest,
 bs_dest-iostatus   = bs_src-iostatus;
 
 /* dirty bitmap */
-bs_dest-dirty_bitmap   = bs_src-dirty_bitmap;
+bs_dest-dirty_bitmaps  = bs_src-dirty_bitmaps;
 
 /* reference count */
 bs_dest-refcnt = bs_src-refcnt;
@@ -1648,7 +1654,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState 
*bs_old)
 
 /* bs_new must be anonymous and shouldn't have anything fancy enabled */
 assert(bs_new-device_name[0] == '\0');
-assert(bs_new-dirty_bitmap == NULL);
+assert(QLIST_EMPTY(bs_new-dirty_bitmaps));
 assert(bs_new-job == NULL);
 assert(bs_new-dev == NULL);
 assert(bs_new-in_use == 0);
@@ -1709,6 +1715,7 @@ static void bdrv_delete(BlockDriverState *bs)
 

Re: [Qemu-devel] How to introduce bs-node_name ?

2013-11-04 Thread Stefan Hajnoczi
On Wed, Oct 30, 2013 at 02:49:32PM +0100, Markus Armbruster wrote:
 The first proposal is to add another parameter, say id.  Users can
 then refer either to an arbitrary BDS by id, or (for backward
 compatibility) to the root BDS by device.  When the code sees
 device, it'll look up the BB, then fetch its root BDS.
 
 CON: Existing parameter device becomes compatibility cruft.
 
 PRO: Clean and obvious semantics (in my opinion).

This proposal gets my vote.

 The second proposal is to press the existing parameter device into
 service for referring to BDS node_name.
 
 To keep backward compatibility, we obviously need to ensure that
 whenever the old code accepts a value of device, the new code accepts
 it as well, and both resolve it to the same BDS.

Different legacy commands given the same device name might need to
operate on different nodes.  Dynamic renaming does not solve this
problem, so I'm not convinced we can always choose a device name
matching a node name.

Device name commands are higher-level than graph node commands.  For
example, block_set_io_throttle makes sense on a device but less sense on
a graph node, unless we add the implicit assumption that the new
throttling node is created on top of the given node or updated in place
if the throttling node already exists (!!).

Stefan



Re: [Qemu-devel] [PATCH] exec: fix regression by making system-memory region UINT64_MAX size

2013-11-04 Thread Marcel Apfelbaum
On Mon, 2013-11-04 at 08:18 +0200, Michael S. Tsirkin wrote:
 On Sun, Nov 03, 2013 at 09:26:06PM +, Peter Maydell wrote:
  On 3 November 2013 20:48, Marcel Apfelbaum marce...@redhat.com wrote:
   The problem appears when a root memory region within an
   address space with size  UINT64_MAX has overlapping children
   with the same size. If the size of the root memory region is UINT64_MAX
   everyting is ok.
  
   Solved the regression by making the system-memory region
   of size UINT64_MAX instead of INT64_MAX.
  
   Signed-off-by: Marcel Apfelbaum marce...@redhat.com
   ---
   In the mean time I am investigating why the
   root memory region has to be UINT64_MAX size in order
   to have overlapping children
  
system_memory = g_malloc(sizeof(*system_memory));
   -memory_region_init(system_memory, NULL, system, INT64_MAX);
   +memory_region_init(system_memory, NULL, system, UINT64_MAX);
address_space_init(address_space_memory, system_memory, memory);
  
  As you say above we should investigate why this caused a
  problem, but I was surprised the system memory space isn't
  already maximum size. It turns out that that change was
  introduced in commit 8417cebf in an attempt to avoid overflow
  issues by sticking to signed 64 bit arithmetic. This approach was
  subsequently ditched in favour of using proper 128 bit arithmetic
  in commit 08dafab4, but we never changed the init call for
  the system memory back to UINT64_MAX. So I think this is
  a good change in itself.
  
  -- PMM
 
 I think I debugged it.
 
 So this patch seems to help simply because we only have
 sanity checking asserts in the subpage path. UINT64_MAX will make
 the region a number of full pages and avoid
 hitting the checks.
 
 
 I think I see what the issue is: exec.c
 assumes that TARGET_PHYS_ADDR_SPACE_BITS is enough
 to render any section in system memory:
 number of page table levels is calculated from that:
 
 #define P_L2_LEVELS \
   (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
 
 any other bits are simply ignored:
 
 for (i = P_L2_LEVELS - 1; i = 0  !lp.is_leaf; i--) {
 if (lp.ptr == PHYS_MAP_NODE_NIL) {
 return sections[PHYS_SECTION_UNASSIGNED];
 }
 p = nodes[lp.ptr];
 lp = p[(index  (i * L2_BITS))  (L2_SIZE - 1)];
 }
 
 so mask by L2_SIZE - 1 means that each round looks at L2_BITS bits,
 and there are at most P_L2_LEVELS.
 
 Any other bits are simply ignored.

Michael, thanks for helping to debug this issue.
Let me see if I got it right:
If the system memory size is INT64_MAX (0x7fff), the address of the
last page (0x7) has more bits (55) that TARGET_PHYS_ADDR_SPACE_BITS 
(52)
and cannot be correctly mapped into page levels?

Thanks,
Marcel

 This is very wrong and can break in a number of other ways,
 for example I think we will also hit this assert
 if we have a non aligned 64 bit BAR of a PCI device.
 
 I think the fastest solution is to just limit
 system memory size of TARGET_PAGE_BITS.
 I sent a patch like this.
 
 
 






Re: [Qemu-devel] [PATCH v3] ppc: introduce CPUPPCState::cpu_dt_id

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 02:10, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 Normally CPUState::cpu_index is used to pick the right CPU for various
 operations. However default consecutive numbering does not always work
 for POWERPC.
 
 For example, on POWER7 (which supports 4 threads per core),
 -smp 8,threads=4 should create CPUs with indexes 0,1,2,3,4,5,6,7 and
 -smp 8,threads=1 should create CPUs with indexes 0,4,8,12,16,20,24,28.
 
 These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
 and used to call KVM VCPU's ioctls. In order to achieve this,
 kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
 cpu_index by the number of threads per core.
 
 This approach has disadvantages such as:
 1. NUMA configuration stays broken after the fixup;
 2. CPU-related commands from QEMU Monitor do not work properly as
 the accept fixed CPU indexes and the user does not really know
 what they are after fixup as the number of threads per core changes
 between CPU versions and via QEMU command line.
 
 This introduces a new @cpu_dt_id field in the CPUPPCState struct which
 is set from @cpu_index by default but can be fixed later to the value
 which a hypervisor can accept. This also introduces two POWERPC-arch
 specific functions:
 1. int ppc_get_vcpu_dt_id(CPUState *cs) - returns a device-tree ID
 for a CPU;
 2. CPUState *ppc_get_vcpu_by_dt_id(int cpu_dt_id) - finds CPUState by
 a device-tree CPU ID.
 
 This uses the new functions to:
 1. fix emulated XICS hypercall handlers as they receive fixed CPU indexes;
 2. fix XICS-KVM to enable in-kernel XICS on right CPU;
 3. compose correct device-tree.
 
 This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
 can accept command-line CPU indexes again.

So this patch feels awkward. We use the dt fixup at random places in completely 
dt unrelated code paths.

What we really have are 3 semantically separate entities:

  * QEMU internal cpu id
  * KVM internal cpu id
  * DT exposed cpu id

As you have noted, it's a good idea to keep the QEMU internal cpu id linear, 
thus completely separate from the others. The DT exposed cpu id should be 100% 
local to hw/ppc/spapr*.c. I don't think any code outside of the DT generation 
and anything that accesses the Virtual Processor Number in sPAPR needs to 
care about the DT cpu id. All that code is 100% KVM agnostic.

The KVM internal cpu id should probably be a new field in the generic CPUState 
that gets used by kvm specific code that needs to know the KVM internal cpu id 
a vcpu was created with. The flow should be that kvm_arch_vcpu_id() tells 
kvm_init_vcpu() the kvm internal id to use which then stores it in CPUState. 
That way you can always get the KVM intenal cpu id from a CPU struct. All the 
references to this ID should _only_ happen from KVM only code.


Alex




Re: [Qemu-devel] How to introduce bs-node_name ?

2013-11-04 Thread Fam Zheng


On 11/04/2013 05:31 PM, Stefan Hajnoczi wrote:

On Wed, Oct 30, 2013 at 02:49:32PM +0100, Markus Armbruster wrote:

The first proposal is to add another parameter, say id.  Users can
then refer either to an arbitrary BDS by id, or (for backward
compatibility) to the root BDS by device.  When the code sees
device, it'll look up the BB, then fetch its root BDS.

CON: Existing parameter device becomes compatibility cruft.

PRO: Clean and obvious semantics (in my opinion).

This proposal gets my vote.


The second proposal is to press the existing parameter device into
service for referring to BDS node_name.

To keep backward compatibility, we obviously need to ensure that
whenever the old code accepts a value of device, the new code accepts
it as well, and both resolve it to the same BDS.

Different legacy commands given the same device name might need to
operate on different nodes.

Could you give an example for this?



Dynamic renaming does not solve this
problem, so I'm not convinced we can always choose a device name
matching a node name.

Device name commands are higher-level than graph node commands.  For
example, block_set_io_throttle makes sense on a device but less sense on
a graph node, unless we add the implicit assumption that the new
throttling node is created on top of the given node or updated in place
if the throttling node already exists (!!).
Throttling a node could be useful too, for example if we want to 
throttle backing_hd which is on shared storage, but not to throttle on 
the local image.


My ignorant question is: Why can't we just use one namespace, make sure 
no name collision between node_name and device_name, or even just drop 
device_name, so we treat the root node's node_name as device_name? For 
commands that only accept a device, this can be enforced in its 
implementation by checking against the whole graph to verify this.


Fam



Re: [Qemu-devel] [PATCH] ossaudio: do not enable by default

2013-11-04 Thread Gerd Hoffmann
On So, 2013-11-03 at 08:45 -0800, Anthony Liguori wrote:
 Modern Linux's no longer support /dev/dsp so enabling it by
 default causes audio failures on newer Linux distros.

That will break sound on BSD.

I think we should do something like this instead:

--- a/configure
+++ b/configure
@@ -554,7 +554,7 @@ Haiku)
   LIBS=-lposix_error_mapper -lnetwork $LIBS
 ;;
 *)
-  audio_drv_list=oss
+  audio_drv_list=pa alsa oss
   audio_possible_drivers=oss alsa sdl esd pa
   linux=yes
   linux_user=yes

i.e. build pulseaudio and alsa by default on linux and prioritize them
over oss.

cheers,
  Gerd






Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Marcel Apfelbaum
On Mon, 2013-11-04 at 08:06 +0200, Michael S. Tsirkin wrote:
 The page table logic in exec.c assumes
 that memory addresses are at most TARGET_PHYS_ADDR_SPACE_BITS.
 
 But pci addresses are full 64 bit so if we try to render them ignoring
 the extra bits, we get strange effects with sections overlapping each
 other.
 
 To fix, simply limit the system memory size to
  1  TARGET_PHYS_ADDR_SPACE_BITS,
 pci addresses will be rendered within that.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  exec.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/exec.c b/exec.c
 index 030118e..c7a8df5 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1801,7 +1801,12 @@ void address_space_destroy_dispatch(AddressSpace *as)
  static void memory_map_init(void)
  {
  system_memory = g_malloc(sizeof(*system_memory));
 -memory_region_init(system_memory, NULL, system, INT64_MAX);
 +
 +assert(TARGET_PHYS_ADDR_SPACE_BITS = 64);
 +
 +memory_region_init(system_memory, NULL, system,
 +   TARGET_PHYS_ADDR_SPACE_BITS == 64 ?
 +   UINT64_MAX : (0x1ULL  TARGET_PHYS_ADDR_SPACE_BITS));

Michael, thanks again for the help.

I am concerned that we cannot use all the UINT64_MAX
address space.

By the way, this patch makes the memory size aligned to page size,
so the call to register_subpage (the asserted code) is diverted
to register_multipage that does not have an assert.
That leads me to another question?
Maybe the fact that INT64_MAX is not aligned to page size makes
all the trouble? 

What do you think?

Regarding this patch:
Maybe we should to add an assert inside memory_region_init 
in order to protect all the code that creates memory regions?

And also maybe we should add a define MAX_MEMORY_REGION_SIZE
to be used in all places we want a maximum size memory region?

Thanks,
Marcel

  address_space_init(address_space_memory, system_memory, memory);
  
  system_io = g_malloc(sizeof(*system_io));






Re: [Qemu-devel] [PATCH] exec: fix regression by making system-memory region UINT64_MAX size

2013-11-04 Thread Michael S. Tsirkin
On Mon, Nov 04, 2013 at 11:33:56AM +0200, Marcel Apfelbaum wrote:
 On Mon, 2013-11-04 at 08:18 +0200, Michael S. Tsirkin wrote:
  On Sun, Nov 03, 2013 at 09:26:06PM +, Peter Maydell wrote:
   On 3 November 2013 20:48, Marcel Apfelbaum marce...@redhat.com wrote:
The problem appears when a root memory region within an
address space with size  UINT64_MAX has overlapping children
with the same size. If the size of the root memory region is UINT64_MAX
everyting is ok.
   
Solved the regression by making the system-memory region
of size UINT64_MAX instead of INT64_MAX.
   
Signed-off-by: Marcel Apfelbaum marce...@redhat.com
---
In the mean time I am investigating why the
root memory region has to be UINT64_MAX size in order
to have overlapping children
   
 system_memory = g_malloc(sizeof(*system_memory));
-memory_region_init(system_memory, NULL, system, INT64_MAX);
+memory_region_init(system_memory, NULL, system, UINT64_MAX);
 address_space_init(address_space_memory, system_memory, memory);
   
   As you say above we should investigate why this caused a
   problem, but I was surprised the system memory space isn't
   already maximum size. It turns out that that change was
   introduced in commit 8417cebf in an attempt to avoid overflow
   issues by sticking to signed 64 bit arithmetic. This approach was
   subsequently ditched in favour of using proper 128 bit arithmetic
   in commit 08dafab4, but we never changed the init call for
   the system memory back to UINT64_MAX. So I think this is
   a good change in itself.
   
   -- PMM
  
  I think I debugged it.
  
  So this patch seems to help simply because we only have
  sanity checking asserts in the subpage path. UINT64_MAX will make
  the region a number of full pages and avoid
  hitting the checks.
  
  
  I think I see what the issue is: exec.c
  assumes that TARGET_PHYS_ADDR_SPACE_BITS is enough
  to render any section in system memory:
  number of page table levels is calculated from that:
  
  #define P_L2_LEVELS \
  (((TARGET_PHYS_ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / L2_BITS) + 1)
  
  any other bits are simply ignored:
  
  for (i = P_L2_LEVELS - 1; i = 0  !lp.is_leaf; i--) {
  if (lp.ptr == PHYS_MAP_NODE_NIL) {
  return sections[PHYS_SECTION_UNASSIGNED];
  }
  p = nodes[lp.ptr];
  lp = p[(index  (i * L2_BITS))  (L2_SIZE - 1)];
  }
  
  so mask by L2_SIZE - 1 means that each round looks at L2_BITS bits,
  and there are at most P_L2_LEVELS.
  
  Any other bits are simply ignored.
 
 Michael, thanks for helping to debug this issue.
 Let me see if I got it right:
 If the system memory size is INT64_MAX (0x7fff), the address of 
 the
 last page (0x7) has more bits (55) that 
 TARGET_PHYS_ADDR_SPACE_BITS (52)
 and cannot be correctly mapped into page levels?
 
 Thanks,
 Marcel

Yes, I think that's it.

  This is very wrong and can break in a number of other ways,
  for example I think we will also hit this assert
  if we have a non aligned 64 bit BAR of a PCI device.
  
  I think the fastest solution is to just limit
  system memory size of TARGET_PAGE_BITS.
  I sent a patch like this.
  
  
  
 
 



Re: [Qemu-devel] [PATCH v3] ppc: introduce CPUPPCState::cpu_dt_id

2013-11-04 Thread Alexey Kardashevskiy
On 11/04/2013 08:41 PM, Alexander Graf wrote:
 
 On 04.11.2013, at 02:10, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 Normally CPUState::cpu_index is used to pick the right CPU for various
 operations. However default consecutive numbering does not always work
 for POWERPC.

 For example, on POWER7 (which supports 4 threads per core),
 -smp 8,threads=4 should create CPUs with indexes 0,1,2,3,4,5,6,7 and
 -smp 8,threads=1 should create CPUs with indexes 0,4,8,12,16,20,24,28.

 These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
 and used to call KVM VCPU's ioctls. In order to achieve this,
 kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
 cpu_index by the number of threads per core.

 This approach has disadvantages such as:
 1. NUMA configuration stays broken after the fixup;
 2. CPU-related commands from QEMU Monitor do not work properly as
 the accept fixed CPU indexes and the user does not really know
 what they are after fixup as the number of threads per core changes
 between CPU versions and via QEMU command line.

 This introduces a new @cpu_dt_id field in the CPUPPCState struct which
 is set from @cpu_index by default but can be fixed later to the value
 which a hypervisor can accept. This also introduces two POWERPC-arch
 specific functions:
 1. int ppc_get_vcpu_dt_id(CPUState *cs) - returns a device-tree ID
 for a CPU;
 2. CPUState *ppc_get_vcpu_by_dt_id(int cpu_dt_id) - finds CPUState by
 a device-tree CPU ID.

 This uses the new functions to:
 1. fix emulated XICS hypercall handlers as they receive fixed CPU indexes;
 2. fix XICS-KVM to enable in-kernel XICS on right CPU;
 3. compose correct device-tree.

 This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
 can accept command-line CPU indexes again.
 
 So this patch feels awkward. We use the dt fixup at random places in 
 completely dt unrelated code paths.

Yep. I called it smp_cpu_index in the first place :)

 What we really have are 3 semantically separate entities:
 
   * QEMU internal cpu id
   * KVM internal cpu id
   * DT exposed cpu id
 

 As you have noted, it's a good idea to keep the QEMU internal cpu id
 linear, thus completely separate from the others. The DT exposed cpu id
 should be 100% local to hw/ppc/spapr*.c. I don't think any code outside
 of the DT generation and anything that accesses the Virtual Processor
 Number in sPAPR needs to care about the DT cpu id. All that code is
 100% KVM agnostic.
 
 The KVM internal cpu id should probably be a new field in the generic
 CPUState that gets used by kvm specific code that needs to know the KVM
 internal cpu id a vcpu was created with. The flow should be that
 kvm_arch_vcpu_id() tells kvm_init_vcpu() the kvm internal id to use
 which then stores it in CPUState. That way you can always get the KVM
 intenal cpu id from a CPU struct. All the references to this ID should
 _only_ happen from KVM only code.


If DT id is local to spapr*, then how do I implement kvm_arch_vcpu_id()?
Where will it get the fixed value? Do the same calculation in two different
places (for device tree and for kvm)?


-- 
Alexey



Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Michael S. Tsirkin
On Mon, Nov 04, 2013 at 11:50:05AM +0200, Marcel Apfelbaum wrote:
 On Mon, 2013-11-04 at 08:06 +0200, Michael S. Tsirkin wrote:
  The page table logic in exec.c assumes
  that memory addresses are at most TARGET_PHYS_ADDR_SPACE_BITS.
  
  But pci addresses are full 64 bit so if we try to render them ignoring
  the extra bits, we get strange effects with sections overlapping each
  other.
  
  To fix, simply limit the system memory size to
   1  TARGET_PHYS_ADDR_SPACE_BITS,
  pci addresses will be rendered within that.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   exec.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
  
  diff --git a/exec.c b/exec.c
  index 030118e..c7a8df5 100644
  --- a/exec.c
  +++ b/exec.c
  @@ -1801,7 +1801,12 @@ void address_space_destroy_dispatch(AddressSpace *as)
   static void memory_map_init(void)
   {
   system_memory = g_malloc(sizeof(*system_memory));
  -memory_region_init(system_memory, NULL, system, INT64_MAX);
  +
  +assert(TARGET_PHYS_ADDR_SPACE_BITS = 64);
  +
  +memory_region_init(system_memory, NULL, system,
  +   TARGET_PHYS_ADDR_SPACE_BITS == 64 ?
  +   UINT64_MAX : (0x1ULL  
  TARGET_PHYS_ADDR_SPACE_BITS));
 
 Michael, thanks again for the help.
 
 I am concerned that we cannot use all the UINT64_MAX
 address space.

Well, exec isn't ready for this, it expects at most
TARGET_PHYS_ADDR_SPACE_BITS.
Fortunately there's no way for CPU to initiate io outside
this area.

So this is another place where device to device IO
would be broken.

 By the way, this patch makes the memory size aligned to page size,
 so the call to register_subpage (the asserted code) is diverted
 to register_multipage that does not have an assert.

I tested with (0x1ULL  TARGET_PHYS_ADDR_SPACE_BITS) - 1
as well, works fine.

 That leads me to another question?
 Maybe the fact that INT64_MAX is not aligned to page size makes
 all the trouble? 

It's not what's causing the trouble, it's merely making
the bug visible since subpage is the only path that actually checks that
sections do not overlap.

 What do you think?
 
 Regarding this patch:
 Maybe we should to add an assert inside memory_region_init 
 in order to protect all the code that creates memory regions?

To make sure sections don't overlap? Sure. Go for it.

 And also maybe we should add a define MAX_MEMORY_REGION_SIZE
 to be used in all places we want a maximum size memory region?
 
 Thanks,
 Marcel

I think we can't define this in a target independent way really.

   address_space_init(address_space_memory, system_memory, memory);
   
   system_io = g_malloc(sizeof(*system_io));
 
 



Re: [Qemu-devel] [PATCH v3] ppc: introduce CPUPPCState::cpu_dt_id

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 10:58, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 On 11/04/2013 08:41 PM, Alexander Graf wrote:
 
 On 04.11.2013, at 02:10, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 Normally CPUState::cpu_index is used to pick the right CPU for various
 operations. However default consecutive numbering does not always work
 for POWERPC.
 
 For example, on POWER7 (which supports 4 threads per core),
 -smp 8,threads=4 should create CPUs with indexes 0,1,2,3,4,5,6,7 and
 -smp 8,threads=1 should create CPUs with indexes 0,4,8,12,16,20,24,28.
 
 These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
 and used to call KVM VCPU's ioctls. In order to achieve this,
 kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
 cpu_index by the number of threads per core.
 
 This approach has disadvantages such as:
 1. NUMA configuration stays broken after the fixup;
 2. CPU-related commands from QEMU Monitor do not work properly as
 the accept fixed CPU indexes and the user does not really know
 what they are after fixup as the number of threads per core changes
 between CPU versions and via QEMU command line.
 
 This introduces a new @cpu_dt_id field in the CPUPPCState struct which
 is set from @cpu_index by default but can be fixed later to the value
 which a hypervisor can accept. This also introduces two POWERPC-arch
 specific functions:
 1. int ppc_get_vcpu_dt_id(CPUState *cs) - returns a device-tree ID
 for a CPU;
 2. CPUState *ppc_get_vcpu_by_dt_id(int cpu_dt_id) - finds CPUState by
 a device-tree CPU ID.
 
 This uses the new functions to:
 1. fix emulated XICS hypercall handlers as they receive fixed CPU indexes;
 2. fix XICS-KVM to enable in-kernel XICS on right CPU;
 3. compose correct device-tree.
 
 This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
 can accept command-line CPU indexes again.
 
 So this patch feels awkward. We use the dt fixup at random places in 
 completely dt unrelated code paths.
 
 Yep. I called it smp_cpu_index in the first place :)
 
 What we really have are 3 semantically separate entities:
 
  * QEMU internal cpu id
  * KVM internal cpu id
  * DT exposed cpu id
 
 
 As you have noted, it's a good idea to keep the QEMU internal cpu id
 linear, thus completely separate from the others. The DT exposed cpu id
 should be 100% local to hw/ppc/spapr*.c. I don't think any code outside
 of the DT generation and anything that accesses the Virtual Processor
 Number in sPAPR needs to care about the DT cpu id. All that code is
 100% KVM agnostic.
 
 The KVM internal cpu id should probably be a new field in the generic
 CPUState that gets used by kvm specific code that needs to know the KVM
 internal cpu id a vcpu was created with. The flow should be that
 kvm_arch_vcpu_id() tells kvm_init_vcpu() the kvm internal id to use
 which then stores it in CPUState. That way you can always get the KVM
 intenal cpu id from a CPU struct. All the references to this ID should
 _only_ happen from KVM only code.
 
 
 If DT id is local to spapr*, then how do I implement kvm_arch_vcpu_id()?
 Where will it get the fixed value? Do the same calculation in two different
 places (for device tree and for kvm)?

kvm_arch_vcpu_id() won't get called until qemu_init_vcpu() is issued from 
ppc_cpu_realizefn(). So if instead of calling cpu_ppc_init() you split up the 
function and set the kvm id property before realize from ppc_spapr_init(), that 
should work, no?


Alex




Re: [Qemu-devel] [RFC PATCH] spapr: add ibmveth to the supported network adapters list

2013-11-04 Thread Paolo Bonzini
Il 02/11/2013 13:07, Markus Armbruster ha scritto:
 Why do you even need a new category?  Just take everything in the
 network category and put it in the help.
 
 You *can't* currently construct -net nic,model=help output from
 registered qdev NICs, because:
 
 * -net nic generally doesn't accept all qdev NICs, and
 
 * the model names accepted by -net nic need *not* match the qdev device
   model names (example: virtio != virtio-net), and

Both could be solved by accepting legacy aliases, plus all non-no_user NICs.

 * -net nic may accept model names that map to a non-qdevified NIC (not
sure such NICs still exist).

I think any of those (and also SysBus NICs) would only accept the
default model.

Paolo



[Qemu-devel] [PATCH v2 0/3] Make thread pool implementation modular

2013-11-04 Thread Matthias Brugger
v2:
 - fix issues found by checkpatch.pl
 - change the descritpion of patch 3

This patch series makes the thread pool implementation modular.
This allows each drive to use a special implementation.
The patch series prepares qemu to be able to include thread pools different to
the one actually implemented. It will allow to implement approaches like
paravirtualized block requests [1].

[1] 
http://www.linux-kvm.org/wiki/images/5/53/2012-forum-Brugger-lightningtalk.pdf

Matthias Brugger (3):
  Make thread pool implementation modular
  Block layer uses modular thread pool
  Add workerthreads configuration option

 async.c |  4 ++--
 block/raw-posix.c   | 15 +++
 block/raw-win32.c   |  9 +++--
 blockdev.c  | 13 +
 include/block/aio.h |  2 +-
 include/block/thread-pool.h | 11 +++
 include/qemu-common.h   |  2 ++
 qemu-options.hx |  2 +-
 thread-pool.c   | 33 +
 9 files changed, 81 insertions(+), 10 deletions(-)

-- 
1.8.1.2




[Qemu-devel] [PATCH v2 3/3] Add workerthreads configuration option

2013-11-04 Thread Matthias Brugger
This patch allows to choose at the command line level which thread pool
implementation will be used by every block device.

Signed-off-by: Matthias Brugger matthias@gmail.com
---
 blockdev.c  | 13 +
 qemu-options.hx |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index b260477..a16cc9a 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -387,6 +387,15 @@ static DriveInfo *blockdev_init(QDict *bs_opts,
 }
 }
 #endif
+buf = qemu_opt_get(opts, workerthreads);
+if (buf != NULL) {
+if (!strcmp(buf, pool)) {
+/* this is the default */
+} else {
+error_report(invalid workerthreads option);
+return NULL;
+}
+}
 
 if ((buf = qemu_opt_get(opts, format)) != NULL) {
 if (is_help_option(buf)) {
@@ -2269,6 +2278,10 @@ QemuOptsList qemu_common_drive_opts = {
 .type = QEMU_OPT_STRING,
 .help = disk serial number,
 },{
+.name = workerthreads,
+.type = QEMU_OPT_STRING,
+.help = type of worker threads (pool),
+},{
 .name = rerror,
 .type = QEMU_OPT_STRING,
 .help = read error action,
diff --git a/qemu-options.hx b/qemu-options.hx
index 5dc8b75..6f22242 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -408,7 +408,7 @@ DEF(drive, HAS_ARG, QEMU_OPTION_drive,
[,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n

[,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n
[,serial=s][,addr=A][,id=name][,aio=threads|native]\n
-   [,readonly=on|off][,copy-on-read=on|off]\n
+   [,workerthreads=pool][,readonly=on|off][,copy-on-read=on|off]\n
[[,bps=b]|[[,bps_rd=r][,bps_wr=w]]]\n
[[,iops=i]|[[,iops_rd=r][,iops_wr=w]]]\n
[[,bps_max=bm]|[[,bps_rd_max=rm][,bps_wr_max=wm]]]\n
-- 
1.8.1.2




[Qemu-devel] [PATCH v2 1/3] Make thread pool implementation modular

2013-11-04 Thread Matthias Brugger
This patch introduces function pointers for the thread pool, so that
it's implementation can be set at run-time.

Signed-off-by: Matthias Brugger matthias@gmail.com
---
 include/block/thread-pool.h | 11 +++
 thread-pool.c   | 33 +
 2 files changed, 44 insertions(+)

diff --git a/include/block/thread-pool.h b/include/block/thread-pool.h
index 32afcdd..1f73712 100644
--- a/include/block/thread-pool.h
+++ b/include/block/thread-pool.h
@@ -38,4 +38,15 @@ int coroutine_fn thread_pool_submit_co(ThreadPool *pool,
 ThreadPoolFunc *func, void *arg);
 void thread_pool_submit(ThreadPool *pool, ThreadPoolFunc *func, void *arg);
 
+ThreadPoolFuncArr *thread_pool_probe(void);
+void thread_pool_delete(ThreadPoolFuncArr *tpf);
+
+struct ThreadPoolFuncArr {
+BlockDriverAIOCB *(*thread_pool_submit_aio)(ThreadPool *pool,
+ThreadPoolFunc *func, void *arg, BlockDriverCompletionFunc *cb,
+void *opaque);
+ThreadPool *(*thread_pool_new)(AioContext *ctx);
+};
+
+
 #endif
diff --git a/thread-pool.c b/thread-pool.c
index 3735fd3..53294a9 100644
--- a/thread-pool.c
+++ b/thread-pool.c
@@ -26,6 +26,7 @@
 #include qemu/main-loop.h
 
 static void do_spawn_thread(ThreadPool *pool);
+static void thread_pool_aio_free(ThreadPool *pool);
 
 typedef struct ThreadPoolElement ThreadPoolElement;
 
@@ -77,6 +78,7 @@ struct ThreadPool {
 int pending_threads; /* threads created but not running yet */
 int pending_cancellations; /* whether we need a cond_broadcast */
 bool stopping;
+void (*thread_pool_free)(ThreadPool *pool);
 };
 
 static void *worker_thread(void *opaque)
@@ -300,6 +302,7 @@ static void thread_pool_init_one(ThreadPool *pool, 
AioContext *ctx)
 qemu_sem_init(pool-sem, 0);
 pool-max_threads = 64;
 pool-new_thread_bh = aio_bh_new(ctx, spawn_thread_bh_fn, pool);
+pool-thread_pool_free = thread_pool_aio_free;
 
 QLIST_INIT(pool-head);
 QTAILQ_INIT(pool-request_list);
@@ -316,6 +319,11 @@ ThreadPool *thread_pool_new(AioContext *ctx)
 
 void thread_pool_free(ThreadPool *pool)
 {
+pool-thread_pool_free(pool);
+}
+
+void thread_pool_aio_free(ThreadPool *pool)
+{
 if (!pool) {
 return;
 }
@@ -346,3 +354,28 @@ void thread_pool_free(ThreadPool *pool)
 event_notifier_cleanup(pool-notifier);
 g_free(pool);
 }
+
+ThreadPoolFuncArr *thread_pool_probe(void)
+{
+ThreadPoolFuncArr *tpf_pool = NULL;
+
+if (tpf_pool) {
+return tpf_pool;
+}
+
+tpf_pool = g_new(ThreadPoolFuncArr, 1);
+if (!tpf_pool) {
+printf(error allocating thread pool\n);
+return NULL;
+}
+
+tpf_pool-thread_pool_submit_aio = thread_pool_submit_aio;
+tpf_pool-thread_pool_new = thread_pool_new;
+
+return tpf_pool;
+}
+
+void thread_pool_delete(ThreadPoolFuncArr *tpf)
+{
+g_free(tpf);
+}
-- 
1.8.1.2




[Qemu-devel] [PATCH v2 2/3] Block layer uses modular thread pool

2013-11-04 Thread Matthias Brugger
With this patch, the calls to the thread pool functions pass through the
new modular thread pool implementation.

Signed-off-by: Matthias Brugger matthias@gmail.com
---
 async.c   |  4 ++--
 block/raw-posix.c | 15 +++
 block/raw-win32.c |  9 +++--
 include/block/aio.h   |  2 +-
 include/qemu-common.h |  2 ++
 5 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/async.c b/async.c
index 5fb3fa6..e66f70f 100644
--- a/async.c
+++ b/async.c
@@ -232,10 +232,10 @@ GSource *aio_get_g_source(AioContext *ctx)
 return ctx-source;
 }
 
-ThreadPool *aio_get_thread_pool(AioContext *ctx)
+ThreadPool *aio_get_thread_pool(AioContext *ctx, ThreadPoolFuncArr *tpf)
 {
 if (!ctx-thread_pool) {
-ctx-thread_pool = thread_pool_new(ctx);
+ctx-thread_pool = tpf-thread_pool_new(ctx);
 }
 return ctx-thread_pool;
 }
diff --git a/block/raw-posix.c b/block/raw-posix.c
index f6d48bb..f747301 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -142,6 +142,7 @@ typedef struct BDRVRawState {
 bool is_xfs : 1;
 #endif
 bool has_discard : 1;
+ThreadPoolFuncArr *tpf;
 } BDRVRawState;
 
 typedef struct BDRVRawReopenState {
@@ -345,6 +346,9 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 int ret;
 
 s-type = FTYPE_FILE;
+
+s-tpf = thread_pool_probe();
+
 ret = raw_open_common(bs, options, flags, 0, local_err);
 if (error_is_set(local_err)) {
 error_propagate(errp, local_err);
@@ -792,6 +796,7 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, 
int fd,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockDriverCompletionFunc *cb, void *opaque, int type)
 {
+BDRVRawState *s = bs-opaque;
 RawPosixAIOData *acb = g_slice_new(RawPosixAIOData);
 ThreadPool *pool;
 
@@ -807,8 +812,8 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, 
int fd,
 acb-aio_offset = sector_num * 512;
 
 trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
-pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
-return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
+pool = aio_get_thread_pool(bdrv_get_aio_context(bs), s-tpf);
+return s-tpf-thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
 static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
@@ -874,6 +879,8 @@ static void raw_close(BlockDriverState *bs)
 qemu_close(s-fd);
 s-fd = -1;
 }
+
+thread_pool_delete(s-tpf);
 }
 
 static int raw_truncate(BlockDriverState *bs, int64_t offset)
@@ -1490,8 +1497,8 @@ static BlockDriverAIOCB *hdev_aio_ioctl(BlockDriverState 
*bs,
 acb-aio_offset = 0;
 acb-aio_ioctl_buf = buf;
 acb-aio_ioctl_cmd = req;
-pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
-return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
+pool = aio_get_thread_pool(bdrv_get_aio_context(bs), s-tpf);
+return s-tpf-thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
 #elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 2741e4d..76df266 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -46,6 +46,7 @@ typedef struct RawWin32AIOData {
 size_t aio_nbytes;
 off64_t aio_offset;
 int aio_type;
+ThreadPoolFuncArr *tpf;
 } RawWin32AIOData;
 
 typedef struct BDRVRawState {
@@ -159,8 +160,8 @@ static BlockDriverAIOCB *paio_submit(BlockDriverState *bs, 
HANDLE hfile,
 acb-aio_offset = sector_num * 512;
 
 trace_paio_submit(acb, opaque, sector_num, nb_sectors, type);
-pool = aio_get_thread_pool(bdrv_get_aio_context(bs));
-return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
+pool = aio_get_thread_pool(bdrv_get_aio_context(bs), s-tpf);
+return s-tpf-thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque);
 }
 
 int qemu_ftruncate64(int fd, int64_t length)
@@ -248,6 +249,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 
 s-type = FTYPE_FILE;
 
+s-tpf = thread_pool_probe();
+
 opts = qemu_opts_create_nofail(raw_runtime_opts);
 qemu_opts_absorb_qdict(opts, options, local_err);
 if (error_is_set(local_err)) {
@@ -339,6 +342,8 @@ static void raw_close(BlockDriverState *bs)
 {
 BDRVRawState *s = bs-opaque;
 CloseHandle(s-hfile);
+
+thread_pool_delete(s-tpf);
 }
 
 static int raw_truncate(BlockDriverState *bs, int64_t offset)
diff --git a/include/block/aio.h b/include/block/aio.h
index 2efdf41..22d6fa0 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -231,7 +231,7 @@ void aio_set_event_notifier(AioContext *ctx,
 GSource *aio_get_g_source(AioContext *ctx);
 
 /* Return the ThreadPool bound to this AioContext */
-struct ThreadPool *aio_get_thread_pool(AioContext *ctx);
+struct ThreadPool *aio_get_thread_pool(AioContext *ctx, ThreadPoolFuncArr 
*tpf);
 
 /* Functions to operate on the main QEMU 

Re: [Qemu-devel] [PATCH 3/3] block: per caller dirty bitmap

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 07:59, Fam Zheng ha scritto:

 I think callers outside block.c should only call
 hbitmap_set/hbitmap_reset; resetting is typically done before processing
 sectors and setting after an error (both of which happen privately to
 each task).

 Thus you probably should add a fourth patch which makes
 bdrv_(re)set_dirty static and remove
 bdrv_get_dirty/bdrv_dirty_iter_init/bdrv_get_dirty_count.
 I like the idea of adding a wrapper struct (will be BdrvDirtyBitmap) to
 HBitmap so that patch 1 is not needed, and HBitmap becomes (almost)
 internal to block.c.
 
 But I'm not sure removing
 bdrv_get_dirty/bdrv_dirty_iter_init/bdrv_get_dirty_count is good, as we
 are exposing BdrvDirtyBitmap, we should also provide operations on it,
 instead of let callers to handle HBitmap pointer inside. In other words,
 I prefer to define BdrvDirtyBitmap structure in block.c and only put a
 type declaration in header.

If you want to expose BdrvDirtyBitmap, having wrappers is fine.  I was
thinking of exposing HBitmap instead and keeping BdrvDirtyBitmap internal.

Either way is fine, and as the person who writes the code you have the
privilege of making the choice. :)

Paolo



Re: [Qemu-devel] [PATCH v2] ppc: introduce CPUPPCState::cpu_dt_id

2013-11-04 Thread Paolo Bonzini
Il 01/11/2013 04:21, Alexey Kardashevskiy ha scritto:
 Normally CPUState::cpu_index is used to pick the right CPU for various
 operations. However default consecutive numbering does not always work
 for POWERPC.
 
 For example, on POWER7 (which supports 4 threads per core),
 -smp 8,threads=4 should create CPUs with indexes 0,1,2,3,4,5,6,7 and
 -smp 8,threads=1 should create CPUs with indexes 0,4,8,12,16,20,24,28.
 
 These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
 and used to call KVM VCPU's ioctls. In order to achieve this,
 kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
 cpu_index by the number of threads per core.
 
 This approach has disadvantages such as:
 1. NUMA configuration stays broken after the fixup;
 2. CPU-related commands from QEMU Monitor do not work properly as
 the accept fixed CPU indexes and the user does not really know
 what they are after fixup as the number of threads per core changes
 between CPU versions and via QEMU command line.
 
 This introduces a new @cpu_dt_id field in the CPUPPCState struct which
 is set from @cpu_index by default but can be fixed later to the value
 which a hypervisor can accept. This also introduces two POWERPC-arch
 specific functions:
 1. int ppc_get_vcpu_dt_id(CPUState *cs) - returns a device-tree ID
 for a CPU;
 2. CPUState *ppc_get_vcpu_by_dt_id(int cpu_dt_id) - finds CPUState by
 a device-tree CPU ID.
 
 This uses the new functions to:
 1. fix emulated XICS hypercall handlers as they receive fixed CPU indexes;
 2. fix XICS-KVM to enable in-kernel XICS on right CPU;
 3. compose correct device-tree.
 
 This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
 can accept command-line CPU indexes again.
 
 Cc: Badari Pulavarty  pbad...@linux.vnet.ibm.com
 Cc: Paul Mackerras pau...@samba.org
 Cc: David Gibson da...@gibson.dropbear.id.au
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v2:
 * added PPC-specific ppc_get_vcpu_dt_id() and ppc_get_vcpu_by_dt_id()
 * fixed kvm_arch_vcpu_id() to use ppc_get_vcpu_dt_id()
 * fixed emulated XICS
 * removed kvm_arch_vcpu_id() stub for non-KVM case

Not having non-PPC code in the patch is definitely a good sign!

Acked-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCH v2] block: per caller dirty bitmap

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 10:30, Fam Zheng ha scritto:
 diff --git a/include/block/block.h b/include/block/block.h
 index 3560deb..06f424c 100644
 --- a/include/block/block.h
 +++ b/include/block/block.h
 @@ -388,12 +388,15 @@ void *qemu_blockalign(BlockDriverState *bs, size_t 
 size);
  bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov);
  
  struct HBitmapIter;
 -void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity);
 -int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 +typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
 +BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, int 
 granularity);
 +void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap 
 *bitmap);
 +int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t 
 sector);
  void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int 
 nr_sectors);
  void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int 
 nr_sectors);
 -void bdrv_dirty_iter_init(BlockDriverState *bs, struct HBitmapIter *hbi);
 -int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 +void bdrv_dirty_iter_init(BlockDriverState *bs,
 +  BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
 +int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
  
  void bdrv_enable_copy_on_read(BlockDriverState *bs);
  void bdrv_disable_copy_on_read(BlockDriverState *bs);

You do not really need the BDS argument to the functions, do you?  (Or
do you have other plans?)

Apart from this, looks good.

Paolo



[Qemu-devel] [PATCH] pc: disable acpi info for isapc and old pc machine

2013-11-04 Thread Michael S. Tsirkin
Disable acpi build for isapc and no_kvmclock machine
types (used by xen), since acpi build currently expects pci.

Reported-by: Andreas Färber afaer...@suse.de
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/i386/pc_piix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 24a98cb..4fdb7b6 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -309,6 +309,7 @@ static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args)
 {
 has_pci_info = false;
+has_acpi_build = false;
 disable_kvm_pv_eoi();
 enable_compat_apic_id_mode();
 pc_init1(args, 1, 0);
@@ -317,6 +318,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs 
*args)
 static void pc_init_isa(QEMUMachineInitArgs *args)
 {
 has_pci_info = false;
+has_acpi_build = false;
 if (!args-cpu_model) {
 args-cpu_model = 486;
 }
-- 
MST



Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexander Graf

On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 SLOF gets really confused if RTAS/device-tree and everything else
 what SLOF can use is not in the very first block of the very first
 memory node.
 
 This makes sure that the RMA area is where SLOF expects it to be.
 
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 hw/ppc/spapr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)
 
 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 09dc635..09a5d94 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 int i;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *ram = g_new(MemoryRegion, 1);
 -hwaddr rma_alloc_size;
 +hwaddr rma_alloc_size, node0_size;
 uint32_t initrd_base = 0;
 long kernel_size = 0, initrd_size = 0;
 long load_limit, rtas_limit, fw_size;
 @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 spapr-rma_size = MIN(spapr-rma_size, 0x1000);
 }
 }
 +/*
 + * SLOF gets confused if RMA resides not in the first block
 + * of the first memory node so let's fix it.
 + */
 +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 +spapr-rma_size = MIN(spapr-rma_size, node0_size);

So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
broken, especially on 970.

Why does SLOF have any issues with NUMA memory nodes? It can just ignore them, 
no?


Alex




Re: [Qemu-devel] [PATCH v2] block: per caller dirty bitmap

2013-11-04 Thread Fam Zheng


On 11/04/2013 06:37 PM, Paolo Bonzini wrote:

Il 04/11/2013 10:30, Fam Zheng ha scritto:

diff --git a/include/block/block.h b/include/block/block.h
index 3560deb..06f424c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -388,12 +388,15 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size);
  bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov);
  
  struct HBitmapIter;

-void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity);
-int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
+typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
+BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, int 
granularity);
+void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
+int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap, int64_t 
sector);
  void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
  void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int 
nr_sectors);
-void bdrv_dirty_iter_init(BlockDriverState *bs, struct HBitmapIter *hbi);
-int64_t bdrv_get_dirty_count(BlockDriverState *bs);
+void bdrv_dirty_iter_init(BlockDriverState *bs,
+  BdrvDirtyBitmap *bitmap, struct HBitmapIter *hbi);
+int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap *bitmap);
  
  void bdrv_enable_copy_on_read(BlockDriverState *bs);

  void bdrv_disable_copy_on_read(BlockDriverState *bs);

You do not really need the BDS argument to the functions, do you?  (Or
do you have other plans?)


I just wanted to keep the pattern of those bdrv_* family, no other plans 
for it.


Fam




[Qemu-devel] [RESEND] [PATCH] hw/9pfs: fix P9_STATS_GEN handling

2013-11-04 Thread Kirill A. Shutemov
From: Kirill A. Shutemov kirill.shute...@linux.intel.com

Currently we have few issues with P9_STATS_GEN:

 - We don't try to read st_gen anything except files or directories, but
   still set P9_STATS_GEN bit in st_result_mask. It may mislead client:
   we present garbage as valid st_gen.

 - If we failed to get valid st_gen with ENOTTY, we ignore error, but
   still set P9_STATS_GEN bit in st_result_mask.

 - If we failed to get valid st_gen with any other errno, we fail
   getattr altogether. It's excessive: we block valid client use-cases,
   like chdir(2) to non-readable directory with execution bit set.

The patch fixes these issues and cleanup code a bit.

Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
Reviewed-by: Daniel P. Berrange berra...@redhat.com
---
 hw/9pfs/cofile.c   |  4 
 hw/9pfs/virtio-9p-handle.c |  8 +++-
 hw/9pfs/virtio-9p-local.c  | 10 ++
 hw/9pfs/virtio-9p-proxy.c  |  3 ++-
 hw/9pfs/virtio-9p.c| 12 ++--
 5 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/hw/9pfs/cofile.c b/hw/9pfs/cofile.c
index 194c1306c665..2efebf35710f 100644
--- a/hw/9pfs/cofile.c
+++ b/hw/9pfs/cofile.c
@@ -38,10 +38,6 @@ int v9fs_co_st_gen(V9fsPDU *pdu, V9fsPath *path, mode_t 
st_mode,
 });
 v9fs_path_unlock(s);
 }
-/* The ioctl may not be supported depending on the path */
-if (err == -ENOTTY) {
-err = 0;
-}
 return err;
 }
 
diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
index fe8e0ed19dcc..17002a3d2867 100644
--- a/hw/9pfs/virtio-9p-handle.c
+++ b/hw/9pfs/virtio-9p-handle.c
@@ -582,6 +582,7 @@ static int handle_unlinkat(FsContext *ctx, V9fsPath *dir,
 static int handle_ioc_getversion(FsContext *ctx, V9fsPath *path,
  mode_t st_mode, uint64_t *st_gen)
 {
+#ifdef FS_IOC_GETVERSION
 int err;
 V9fsFidOpenState fid_open;
 
@@ -590,7 +591,8 @@ static int handle_ioc_getversion(FsContext *ctx, V9fsPath 
*path,
  * We can get fd for regular files and directories only
  */
 if (!S_ISREG(st_mode)  !S_ISDIR(st_mode)) {
-return 0;
+errno = ENOTTY;
+return -1;
 }
 err = handle_open(ctx, path, O_RDONLY, fid_open);
 if (err  0) {
@@ -599,6 +601,10 @@ static int handle_ioc_getversion(FsContext *ctx, V9fsPath 
*path,
 err = ioctl(fid_open.fd, FS_IOC_GETVERSION, st_gen);
 handle_close(ctx, fid_open);
 return err;
+#else
+errno = ENOTTY;
+return -1;
+#endif
 }
 
 static int handle_init(FsContext *ctx)
diff --git a/hw/9pfs/virtio-9p-local.c b/hw/9pfs/virtio-9p-local.c
index fc93e9e6e8da..df0dbffa7ac4 100644
--- a/hw/9pfs/virtio-9p-local.c
+++ b/hw/9pfs/virtio-9p-local.c
@@ -1068,8 +1068,8 @@ err_out:
 static int local_ioc_getversion(FsContext *ctx, V9fsPath *path,
 mode_t st_mode, uint64_t *st_gen)
 {
-int err;
 #ifdef FS_IOC_GETVERSION
+int err;
 V9fsFidOpenState fid_open;
 
 /*
@@ -1077,7 +1077,8 @@ static int local_ioc_getversion(FsContext *ctx, V9fsPath 
*path,
  * We can get fd for regular files and directories only
  */
 if (!S_ISREG(st_mode)  !S_ISDIR(st_mode)) {
-return 0;
+errno = ENOTTY;
+return -1;
 }
 err = local_open(ctx, path, O_RDONLY, fid_open);
 if (err  0) {
@@ -1085,10 +1086,11 @@ static int local_ioc_getversion(FsContext *ctx, 
V9fsPath *path,
 }
 err = ioctl(fid_open.fd, FS_IOC_GETVERSION, st_gen);
 local_close(ctx, fid_open);
+return err;
 #else
-err = -ENOTTY;
+errno = ENOTTY;
+return -1;
 #endif
-return err;
 }
 
 static int local_init(FsContext *ctx)
diff --git a/hw/9pfs/virtio-9p-proxy.c b/hw/9pfs/virtio-9p-proxy.c
index 5f44bb758b35..b57966d9d883 100644
--- a/hw/9pfs/virtio-9p-proxy.c
+++ b/hw/9pfs/virtio-9p-proxy.c
@@ -1086,7 +1086,8 @@ static int proxy_ioc_getversion(FsContext *fs_ctx, 
V9fsPath *path,
  * we can get fd for regular files and directories only
  */
 if (!S_ISREG(st_mode)  !S_ISDIR(st_mode)) {
-return 0;
+errno = ENOTTY;
+return -1;
 }
 err = v9fs_request(fs_ctx-private, T_GETVERSION, st_gen, s, path);
 if (err  0) {
diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index 8cbb8ae32a03..3e51fcd152f8 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -1080,10 +1080,18 @@ static void v9fs_getattr(void *opaque)
 /*  fill st_gen if requested and supported by underlying fs */
 if (request_mask  P9_STATS_GEN) {
 retval = v9fs_co_st_gen(pdu, fidp-path, stbuf.st_mode, v9stat_dotl);
-if (retval  0) {
+switch (retval) {
+case 0:
+/* we have valid st_gen: update result mask */
+v9stat_dotl.st_result_mask |= P9_STATS_GEN;
+break;
+case -EINTR:
+/* request cancelled */
 goto out;
+default:
+/* failed to get st_gen: not fatal, 

Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 11:07, Michael S. Tsirkin ha scritto:
 On Mon, Nov 04, 2013 at 11:50:05AM +0200, Marcel Apfelbaum wrote:
 On Mon, 2013-11-04 at 08:06 +0200, Michael S. Tsirkin wrote:
 The page table logic in exec.c assumes
 that memory addresses are at most TARGET_PHYS_ADDR_SPACE_BITS.

 But pci addresses are full 64 bit so if we try to render them ignoring
 the extra bits, we get strange effects with sections overlapping each
 other.

 To fix, simply limit the system memory size to
  1  TARGET_PHYS_ADDR_SPACE_BITS,
 pci addresses will be rendered within that.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  exec.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

 diff --git a/exec.c b/exec.c
 index 030118e..c7a8df5 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1801,7 +1801,12 @@ void address_space_destroy_dispatch(AddressSpace *as)
  static void memory_map_init(void)
  {
  system_memory = g_malloc(sizeof(*system_memory));
 -memory_region_init(system_memory, NULL, system, INT64_MAX);
 +
 +assert(TARGET_PHYS_ADDR_SPACE_BITS = 64);
 +
 +memory_region_init(system_memory, NULL, system,
 +   TARGET_PHYS_ADDR_SPACE_BITS == 64 ?
 +   UINT64_MAX : (0x1ULL  
 TARGET_PHYS_ADDR_SPACE_BITS));

 Michael, thanks again for the help.

 I am concerned that we cannot use all the UINT64_MAX
 address space.
 
 Well, exec isn't ready for this, it expects at most
 TARGET_PHYS_ADDR_SPACE_BITS.
 Fortunately there's no way for CPU to initiate io outside
 this area.
 
 So this is another place where device to device IO
 would be broken.

However, firmware that places BARs where the CPU cannot access them 
would be interesting to say the least.  This applies both to device-
device I/O and to non-aligned 4K BARs.

This patch looks good; however, on top of it can you test 
kvm-unit-tests with TARGET_PHYS_ADDR_SPACE_BITS=64 and see whether 
there is a measurable slowdown (in the inl_from_qemu tests)?  If not, 
we can just get rid of TARGET_PHYS_ADDR_SPACE_BITS in exec.c.

Note that L2_BITS is shared between translate-all.c and exec.c only for 
historical reasons (the translate-all.c code used to be in exec.c).  It 
is probably a good idea to split them like this:

diff --git a/exec.c b/exec.c
index 2e31ffc..3faea0e 100644
--- a/exec.c
+++ b/exec.c
@@ -88,7 +88,15 @@ struct PhysPageEntry {
 uint16_t ptr : 15;
 };
 
-typedef PhysPageEntry Node[L2_SIZE];
+/* Size of the L2 (and L3, etc) page tables.  */
+#define ADDR_SPACE_BITS 64
+
+#define P_L2_BITS 10
+#define P_L2_SIZE (1  P_L2_BITS)
+
+#define P_L2_LEVELS (((ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / P_L2_BITS) + 
1)
+
+typedef PhysPageEntry Node[P_L2_SIZE];
 
 struct AddressSpaceDispatch {
 /* This is a multi-level map on the physical address space.
@@ -155,7 +163,7 @@ static uint16_t phys_map_node_alloc(void)
 ret = next_map.nodes_nb++;
 assert(ret != PHYS_MAP_NODE_NIL);
 assert(ret != next_map.nodes_nb_alloc);
-for (i = 0; i  L2_SIZE; ++i) {
+for (i = 0; i  P_L2_SIZE; ++i) {
 next_map.nodes[ret][i].is_leaf = 0;
 next_map.nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
 }
@@ -168,13 +176,13 @@ static void phys_page_set_level(PhysPageEntry *lp, hwaddr 
*index,
 {
 PhysPageEntry *p;
 int i;
-hwaddr step = (hwaddr)1  (level * L2_BITS);
+hwaddr step = (hwaddr)1  (level * P_L2_BITS);
 
 if (!lp-is_leaf  lp-ptr == PHYS_MAP_NODE_NIL) {
 lp-ptr = phys_map_node_alloc();
 p = next_map.nodes[lp-ptr];
 if (level == 0) {
-for (i = 0; i  L2_SIZE; i++) {
+for (i = 0; i  P_L2_SIZE; i++) {
 p[i].is_leaf = 1;
 p[i].ptr = PHYS_SECTION_UNASSIGNED;
 }
@@ -182,9 +190,9 @@ static void phys_page_set_level(PhysPageEntry *lp, hwaddr 
*index,
 } else {
 p = next_map.nodes[lp-ptr];
 }
-lp = p[(*index  (level * L2_BITS))  (L2_SIZE - 1)];
+lp = p[(*index  (level * P_L2_BITS))  (P_L2_SIZE - 1)];
 
-while (*nb  lp  p[L2_SIZE]) {
+while (*nb  lp  p[P_L2_SIZE]) {
 if ((*index  (step - 1)) == 0  *nb = step) {
 lp-is_leaf = true;
 lp-ptr = leaf;
@@ -218,7 +226,7 @@ static MemoryRegionSection *phys_page_find(PhysPageEntry 
lp, hwaddr index,
 return sections[PHYS_SECTION_UNASSIGNED];
 }
 p = nodes[lp.ptr];
-lp = p[(index  (i * L2_BITS))  (L2_SIZE - 1)];
+lp = p[(index  (i * P_L2_BITS))  (P_L2_SIZE - 1)];
 }
 return sections[lp.ptr];
 }
diff --git a/translate-all.c b/translate-all.c
index aeda54d..1c63d78 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -96,12 +96,16 @@ typedef struct PageDesc {
 # define L1_MAP_ADDR_SPACE_BITS  TARGET_VIRT_ADDR_SPACE_BITS
 #endif
 
+/* Size of the L2 (and L3, etc) page tables.  */
+#define V_L2_BITS 10
+#define V_L2_SIZE (1  V_L2_BITS)
+
 /* The bits remaining after N lower levels of page tables.  */
 #define V_L1_BITS_REM \
-

Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Benjamin Herrenschmidt
On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
 On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
  SLOF gets really confused if RTAS/device-tree and everything else
  what SLOF can use is not in the very first block of the very first
  memory node.
  
  This makes sure that the RMA area is where SLOF expects it to be.
  
  Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
  Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
  hw/ppc/spapr.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)
  
  diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
  index 09dc635..09a5d94 100644
  --- a/hw/ppc/spapr.c
  +++ b/hw/ppc/spapr.c
  @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
  int i;
  MemoryRegion *sysmem = get_system_memory();
  MemoryRegion *ram = g_new(MemoryRegion, 1);
  -hwaddr rma_alloc_size;
  +hwaddr rma_alloc_size, node0_size;
  uint32_t initrd_base = 0;
  long kernel_size = 0, initrd_size = 0;
  long load_limit, rtas_limit, fw_size;
  @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
  spapr-rma_size = MIN(spapr-rma_size, 0x1000);
  }
  }
  +/*
  + * SLOF gets confused if RMA resides not in the first block
  + * of the first memory node so let's fix it.
  + */
  +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
  +spapr-rma_size = MIN(spapr-rma_size, node0_size);
 
 So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
 broken, especially on 970.
 
 Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
 them, no?

Because the only way SLOF knows about the RMA is by using the first
reg entry of the first memory node and that's *all* SLOF knows about.

If we start putting things like the DT, SLOF itself, etc... outside of
that region, it will crash.

So we constrain things to the rma that way.

Creating 4M nodes makes no sense anyway

Cheers,
Ben.





Re: [Qemu-devel] [PATCH v2] block: per caller dirty bitmap

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 11:47, Fam Zheng ha scritto:

 -void bdrv_set_dirty_tracking(BlockDriverState *bs, int granularity);
 -int bdrv_get_dirty(BlockDriverState *bs, int64_t sector);
 +typedef struct BdrvDirtyBitmap BdrvDirtyBitmap;
 +BdrvDirtyBitmap *bdrv_create_dirty_bitmap(BlockDriverState *bs, int
 granularity);
 +void bdrv_release_dirty_bitmap(BlockDriverState *bs, BdrvDirtyBitmap
 *bitmap);
 +int bdrv_get_dirty(BlockDriverState *bs, BdrvDirtyBitmap *bitmap,
 int64_t sector);
   void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int
 nr_sectors);
   void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector, int
 nr_sectors);
 -void bdrv_dirty_iter_init(BlockDriverState *bs, struct HBitmapIter
 *hbi);
 -int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 +void bdrv_dirty_iter_init(BlockDriverState *bs,
 +  BdrvDirtyBitmap *bitmap, struct
 HBitmapIter *hbi);
 +int64_t bdrv_get_dirty_count(BlockDriverState *bs, BdrvDirtyBitmap
 *bitmap);
 void bdrv_enable_copy_on_read(BlockDriverState *bs);
   void bdrv_disable_copy_on_read(BlockDriverState *bs);
 You do not really need the BDS argument to the functions, do you?  (Or
 do you have other plans?)
 
 I just wanted to keep the pattern of those bdrv_* family, no other plans
 for it.

Kevin, Stefan, any second opinions?

Paolo



Re: [Qemu-devel] [PULL v2 00/30] Block patches

2013-11-04 Thread Kevin Wolf
Am 31.10.2013 um 21:50 hat Anthony Liguori geschrieben:
 On Thu, Oct 31, 2013 at 4:48 PM, Kevin Wolf kw...@redhat.com wrote:
  The following changes since commit fc8ead74674b7129e8f31c2595c76658e5622197:
 
Merge remote-tracking branch 'qemu-kvm/uq/master' into staging 
  (2013-10-18 10:03:24 -0700)
 
  are available in the git repository at:
 
 make check is failing for me.  It's very consistent.  See below:
 
   /x86_64/ide/bmdma/setup: OK
   /x86_64/ide/bmdma/simple_rw: OK
   /x86_64/ide/bmdma/short_prdt:OK
   /x86_64/ide/bmdma/long_prdt: OK
   /x86_64/ide/bmdma/no_busmaster:
 Broken pipe
 FAIL
 GTester: last random seed: R02Sc1c266d6688697d47a58af063ff343c5
 (pid=720)
   /x86_64/ide/bmdma/teardown:  FAIL
 GTester: last random seed: R02S2c481529260b1d513b7a498b45a5b420
 (pid=736)
 FAIL: tests/ide-test

I see that you already merged this, but just to make sure I ran 'make
check-qtest-x86_64' multiple times on two different machines on the tag
I used for this pull request, and I couldn't reproduce it.

I wonder what the difference between our test environments is.

Kevin



Re: [Qemu-devel] [PATCH] ossaudio: do not enable by default

2013-11-04 Thread Peter Maydell
On 3 November 2013 16:45, Anthony Liguori anth...@codemonkey.ws wrote:
 Modern Linux's no longer support /dev/dsp so enabling it by
 default causes audio failures on newer Linux distros.

 Signed-off-by: Anthony Liguori aligu...@amazon.com
 ---
  audio/ossaudio.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/audio/ossaudio.c b/audio/ossaudio.c
 index 007c641..3e04a58 100644
 --- a/audio/ossaudio.c
 +++ b/audio/ossaudio.c
 @@ -932,7 +932,7 @@ struct audio_driver oss_audio_driver = {
  .init   = oss_audio_init,
  .fini   = oss_audio_fini,
  .pcm_ops= oss_pcm_ops,
 -.can_be_default = 1,
 +.can_be_default = 0,
  .max_voices_out = INT_MAX,
  .max_voices_in  = INT_MAX,
  .voice_size_out = sizeof (OSSVoiceOut),

This doesn't look like the right fix for the problem
given in the commit message. If the oss init function
doesn't cleanly return a failure so we can loop round
and try another driver then the init function should
be fixed. If QEMU code itself is printing warnings
then we should silence them. If you're just seeing
warnings from some system audio library (ALSA/pulse/etc)
then either file upstream bugs or just pass configure
the correct audio driver option for your distro.

thanks
-- PMM



Re: [Qemu-devel] How to introduce bs-node_name ?

2013-11-04 Thread Kevin Wolf
Am 04.11.2013 um 10:48 hat Fam Zheng geschrieben:
 
 On 11/04/2013 05:31 PM, Stefan Hajnoczi wrote:
 On Wed, Oct 30, 2013 at 02:49:32PM +0100, Markus Armbruster wrote:
 The first proposal is to add another parameter, say id.  Users can
 then refer either to an arbitrary BDS by id, or (for backward
 compatibility) to the root BDS by device.  When the code sees
 device, it'll look up the BB, then fetch its root BDS.
 
 CON: Existing parameter device becomes compatibility cruft.
 
 PRO: Clean and obvious semantics (in my opinion).
 This proposal gets my vote.
 
 The second proposal is to press the existing parameter device into
 service for referring to BDS node_name.
 
 To keep backward compatibility, we obviously need to ensure that
 whenever the old code accepts a value of device, the new code accepts
 it as well, and both resolve it to the same BDS.
 Different legacy commands given the same device name might need to
 operate on different nodes.
 Could you give an example for this?
 
 
 Dynamic renaming does not solve this
 problem, so I'm not convinced we can always choose a device name
 matching a node name.
 
 Device name commands are higher-level than graph node commands.  For
 example, block_set_io_throttle makes sense on a device but less sense on
 a graph node, unless we add the implicit assumption that the new
 throttling node is created on top of the given node or updated in place
 if the throttling node already exists (!!).
 Throttling a node could be useful too, for example if we want to
 throttle backing_hd which is on shared storage, but not to throttle
 on the local image.
 
 My ignorant question is: Why can't we just use one namespace, make
 sure no name collision between node_name and device_name, or even
 just drop device_name, so we treat the root node's node_name as
 device_name? For commands that only accept a device, this can be
 enforced in its implementation by checking against the whole graph
 to verify this.

Markus described it somewhere in this thread: Live snapshots.
Currently, the device_name moves to the new BDS on the top (and
compatibility requires us to keep it that way), whereas a node name
should, of course, stay at its node.

When you consider this, the single namespace, as much as I would have
loved it, is pretty much dead.

Kevin



Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Michael S. Tsirkin
On Mon, Nov 04, 2013 at 11:54:34AM +0100, Paolo Bonzini wrote:
 Il 04/11/2013 11:07, Michael S. Tsirkin ha scritto:
  On Mon, Nov 04, 2013 at 11:50:05AM +0200, Marcel Apfelbaum wrote:
  On Mon, 2013-11-04 at 08:06 +0200, Michael S. Tsirkin wrote:
  The page table logic in exec.c assumes
  that memory addresses are at most TARGET_PHYS_ADDR_SPACE_BITS.
 
  But pci addresses are full 64 bit so if we try to render them ignoring
  the extra bits, we get strange effects with sections overlapping each
  other.
 
  To fix, simply limit the system memory size to
   1  TARGET_PHYS_ADDR_SPACE_BITS,
  pci addresses will be rendered within that.
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   exec.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
 
  diff --git a/exec.c b/exec.c
  index 030118e..c7a8df5 100644
  --- a/exec.c
  +++ b/exec.c
  @@ -1801,7 +1801,12 @@ void address_space_destroy_dispatch(AddressSpace 
  *as)
   static void memory_map_init(void)
   {
   system_memory = g_malloc(sizeof(*system_memory));
  -memory_region_init(system_memory, NULL, system, INT64_MAX);
  +
  +assert(TARGET_PHYS_ADDR_SPACE_BITS = 64);
  +
  +memory_region_init(system_memory, NULL, system,
  +   TARGET_PHYS_ADDR_SPACE_BITS == 64 ?
  +   UINT64_MAX : (0x1ULL  
  TARGET_PHYS_ADDR_SPACE_BITS));
 
  Michael, thanks again for the help.
 
  I am concerned that we cannot use all the UINT64_MAX
  address space.
  
  Well, exec isn't ready for this, it expects at most
  TARGET_PHYS_ADDR_SPACE_BITS.
  Fortunately there's no way for CPU to initiate io outside
  this area.
  
  So this is another place where device to device IO
  would be broken.
 
 However, firmware that places BARs where the CPU cannot access them 
 would be interesting to say the least.  This applies both to device-
 device I/O and to non-aligned 4K BARs.
 
 This patch looks good; however, on top of it can you test 
 kvm-unit-tests with TARGET_PHYS_ADDR_SPACE_BITS=64 and see whether 
 there is a measurable slowdown (in the inl_from_qemu tests)?  If not, 
 we can just get rid of TARGET_PHYS_ADDR_SPACE_BITS in exec.c.

I'd rather we fixed a bug first - we need to fix it on stable too - any
cleanups can come on top.  Also, I'm not sure what will this test tell
us: inl reads io space, not memory, right?

 
 Note that L2_BITS is shared between translate-all.c and exec.c only for 
 historical reasons (the translate-all.c code used to be in exec.c).  It 
 is probably a good idea to split them like this:

Want to test this and post properly?

 diff --git a/exec.c b/exec.c
 index 2e31ffc..3faea0e 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -88,7 +88,15 @@ struct PhysPageEntry {
  uint16_t ptr : 15;
  };
  
 -typedef PhysPageEntry Node[L2_SIZE];
 +/* Size of the L2 (and L3, etc) page tables.  */
 +#define ADDR_SPACE_BITS 64
 +
 +#define P_L2_BITS 10
 +#define P_L2_SIZE (1  P_L2_BITS)
 +
 +#define P_L2_LEVELS (((ADDR_SPACE_BITS - TARGET_PAGE_BITS - 1) / P_L2_BITS) 
 + 1)
 +
 +typedef PhysPageEntry Node[P_L2_SIZE];
  
  struct AddressSpaceDispatch {
  /* This is a multi-level map on the physical address space.
 @@ -155,7 +163,7 @@ static uint16_t phys_map_node_alloc(void)
  ret = next_map.nodes_nb++;
  assert(ret != PHYS_MAP_NODE_NIL);
  assert(ret != next_map.nodes_nb_alloc);
 -for (i = 0; i  L2_SIZE; ++i) {
 +for (i = 0; i  P_L2_SIZE; ++i) {
  next_map.nodes[ret][i].is_leaf = 0;
  next_map.nodes[ret][i].ptr = PHYS_MAP_NODE_NIL;
  }
 @@ -168,13 +176,13 @@ static void phys_page_set_level(PhysPageEntry *lp, 
 hwaddr *index,
  {
  PhysPageEntry *p;
  int i;
 -hwaddr step = (hwaddr)1  (level * L2_BITS);
 +hwaddr step = (hwaddr)1  (level * P_L2_BITS);
  
  if (!lp-is_leaf  lp-ptr == PHYS_MAP_NODE_NIL) {
  lp-ptr = phys_map_node_alloc();
  p = next_map.nodes[lp-ptr];
  if (level == 0) {
 -for (i = 0; i  L2_SIZE; i++) {
 +for (i = 0; i  P_L2_SIZE; i++) {
  p[i].is_leaf = 1;
  p[i].ptr = PHYS_SECTION_UNASSIGNED;
  }
 @@ -182,9 +190,9 @@ static void phys_page_set_level(PhysPageEntry *lp, hwaddr 
 *index,
  } else {
  p = next_map.nodes[lp-ptr];
  }
 -lp = p[(*index  (level * L2_BITS))  (L2_SIZE - 1)];
 +lp = p[(*index  (level * P_L2_BITS))  (P_L2_SIZE - 1)];
  
 -while (*nb  lp  p[L2_SIZE]) {
 +while (*nb  lp  p[P_L2_SIZE]) {
  if ((*index  (step - 1)) == 0  *nb = step) {
  lp-is_leaf = true;
  lp-ptr = leaf;
 @@ -218,7 +226,7 @@ static MemoryRegionSection *phys_page_find(PhysPageEntry 
 lp, hwaddr index,
  return sections[PHYS_SECTION_UNASSIGNED];
  }
  p = nodes[lp.ptr];
 -lp = p[(index  (i * L2_BITS))  (L2_SIZE - 1)];
 +lp = p[(index  (i * P_L2_BITS))  (P_L2_SIZE - 1)];
  }
  return sections[lp.ptr];
  }
 diff --git a/translate-all.c 

Re: [Qemu-devel] How to introduce bs-node_name ?

2013-11-04 Thread Kevin Wolf
Am 01.11.2013 um 16:12 hat Luiz Capitulino geschrieben:
 On Fri, 01 Nov 2013 08:59:20 -0600
 Eric Blake ebl...@redhat.com wrote:
 
  On 11/01/2013 08:51 AM, Luiz Capitulino wrote:
   On Wed, 30 Oct 2013 13:25:35 -0600
   Eric Blake ebl...@redhat.com wrote:
   
   On 10/30/2013 07:49 AM, Markus Armbruster wrote:
  
  
   The first proposal is to add another parameter, say id.  Users can
   then refer either to an arbitrary BDS by id, or (for backward
   compatibility) to the root BDS by device.  When the code sees
   device, it'll look up the BB, then fetch its root BDS.
  
   CON: Existing parameter device becomes compatibility cruft.
  
   PRO: Clean and obvious semantics (in my opinion).
  
   I like this one as well.
   
   Does this proposal makes device optional for existing commands? If it
   does then I'm afraid it breaks compatibility because if you don't
   specify a device you'll get an error today.
  
  Changing from error to success is not backwards-incompatible.  Old
  applications will ALWAYS supply device (because it used to be
  mandatory).  That is, a management application that was intentionally
  omitting 'device' and expecting an error is so unlikely to exist that we
  can consider such a management app as buggy.
 
 Doing such changes makes me nervous nevertheless. In my mind a stable
 API doesn't change. Of course that there might exceptions, but 99.9%
 of those exceptions should be bug fixes not deliberate API extensions.

Stable API means that whatever was working before keeps working. Nothing
less, but also nothing more.

If an extension that changes from error to success is breaking the API,
adding a new QMP command is breaking the API as well. Because sending an
unknown command means returning an error...

 A more compelling argument against it is the quality of the resulting
 command. I'm sure it's going to be anything but a simple, clean API.

I consider one command with an argument made optional a higher quality
API than having two different commands that do almost the same, except
that the older one has a mandatory argument where the newer one has an
optional argument.

Kevin



Re: [Qemu-devel] [PATCH] block: Avoid unecessary drv-bdrv_getlength() calls

2013-11-04 Thread Kevin Wolf
Am 04.11.2013 um 08:24 hat Fam Zheng geschrieben:
 
 
 On Tue 29 Oct 2013 08:12:38 PM CST, Kevin Wolf wrote:
 Am 29.10.2013 um 13:02 hat Paolo Bonzini geschrieben:
 Il 29/10/2013 12:35, Kevin Wolf ha scritto:
 The block layer generally keeps the size of an image cached in
 bs-total_sectors so that it doesn't have to perform expensive
 operations to get the size whenever it needs it.
 
 This doesn't work however when using a backend that can change its size
 without qemu being aware of it, i.e. passthrough of removable media like
 CD-ROMs or floppy disks. For this reason, the caching is disabled when a
 removable device is used.
 
 It is obvious that checking whether the _guest_ device has removable
 media isn't the right thing to do when we want to know whether the size
 of the host backend can change. To make things worse, non-top-level
 BlockDriverStates never have any device attached, which makes qemu
 assume they are removable, so drv-bdrv_getlength() is always called on
 the protocol layer. In the case of raw-posix, this causes unnecessary
 lseek() system calls, which turned out to be rather expensive.
 
 This patch completely changes the logic and disables bs-total_sectors
 caching only for certain block driver types, for which a size change is
 expected: host_cdrom and host_floppy; also the raw format in case it
 sits on top of one of these protocols, but in the common case the nested
 bdrv_getlength() call on the protocol driver will use the cache again
 and avoid an expensive drv-bdrv_getlength() call.
 
 Signed-off-by: Kevin Wolf kw...@redhat.com
 
 raw-win32.c probably needs to have a .has_variable_length=true in
 bdrv_host_device.  Apart from that,
 
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com
 
 Thanks, good catch. I've added this now.
 
 This breaks VMDK because it can't read description file: buffer is
 empty in bdrv_probe, when the file size is 512 bytes. (for
 raw-posix).

Perhaps we should replace bs-total_sectors with bs-total_byts one
day... For now, how about introducing a bdrv_getlength_bytes() that
always calls the driver and never converts bytes to sectors?

Kevin



Re: [Qemu-devel] [PATCH v2] ppc: introduce CPUPPCState::cpu_dt_id

2013-11-04 Thread Alexey Kardashevskiy
On 11/04/2013 09:00 PM, Paolo Bonzini wrote:
 Il 01/11/2013 04:21, Alexey Kardashevskiy ha scritto:
 Normally CPUState::cpu_index is used to pick the right CPU for various
 operations. However default consecutive numbering does not always work
 for POWERPC.

 For example, on POWER7 (which supports 4 threads per core),
 -smp 8,threads=4 should create CPUs with indexes 0,1,2,3,4,5,6,7 and
 -smp 8,threads=1 should create CPUs with indexes 0,4,8,12,16,20,24,28.

 These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
 and used to call KVM VCPU's ioctls. In order to achieve this,
 kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
 cpu_index by the number of threads per core.

 This approach has disadvantages such as:
 1. NUMA configuration stays broken after the fixup;
 2. CPU-related commands from QEMU Monitor do not work properly as
 the accept fixed CPU indexes and the user does not really know
 what they are after fixup as the number of threads per core changes
 between CPU versions and via QEMU command line.

 This introduces a new @cpu_dt_id field in the CPUPPCState struct which
 is set from @cpu_index by default but can be fixed later to the value
 which a hypervisor can accept. This also introduces two POWERPC-arch
 specific functions:
 1. int ppc_get_vcpu_dt_id(CPUState *cs) - returns a device-tree ID
 for a CPU;
 2. CPUState *ppc_get_vcpu_by_dt_id(int cpu_dt_id) - finds CPUState by
 a device-tree CPU ID.

 This uses the new functions to:
 1. fix emulated XICS hypercall handlers as they receive fixed CPU indexes;
 2. fix XICS-KVM to enable in-kernel XICS on right CPU;
 3. compose correct device-tree.

 This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
 can accept command-line CPU indexes again.

 Cc: Badari Pulavarty  pbad...@linux.vnet.ibm.com
 Cc: Paul Mackerras pau...@samba.org
 Cc: David Gibson da...@gibson.dropbear.id.au
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v2:
 * added PPC-specific ppc_get_vcpu_dt_id() and ppc_get_vcpu_by_dt_id()
 * fixed kvm_arch_vcpu_id() to use ppc_get_vcpu_dt_id()
 * fixed emulated XICS
 * removed kvm_arch_vcpu_id() stub for non-KVM case
 
 Not having non-PPC code in the patch is definitely a good sign!
 
 Acked-by: Paolo Bonzini pbonz...@redhat.com


Heh. Mr. Graf has objections against v3 of this patch in another thread :)



-- 
Alexey



Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 12:14, Michael S. Tsirkin ha scritto:
  
  This patch looks good; however, on top of it can you test 
  kvm-unit-tests with TARGET_PHYS_ADDR_SPACE_BITS=64 and see whether 
  there is a measurable slowdown (in the inl_from_qemu tests)?  If not, 
  we can just get rid of TARGET_PHYS_ADDR_SPACE_BITS in exec.c.
 I'd rather we fixed a bug first - we need to fix it on stable too - any
 cleanups can come on top.

This is not necessarily a cleanup.  Getting rid of
TARGET_PHYS_ADDR_SPACE_BITS in exec.c means fixing device-to-device DMA
bugs for example.

Of course a smaller patch can be done that avoids the renaming of L2_*
constants.

 Also, I'm not sure what will this test tell
 us: inl reads io space, not memory, right?

The number of levels in the dispatch radix tree is independent of the
size of the AddressSpace; it is P_L2_LEVELS for both the 64K io space
and the 2^TARGET_PHYS_ADDRESS_SPACE_BITS memory space.

Paolo



Re: [Qemu-devel] [PATCH] block: Avoid unecessary drv-bdrv_getlength() calls

2013-11-04 Thread Fam Zheng


On 11/04/2013 07:18 PM, Kevin Wolf wrote:

Am 04.11.2013 um 08:24 hat Fam Zheng geschrieben:


On Tue 29 Oct 2013 08:12:38 PM CST, Kevin Wolf wrote:

Am 29.10.2013 um 13:02 hat Paolo Bonzini geschrieben:

Il 29/10/2013 12:35, Kevin Wolf ha scritto:

The block layer generally keeps the size of an image cached in
bs-total_sectors so that it doesn't have to perform expensive
operations to get the size whenever it needs it.

This doesn't work however when using a backend that can change its size
without qemu being aware of it, i.e. passthrough of removable media like
CD-ROMs or floppy disks. For this reason, the caching is disabled when a
removable device is used.

It is obvious that checking whether the _guest_ device has removable
media isn't the right thing to do when we want to know whether the size
of the host backend can change. To make things worse, non-top-level
BlockDriverStates never have any device attached, which makes qemu
assume they are removable, so drv-bdrv_getlength() is always called on
the protocol layer. In the case of raw-posix, this causes unnecessary
lseek() system calls, which turned out to be rather expensive.

This patch completely changes the logic and disables bs-total_sectors
caching only for certain block driver types, for which a size change is
expected: host_cdrom and host_floppy; also the raw format in case it
sits on top of one of these protocols, but in the common case the nested
bdrv_getlength() call on the protocol driver will use the cache again
and avoid an expensive drv-bdrv_getlength() call.

Signed-off-by: Kevin Wolf kw...@redhat.com

raw-win32.c probably needs to have a .has_variable_length=true in
bdrv_host_device.  Apart from that,

Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Thanks, good catch. I've added this now.

This breaks VMDK because it can't read description file: buffer is
empty in bdrv_probe, when the file size is 512 bytes. (for
raw-posix).

Perhaps we should replace bs-total_sectors with bs-total_byts one
day... For now, how about introducing a bdrv_getlength_bytes() that
always calls the driver and never converts bytes to sectors?
I think it will work. And is it correct to round up total_bytes to 
BDRV_SECTOR_SIZE before storing to total_sectors? From my view it's 
better than losing some bytes by rounding down.


Fam




Re: [Qemu-devel] [PATCH RFC] blockdev: copy legacy and common opts to qemu_drive_opts

2013-11-04 Thread Kevin Wolf
Am 04.11.2013 um 08:01 hat Amos Kong geschrieben:
 Currently we have three QemuOptsList (qemu_common_drive_opts,
 qemu_legacy_drive_opts, and qemu_drive_opts), only qemu_drive_opts
 is added to vm_config_groups[].
 
 We query commandline options by checking information in
 vm_config_groups[], so we can only get a NULL parameter list now.
 
 This patch copied desc items of qemu_legacy_drive_opts and
 qemu_common_drive_opts to qemu_drive_opts.
 
 Signed-off-by: Amos Kong ak...@redhat.com

This breaks driver-specific options because they aren't (and cannot be)
listed in the QemuOptsList.

For example:

$ x86_64-softmmu/qemu-system-x86_64 -drive file.driver=nbd,file.host=localhost
qemu-system-x86_64: -drive file.driver=nbd,file.host=localhost: Invalid 
parameter 'file.driver'

query-command-line-options isn't an appropriate API to query the -drive
capabilities in the blockdev-add world. You really want to have
introspection for that.

For compatibility, we might want to at least expose part of the provided
options there. In this case you should modify the monitor command to
access the local QemuOptsLists of drive_init() and blockdev_init() for
option=drive.

Kevin



[Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist

2013-11-04 Thread Zhanghaoyu (A)
Avoid starting a new migration task while the previous one still exist.

Signed-off-by: Zeng Junliang zengjunli...@huawei.com
---
 migration.c |   34 ++
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/migration.c b/migration.c
index 2b1ab20..ab4c439 100644
--- a/migration.c
+++ b/migration.c
@@ -40,8 +40,10 @@ enum {
 MIG_STATE_ERROR = -1,
 MIG_STATE_NONE,
 MIG_STATE_SETUP,
+MIG_STATE_CANCELLING,
 MIG_STATE_CANCELLED,
 MIG_STATE_ACTIVE,
+MIG_STATE_COMPLETING,
 MIG_STATE_COMPLETED,
 };
 
@@ -196,6 +198,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
 info-has_total_time = false;
 break;
 case MIG_STATE_ACTIVE:
+case MIG_STATE_CANCELLING:
+case MIG_STATE_COMPLETING:
 info-has_status = true;
 info-status = g_strdup(active);
 info-has_total_time = true;
@@ -282,6 +286,13 @@ void 
qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
 
 /* shared migration helpers */
 
+static void migrate_set_state(MigrationState *s, int old_state, int new_state)
+{
+if (atomic_cmpxchg(s-state, old_state, new_state) == new_state) {
+trace_migrate_set_state(new_state);
+}
+}
+
 static void migrate_fd_cleanup(void *opaque)
 {
 MigrationState *s = opaque;
@@ -301,20 +312,18 @@ static void migrate_fd_cleanup(void *opaque)
 
 assert(s-state != MIG_STATE_ACTIVE);
 
-if (s-state != MIG_STATE_COMPLETED) {
+if (s-state != MIG_STATE_COMPLETING) {
 qemu_savevm_state_cancel();
+if (s-state == MIG_STATE_CANCELLING) {
+migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED); 
+}
+}else {
+migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED); 
 }
 
 notifier_list_notify(migration_state_notifiers, s);
 }
 
-static void migrate_set_state(MigrationState *s, int old_state, int new_state)
-{
-if (atomic_cmpxchg(s-state, old_state, new_state) == new_state) {
-trace_migrate_set_state(new_state);
-}
-}
-
 void migrate_fd_error(MigrationState *s)
 {
 DPRINTF(setting error state\n);
@@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
 {
 DPRINTF(cancelling migration\n);
 
-migrate_set_state(s, s-state, MIG_STATE_CANCELLED);
+migrate_set_state(s, s-state, MIG_STATE_CANCELLING);
 }
 
 void add_migration_state_change_notifier(Notifier *notify)
@@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 params.blk = has_blk  blk;
 params.shared = has_inc  inc;
 
-if (s-state == MIG_STATE_ACTIVE || s-state == MIG_STATE_SETUP) {
+if (s-state == MIG_STATE_ACTIVE || s-state == MIG_STATE_SETUP ||
+s-state == MIG_STATE_COMPLETING || s-state == MIG_STATE_CANCELLING) {
 error_set(errp, QERR_MIGRATION_ACTIVE);
 return;
 }
@@ -594,7 +604,7 @@ static void *migration_thread(void *opaque)
 }
 
 if (!qemu_file_get_error(s-file)) {
-migrate_set_state(s, MIG_STATE_ACTIVE, 
MIG_STATE_COMPLETED);
+migrate_set_state(s, MIG_STATE_ACTIVE, 
MIG_STATE_COMPLETING);
 break;
 }
 }
@@ -634,7 +644,7 @@ static void *migration_thread(void *opaque)
 }
 
 qemu_mutex_lock_iothread();
-if (s-state == MIG_STATE_COMPLETED) {
+if (s-state == MIG_STATE_COMPLETING) {
 int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 s-total_time = end_time - s-total_time;
 s-downtime = end_time - start_time;
-- 
1.7.3.1.msysgit.0

BTW, while error happened during migration, need the erroring state to avoid 
starting a new migration task while current migration task still exist?
And, do the new added migration states need to be reported to libvirt?



Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 11:55, Benjamin Herrenschmidt b...@kernel.crashing.org 
wrote:

 On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
 On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 
 SLOF gets really confused if RTAS/device-tree and everything else
 what SLOF can use is not in the very first block of the very first
 memory node.
 
 This makes sure that the RMA area is where SLOF expects it to be.
 
 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 hw/ppc/spapr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)
 
 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 09dc635..09a5d94 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
int i;
MemoryRegion *sysmem = get_system_memory();
MemoryRegion *ram = g_new(MemoryRegion, 1);
 -hwaddr rma_alloc_size;
 +hwaddr rma_alloc_size, node0_size;
uint32_t initrd_base = 0;
long kernel_size = 0, initrd_size = 0;
long load_limit, rtas_limit, fw_size;
 @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
spapr-rma_size = MIN(spapr-rma_size, 0x1000);
}
}
 +/*
 + * SLOF gets confused if RMA resides not in the first block
 + * of the first memory node so let's fix it.
 + */
 +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 +spapr-rma_size = MIN(spapr-rma_size, node0_size);
 
 So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
 broken, especially on 970.
 
 Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
 them, no?
 
 Because the only way SLOF knows about the RMA is by using the first
 reg entry of the first memory node and that's *all* SLOF knows about.
 
 If we start putting things like the DT, SLOF itself, etc... outside of
 that region, it will crash.
 
 So we constrain things to the rma that way.
 
 Creating 4M nodes makes no sense anyway

So why don't we just use the limit VRMA to 256MB code always and error out of 
node0 is smaller? I don't think SLOF can run with less than 256MB anyway.


Alex




Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 12:26, Zhanghaoyu (A) ha scritto:
 Avoid starting a new migration task while the previous one still exist.

Can you explain how to reproduce the problem?

Also please use pbonz...@redhat.com instead.  My Gmail address is an
implementation detail. :)

 Signed-off-by: Zeng Junliang zengjunli...@huawei.com

It looks like the author of the patch is not the same as you.  If so,
you need to make Zeng Junliang the author (using --author on the git
commit command line) and add your own signoff line.

Paolo

 ---
  migration.c |   34 ++
  1 files changed, 22 insertions(+), 12 deletions(-)
 
 diff --git a/migration.c b/migration.c
 index 2b1ab20..ab4c439 100644
 --- a/migration.c
 +++ b/migration.c
 @@ -40,8 +40,10 @@ enum {
  MIG_STATE_ERROR = -1,
  MIG_STATE_NONE,
  MIG_STATE_SETUP,
 +MIG_STATE_CANCELLING,
  MIG_STATE_CANCELLED,
  MIG_STATE_ACTIVE,
 +MIG_STATE_COMPLETING,
  MIG_STATE_COMPLETED,
  };
  
 @@ -196,6 +198,8 @@ MigrationInfo *qmp_query_migrate(Error **errp)
  info-has_total_time = false;
  break;
  case MIG_STATE_ACTIVE:
 +case MIG_STATE_CANCELLING:
 +case MIG_STATE_COMPLETING:
  info-has_status = true;
  info-status = g_strdup(active);
  info-has_total_time = true;
 @@ -282,6 +286,13 @@ void 
 qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
  
  /* shared migration helpers */
  
 +static void migrate_set_state(MigrationState *s, int old_state, int 
 new_state)
 +{
 +if (atomic_cmpxchg(s-state, old_state, new_state) == new_state) {
 +trace_migrate_set_state(new_state);
 +}
 +}
 +
  static void migrate_fd_cleanup(void *opaque)
  {
  MigrationState *s = opaque;
 @@ -301,20 +312,18 @@ static void migrate_fd_cleanup(void *opaque)
  
  assert(s-state != MIG_STATE_ACTIVE);
  
 -if (s-state != MIG_STATE_COMPLETED) {
 +if (s-state != MIG_STATE_COMPLETING) {
  qemu_savevm_state_cancel();
 +if (s-state == MIG_STATE_CANCELLING) {
 +migrate_set_state(s, MIG_STATE_CANCELLING, MIG_STATE_CANCELLED); 
 +}
 +}else {
 +migrate_set_state(s, MIG_STATE_COMPLETING, MIG_STATE_COMPLETED); 
  }
  
  notifier_list_notify(migration_state_notifiers, s);
  }
  
 -static void migrate_set_state(MigrationState *s, int old_state, int 
 new_state)
 -{
 -if (atomic_cmpxchg(s-state, old_state, new_state) == new_state) {
 -trace_migrate_set_state(new_state);
 -}
 -}
 -
  void migrate_fd_error(MigrationState *s)
  {
  DPRINTF(setting error state\n);
 @@ -328,7 +337,7 @@ static void migrate_fd_cancel(MigrationState *s)
  {
  DPRINTF(cancelling migration\n);
  
 -migrate_set_state(s, s-state, MIG_STATE_CANCELLED);
 +migrate_set_state(s, s-state, MIG_STATE_CANCELLING);
  }
  
  void add_migration_state_change_notifier(Notifier *notify)
 @@ -405,7 +414,8 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
  params.blk = has_blk  blk;
  params.shared = has_inc  inc;
  
 -if (s-state == MIG_STATE_ACTIVE || s-state == MIG_STATE_SETUP) {
 +if (s-state == MIG_STATE_ACTIVE || s-state == MIG_STATE_SETUP ||
 +s-state == MIG_STATE_COMPLETING || s-state == 
 MIG_STATE_CANCELLING) {
  error_set(errp, QERR_MIGRATION_ACTIVE);
  return;
  }
 @@ -594,7 +604,7 @@ static void *migration_thread(void *opaque)
  }
  
  if (!qemu_file_get_error(s-file)) {
 -migrate_set_state(s, MIG_STATE_ACTIVE, 
 MIG_STATE_COMPLETED);
 +migrate_set_state(s, MIG_STATE_ACTIVE, 
 MIG_STATE_COMPLETING);
  break;
  }
  }
 @@ -634,7 +644,7 @@ static void *migration_thread(void *opaque)
  }
  
  qemu_mutex_lock_iothread();
 -if (s-state == MIG_STATE_COMPLETED) {
 +if (s-state == MIG_STATE_COMPLETING) {
  int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
  s-total_time = end_time - s-total_time;
  s-downtime = end_time - start_time;
 




Re: [Qemu-devel] How to introduce bs-node_name ?

2013-11-04 Thread Fam Zheng


On 11/04/2013 07:06 PM, Kevin Wolf wrote:

Am 04.11.2013 um 10:48 hat Fam Zheng geschrieben:

On 11/04/2013 05:31 PM, Stefan Hajnoczi wrote:

On Wed, Oct 30, 2013 at 02:49:32PM +0100, Markus Armbruster wrote:

The first proposal is to add another parameter, say id.  Users can
then refer either to an arbitrary BDS by id, or (for backward
compatibility) to the root BDS by device.  When the code sees
device, it'll look up the BB, then fetch its root BDS.

CON: Existing parameter device becomes compatibility cruft.

PRO: Clean and obvious semantics (in my opinion).

This proposal gets my vote.


The second proposal is to press the existing parameter device into
service for referring to BDS node_name.

To keep backward compatibility, we obviously need to ensure that
whenever the old code accepts a value of device, the new code accepts
it as well, and both resolve it to the same BDS.

Different legacy commands given the same device name might need to
operate on different nodes.

Could you give an example for this?



Dynamic renaming does not solve this
problem, so I'm not convinced we can always choose a device name
matching a node name.

Device name commands are higher-level than graph node commands.  For
example, block_set_io_throttle makes sense on a device but less sense on
a graph node, unless we add the implicit assumption that the new
throttling node is created on top of the given node or updated in place
if the throttling node already exists (!!).

Throttling a node could be useful too, for example if we want to
throttle backing_hd which is on shared storage, but not to throttle
on the local image.

My ignorant question is: Why can't we just use one namespace, make
sure no name collision between node_name and device_name, or even
just drop device_name, so we treat the root node's node_name as
device_name? For commands that only accept a device, this can be
enforced in its implementation by checking against the whole graph
to verify this.

Markus described it somewhere in this thread: Live snapshots.
Currently, the device_name moves to the new BDS on the top (and
compatibility requires us to keep it that way), whereas a node name
should, of course, stay at its node.

When you consider this, the single namespace, as much as I would have
loved it, is pretty much dead.


Thanks for explaining (again). I get the reason now.

Fam



Re: [Qemu-devel] [PATCH v8 00/19] VHDX log replay and write support, .bdrv_create()

2013-11-04 Thread Kevin Wolf
Am 01.11.2013 um 16:32 hat Jeff Cody geschrieben:
 On Thu, Oct 31, 2013 at 02:10:48PM +0100, Stefan Hajnoczi wrote:
  On Wed, Oct 30, 2013 at 10:44:37AM -0400, Jeff Cody wrote:
   This patch series contains the initial VHDX log parsing, replay,
   write support, and image creation.
   
   === v8 changes ===
   https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v7-upstream
   
   Rebased to latest qemu/master
   
   Patch 10/19: * Added comments for bdrv_flush() (Stefan)
   
   Patch 11/19: * Added qemu_iovec_destroy(hd_qiov) (Stefan)
* On certain _writev errors, restore BAT cache (Stefan)
   
   Patch 16/19: * Replaced fprintf(stderr,...) with error_setg_errno() 
   (Stefan)
   
   Patch 18/19: * Added filter for block_state_zero in qemu-iotest/common.rc
   
   Patch 19/19: * Moved log replay test name to 068 (part of rebase to 
   master)
   
   === v7 changes ===
   https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v7-upstream
   
   Rebased to latest qemu/master (picked up vhdx r/o tests, migration 
   blocker)
   
   Patch  8/19: * validate log descriptor_count (Stefan)
* fix typos in comments (Stefan)
* Removed unneccessary initialization (Stefan)
* Replay log prior to metadata (Stefan)
* In vhdx_log_flush(), call bdrv_flush() prior to zeroing
  out the log guid in the header.
* In vhdx_close(), set freed pointers to NULL

   Patch  9/19: * correct logic for region overlap (Stefan)

   Patch 10/19: * add missing goto exit in error case (Stefan)
* add bdrv_flush() to ensure data is stable on disk (Stefan)
   
   Patch 11/19: * fixed typos in comments (Stefan)
* QEMU coding style changes (Stefan)
* rename bat_entry to bat_entry_le for clarity (Stefan)
* Add PAYLOAD_BLOCK_ZERO explicit zero padding for
  protocols that do not support zero init (Stefan)
* rename PAYLOAD_BLOCK_FULL_PRESENT to 
  PAYLOAD_BLOCK_FULLY_PRESENT (Stefan)
   
   Patch 13/19: * Fixed typo in commit message (Stefan)
   
   Patch 19/19: * New, adds qemu-io test for log replay of data sector
   
   v6 Patch 17/20: * Dropped (already upstream)
   v6 Patch 18/20: * Dropped (already upstream)
   
   
   
   === v6 changes ===
   https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v6-upstream
   
   Rebased to latest qemu/master:
   
   Patch 16/20: .bdrv_create() propagates Error, and bdrv_unref() used
instead of bdrv_delete().
   
   Patch 17  18 are already included in another series:
   [PATCH v3 0/3] qemu-iotests with sample images, vhdx test, cleanup
   
   They are included here to provide a base for patches 19  20.  If the 
   above
   series is applied before this series, then patches 17 and 18 can be 
   ignored.
   
   Patch 19/20: In qemu-io tests _make_test_img(), filter out vhdx-specific
options for .bdrv_create().
   
   Patch 20/20: Add VHDX write test case to 064.
   
   
   === v5 changes ===
   
   v5 is also available for testing from:
   https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v5-upstream
   
   Most of the patches from v4 - v5 are the same, but there are a few 
   differences
   and a few new patches.  Here is a summary of which patches are different 
   and/or
   new:
   
   Patch highlights:
   
   Patch 7  just some minor code movement, in prep for changes in patch 8
   
   Patch 8  incorporates review feedback from Stefan, for the previous Patch 
   7
in v4.
   
   Patch 9  adds region checking for log, region table, and metadata tables, 
   per
suggestion from Stefan.
   
   Patch 10 minor change from changes made in 8/16 (vhdx_guid_is_zero() is 
   gone)
   
   Patch 12 is just some minor housekeeping, to get rid of bit shifting that
doesn't need to happen.
   
   
   
   === v4 changes ===  
   
   v4 patches are available from github as well, on branch 
   vhdx-write-v4-upstream:
   https://github.com/codyprime/qemu-kvm-jtc/tree/vhdx-write-v4-upstream
   https://github.com/codyprime/qemu-kvm-jtc.git
   
   Those in the midst of reviewing v3, don't fear - the only changes with v4 
   is
   the addition of patches on the end of the series (patches 10-13).  These
   patches enable creating VHDX images.  Image files created have been
   (briefly  lightly) tested on Hyper-V running on Windows Server 2012.
   
   Some of the new patches could be squashed with earlier patches in the 
   series,
   but I refrained from doing so, since some of the patches have already been
   reviewed, and others are in the midst of review.  I want to make it as 
   easy
   as possible on those currently reviewing. There is nothing critical
   that needs to be pushed into the earlier patches.
   
   New patches:
   
   Patch 10:  Breaks out some more endian translation 

Re: [Qemu-devel] [libvirt] Libvirt support of mac-programming over macvtap planned?

2013-11-04 Thread Viktor Mihajlovski
On 11/04/2013 08:34 AM, Amos Kong wrote:
 (I'm curious where your original query was sent - I didn't see it in my
 libvir-list nor my qemu-devel folders...)
currently, all the RedHat mailing lists seem to gobble up emails
from the linux.vnet.ibm.com domain, I have tried to report that
via the mailman interface with no luck so far

-- 

Mit freundlichen Grüßen/Kind Regards
   Viktor Mihajlovski

IBM Deutschland Research  Development GmbH
Vorsitzender des Aufsichtsrats: Martina Köderitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294




Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Thomas Huth
On Mon, 4 Nov 2013 12:28:12 +0100
Alexander Graf ag...@suse.de wrote:

 
 On 04.11.2013, at 11:55, Benjamin Herrenschmidt b...@kernel.crashing.org 
 wrote:
 
  On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
  On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:
  
  SLOF gets really confused if RTAS/device-tree and everything else
  what SLOF can use is not in the very first block of the very first
  memory node.
  
  This makes sure that the RMA area is where SLOF expects it to be.
  
  Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
  Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
  Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
  ---
  hw/ppc/spapr.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)
  
  diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
  index 09dc635..09a5d94 100644
  --- a/hw/ppc/spapr.c
  +++ b/hw/ppc/spapr.c
  @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
  *args)
 int i;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *ram = g_new(MemoryRegion, 1);
  -hwaddr rma_alloc_size;
  +hwaddr rma_alloc_size, node0_size;
 uint32_t initrd_base = 0;
 long kernel_size = 0, initrd_size = 0;
 long load_limit, rtas_limit, fw_size;
  @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
  *args)
 spapr-rma_size = MIN(spapr-rma_size, 0x1000);
 }
 }
  +/*
  + * SLOF gets confused if RMA resides not in the first block
  + * of the first memory node so let's fix it.
  + */
  +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
  +spapr-rma_size = MIN(spapr-rma_size, node0_size);
  So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
  broken, especially on 970.
  
  Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
  them, no?
  
  Because the only way SLOF knows about the RMA is by using the first
  reg entry of the first memory node and that's *all* SLOF knows about.
  
  If we start putting things like the DT, SLOF itself, etc... outside of
  that region, it will crash.

Ok, the question is whether this is a bug in SLOF and should be fixed
there or whether the RMA should really be limited to the RAM of the
first node only.

Looking at the function spapr_populate_memory(), it seems there is
already similar code there, so I assume the RMA should really be
limited to that size:

/* memory node(s) */
node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
if (spapr-rma_size  node0_size) {
spapr-rma_size = node0_size;
}

Maybe this piece of code could just be done earlier instead, before
setting up the fdt_addr and rtas_addr variables, instead of adding the
similar piece of code of this patch?

  So we constrain things to the rma that way.
  
  Creating 4M nodes makes no sense anyway
 
 So why don't we just use the limit VRMA to 256MB code always and error out 
 of node0 is smaller? I don't think SLOF can run with less than 256MB anyway.

It's 128 MB nowadays ... there is even a define called MIN_RMA_SLOF for
this in the code already.

 Thomas




Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Michael S. Tsirkin
On Mon, Nov 04, 2013 at 12:22:35PM +0100, Paolo Bonzini wrote:
 Il 04/11/2013 12:14, Michael S. Tsirkin ha scritto:
   
   This patch looks good; however, on top of it can you test 
   kvm-unit-tests with TARGET_PHYS_ADDR_SPACE_BITS=64 and see whether 
   there is a measurable slowdown (in the inl_from_qemu tests)?  If not, 
   we can just get rid of TARGET_PHYS_ADDR_SPACE_BITS in exec.c.
  I'd rather we fixed a bug first - we need to fix it on stable too - any
  cleanups can come on top.
 
 This is not necessarily a cleanup.  Getting rid of
 TARGET_PHYS_ADDR_SPACE_BITS in exec.c means fixing device-to-device DMA
 bugs for example.
 
 Of course a smaller patch can be done that avoids the renaming of L2_*
 constants.
 
  Also, I'm not sure what will this test tell
  us: inl reads io space, not memory, right?
 
 The number of levels in the dispatch radix tree is independent of the
 size of the AddressSpace; it is P_L2_LEVELS for both the 64K io space
 and the 2^TARGET_PHYS_ADDRESS_SPACE_BITS memory space.
 
 Paolo

Hmm I think it's *at most* that deep but can be more shallow, no?

-- 
MST



Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 13:04, Michael S. Tsirkin ha scritto:
Also, I'm not sure what will this test tell
us: inl reads io space, not memory, right?
  
  The number of levels in the dispatch radix tree is independent of the
  size of the AddressSpace; it is P_L2_LEVELS for both the 64K io space
  and the 2^TARGET_PHYS_ADDRESS_SPACE_BITS memory space.
 
 Hmm I think it's *at most* that deep but can be more shallow, no?

Yes, but it gets more shallow only if you have very large regions (which
is obviously not the case for io space).  The levels always cover the
same bit range (e.g. bits 42-51 for the first level on x86).

Paolo



[Qemu-devel] [PATCH 1/2] pc: add etc/e820 fw_cfg file

2013-11-04 Thread Gerd Hoffmann
Unlike the existing FW_CFG_E820_TABLE entry which carries reservations
only the new etc/e820 file also has entries for RAM.

Format is simliar to the FW_CFG_E820_TABLE, it is a simple list of
e820_entry structs.  Unlike FW_CFG_E820_TABLE it has no count though
as the number of entries can be figured from the file size.

Cc: Andrea Arcangeli aarca...@redhat.com
Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/i386/pc.c | 39 ---
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index dee409d..a653ae4 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -90,7 +90,9 @@ struct e820_table {
 struct e820_entry entry[E820_NR_ENTRIES];
 } QEMU_PACKED __attribute((__aligned__(4)));
 
-static struct e820_table e820_table;
+static struct e820_table e820_reserve;
+static struct e820_entry *e820_table;
+static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
 void gsi_handler(void *opaque, int n, int level)
@@ -577,19 +579,32 @@ static void handle_a20_line_change(void *opaque, int irq, 
int level)
 
 int e820_add_entry(uint64_t address, uint64_t length, uint32_t type)
 {
-int index = le32_to_cpu(e820_table.count);
+int index = le32_to_cpu(e820_reserve.count);
 struct e820_entry *entry;
 
-if (index = E820_NR_ENTRIES)
-return -EBUSY;
-entry = e820_table.entry[index++];
+if (type != E820_RAM) {
+/* old FW_CFG_E820_TABLE entry -- reservations only */
+if (index = E820_NR_ENTRIES) {
+return -EBUSY;
+}
+entry = e820_reserve.entry[index++];
+
+entry-address = cpu_to_le64(address);
+entry-length = cpu_to_le64(length);
+entry-type = cpu_to_le32(type);
+
+e820_reserve.count = cpu_to_le32(index);
+}
 
-entry-address = cpu_to_le64(address);
-entry-length = cpu_to_le64(length);
-entry-type = cpu_to_le32(type);
+/* new etc/e820 file -- include ram too */
+e820_table = g_realloc(e820_table,
+   sizeof(struct e820_entry) * (e820_entries+1));
+e820_table[e820_entries].address = cpu_to_le64(address);
+e820_table[e820_entries].length = cpu_to_le64(length);
+e820_table[e820_entries].type = cpu_to_le32(type);
+e820_entries++;
 
-e820_table.count = cpu_to_le32(index);
-return index;
+return e820_entries;
 }
 
 /* Calculates the limit to CPU APIC ID values
@@ -640,7 +655,9 @@ static FWCfgState *bochs_bios_init(void)
 fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES,
  smbios_table, smbios_len);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE,
- e820_table, sizeof(e820_table));
+ e820_reserve, sizeof(e820_reserve));
+fw_cfg_add_file(fw_cfg, etc/e820, e820_table,
+sizeof(struct e820_entry) * e820_entries);
 
 fw_cfg_add_bytes(fw_cfg, FW_CFG_HPET, hpet_cfg, sizeof(hpet_cfg));
 /* allocate memory for the NUMA channel: one (64bit) word for the number
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] pc: disable acpi info for isapc and old pc machine

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 11:46, Michael S. Tsirkin ha scritto:
 Disable acpi build for isapc and no_kvmclock machine
 types (used by xen), since acpi build currently expects pci.

Xen is now using -M pc, so it's okay to remove it from the legacy -M
xenfv.

 Reported-by: Andreas Färber afaer...@suse.de
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/i386/pc_piix.c | 2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
 index 24a98cb..4fdb7b6 100644
 --- a/hw/i386/pc_piix.c
 +++ b/hw/i386/pc_piix.c
 @@ -309,6 +309,7 @@ static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
  static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args)
  {
  has_pci_info = false;
 +has_acpi_build = false;
  disable_kvm_pv_eoi();
  enable_compat_apic_id_mode();
  pc_init1(args, 1, 0);
 @@ -317,6 +318,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs 
 *args)
  static void pc_init_isa(QEMUMachineInitArgs *args)
  {
  has_pci_info = false;
 +has_acpi_build = false;
  if (!args-cpu_model) {
  args-cpu_model = 486;
  }
 

Reviewed-by: Paolo Bonzini pbonz...@redhat.com



Re: [Qemu-devel] [PATCH] exec: limit system memory size

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 07:06, Michael S. Tsirkin ha scritto:
 The page table logic in exec.c assumes
 that memory addresses are at most TARGET_PHYS_ADDR_SPACE_BITS.
 
 But pci addresses are full 64 bit so if we try to render them ignoring
 the extra bits, we get strange effects with sections overlapping each
 other.
 
 To fix, simply limit the system memory size to
  1  TARGET_PHYS_ADDR_SPACE_BITS,
 pci addresses will be rendered within that.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  exec.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/exec.c b/exec.c
 index 030118e..c7a8df5 100644
 --- a/exec.c
 +++ b/exec.c
 @@ -1801,7 +1801,12 @@ void address_space_destroy_dispatch(AddressSpace *as)
  static void memory_map_init(void)
  {
  system_memory = g_malloc(sizeof(*system_memory));
 -memory_region_init(system_memory, NULL, system, INT64_MAX);
 +
 +assert(TARGET_PHYS_ADDR_SPACE_BITS = 64);
 +
 +memory_region_init(system_memory, NULL, system,
 +   TARGET_PHYS_ADDR_SPACE_BITS == 64 ?
 +   UINT64_MAX : (0x1ULL  TARGET_PHYS_ADDR_SPACE_BITS));
  address_space_init(address_space_memory, system_memory, memory);
  
  system_io = g_malloc(sizeof(*system_io));
 

You can include either this patch or Marcel's with my Reviewed-by: Paolo
Bonzini pbonz...@redhat.com.  I don't have any preference.


Paolo



[Qemu-devel] [PULL for-1.7 0/2] pc: e820 fw_cfg fixup.

2013-11-04 Thread Gerd Hoffmann
  Hi,

After some discussion with Andrea Arcangeli the initial
approach of extending the FW_CFG_E820_TABLE was scratched.
Unfortunaly that patch wasn't formally NACK'ed and became
commit 0624c7f916b4d97f17726d9b295d6a6b0dc5076d, so we
need to fixup the mess now, before 1.7.

The new approach is to leave FW_CFG_E820_TABLE alone and
add a new fw_cfg file instead, to avoid any compatibility
issues for sure (patch thread here:
http://comments.gmane.org/gmane.comp.emulators.qemu/238860).

This pull is the new-approach patch series rebased to latest
master (i.e. solve the conflict with
0624c7f916b4d97f17726d9b295d6a6b0dc5076d).

I suggest to pull this to fix things up.  The alternative is
to revert 0624c7f916b4d97f17726d9b295d6a6b0dc5076d and delay
the new interface to the 1.8 devel cycle.

cheers,
  Gerd


The following changes since commit a126050a103c924b03388a9a64ce9af8c96b0969:

  Merge remote-tracking branch 'kwolf/tags/for-anthony' into staging 
(2013-10-31 17:02:26 +0100)

are available in the git repository at:


  git://git.kraxel.org/qemu e820.1

for you to fetch changes up to 7db16f2480db5e246d34d0c453cff4f58549df0e:

  pc: register e820 entries for ram (2013-11-04 12:31:33 +0100)


Gerd Hoffmann (2):
  pc: add etc/e820 fw_cfg file
  pc: register e820 entries for ram

 hw/i386/pc.c | 47 +--
 1 file changed, 29 insertions(+), 18 deletions(-)



Re: [Qemu-devel] [PATCH] kvm: Add a new machine property kvm_type

2013-11-04 Thread Alexander Graf

On 07.10.2013, at 18:53, Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com 
wrote:

 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Targets like ppc64 support different typed of KVM, one which use
 hypervisor mode and the other which doesn't. Add a new machine
 property kvm_type that helps in selecting the respective ones
 We also add a new QEMUMachine callback get_vm_type that helps
 in mapping the string representation of kvm type specified.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 ---
 hw/ppc/e500plat.c  |  2 ++
 hw/ppc/mac_newworld.c  |  2 ++
 hw/ppc/mac_oldworld.c  |  2 ++
 hw/ppc/ppc440_bamboo.c |  2 ++
 hw/ppc/spapr.c | 19 +++
 hw/ppc/vmtype.h| 18 ++
 include/hw/boards.h|  3 +++
 include/hw/xen/xen.h   |  3 ++-
 include/sysemu/kvm.h   |  4 ++--
 include/sysemu/qtest.h |  5 +++--
 kvm-all.c  | 16 +---
 kvm-stub.c |  4 +++-
 qtest.c|  2 +-
 vl.c   | 17 +++--
 xen-all.c  |  2 +-
 xen-stub.c |  2 +-
 16 files changed, 85 insertions(+), 18 deletions(-)
 create mode 100644 hw/ppc/vmtype.h
 
 diff --git a/hw/ppc/e500plat.c b/hw/ppc/e500plat.c
 index 2e964b2..3e53e85 100644
 --- a/hw/ppc/e500plat.c
 +++ b/hw/ppc/e500plat.c
 @@ -17,6 +17,7 @@
 #include hw/pci/pci.h
 #include hw/ppc/openpic.h
 #include kvm_ppc.h
 +#include vmtype.h
 
 static void e500plat_fixup_devtree(PPCE500Params *params, void *fdt)
 {
 @@ -51,6 +52,7 @@ static QEMUMachine e500plat_machine = {
 .desc = generic paravirt e500 platform,
 .init = e500plat_init,
 .max_cpus = 32,
 +.get_vm_type = pr_get_vm_type,

It should be called kvm_type, like the machine opt.

Apart from that I like the patch :). But it needs to be ack'ed by Gleb / Paolo.


Alex




Re: [Qemu-devel] [PATCH] uas: Fix response iu struct definition

2013-11-04 Thread Gerd Hoffmann
On Do, 2013-10-31 at 10:35 +0100, Hans de Goede wrote:
 This patch mirrors a patch to the Linux uas kernel driver which I've just
 submitted. It looks like the qemu uas struct definitions were taken from
 the Linux kernel driver, and have inherited the same mistake.
 
 Besides fixing the response iu struct, the patch also drops the add_info
 parameter from the usb_uas_queue_response() function, it is always 0 anyways,
 and expressing 3 zero-bytes as a function argument is a bit hard.

Added to usb patch queue, together with the other uas patch series.

thanks,
  Gerd






[Qemu-devel] [PATCH 2/2] pc: register e820 entries for ram

2013-11-04 Thread Gerd Hoffmann
So RAM shows up in the new etc/e820 fw_cfg file.

Cc: Andrea Arcangeli aarca...@redhat.com
Signed-off-by: Gerd Hoffmann kra...@redhat.com
---
 hw/i386/pc.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index a653ae4..12c436e 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1174,13 +1174,7 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
 memory_region_init_alias(ram_below_4g, NULL, ram-below-4g, ram,
  0, below_4g_mem_size);
 memory_region_add_subregion(system_memory, 0, ram_below_4g);
-if (0) {
-/*
- * Ideally we should do that too, but that would ruin the e820
- * reservations added by seabios before initializing fw_cfg.
- */
-e820_add_entry(0, below_4g_mem_size, E820_RAM);
-}
+e820_add_entry(0, below_4g_mem_size, E820_RAM);
 if (above_4g_mem_size  0) {
 ram_above_4g = g_malloc(sizeof(*ram_above_4g));
 memory_region_init_alias(ram_above_4g, NULL, ram-above-4g, ram,
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH 0/2 v2] pc: inform SeaBIOS where 64-bit PCI hole begins

2013-11-04 Thread Gerd Hoffmann
  Hi,

  So maybe design that with memory hotplug in mind?  Such as adding a new
  qemu-specific type QEMU_RAM_HOTPLUG?  Which seabios could use to reserve
  the memory (but not add it to the e820 table for the guest)?
 It will do job too. But extending semantics of standard table would be
 confusing. Yes, Seabios will filter it out but it doesn't make table
 less confusing.

Was just thinking that it might be easier that way if we need e820
entries for hotplug memory address space _anyway_.

 I'd prefer having a dedicated interface for it as a more clean solution.

Agree.

cheers,
  Gerd





Re: [Qemu-devel] [PATCH] qapi: Fix comment for create-type to match code.

2013-11-04 Thread Stefan Hajnoczi
On Fri, Nov 01, 2013 at 05:35:29PM +0800, Fam Zheng wrote:
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  qapi-schema.json | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH] vnc: Fix qemu crash on vnc client disconnection

2013-11-04 Thread Gerd Hoffmann
On Mo, 2013-10-28 at 08:47 +, Gonglei (Arei) wrote:
  -Original Message-
  From: Gerd Hoffmann [mailto:kra...@redhat.com]
  Sent: Monday, October 28, 2013 3:53 PM
  To: Gonglei (Arei)
  Cc: qemu-devel@nongnu.org; Stefan Hajnoczi; Yanqiangjun; Luonengjun;
  Huangweidong (Hardware)
  Subject: Re: [Qemu-devel] [PATCH] vnc: Fix qemu crash on vnc client
  disconnection
  
Hi,
  
   diff --git a/ui/vnc.c b/ui/vnc.c
   index 5601cc3..2177704 100644
   --- a/ui/vnc.c
   +++ b/ui/vnc.c
   @@ -876,7 +876,8 @@ static int find_and_clear_dirty_height(struct
  VncState *vs,
static int vnc_update_client_sync(VncState *vs, int has_dirty)
{
int ret = vnc_update_client(vs, has_dirty);
   -vnc_jobs_join(vs);
   +if (ret = 0)
   +vnc_jobs_join(vs);
  
  What happens with any running jobs if you skip the jouin call here?
 
 Hi, Gerd. The other jobs are unaffected, and other clients still work.

My concern is more that we might keep threads running which should not
run any more.

cheers,
  Gerd






[Qemu-devel] [Bug 1247122] Re: kernel guest 3.10.6 boot failure at sha_transform

2013-11-04 Thread Christoph Eck
Tested with latest qemu git repos. Got the same kernel panic.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1247122

Title:
  kernel guest 3.10.6 boot failure at sha_transform

Status in QEMU:
  New

Bug description:
  Error occured with
   - QEMU emulator version 1.6.1 and
   - QEMU emulator version 1.2.0 (qemu-kvm-1.2.0+noroms-0ubuntu2.12.10.5, 
Debian)

  started qemu-system-i386 with default settings and no additional
  arguments.

  Tested a kernel version 3.5.4 without any problems.

  Hint: new kernel CONFIG parameter CONFIG_CRYPTO_SHA256 is needed in
  kernel 3.10.6.

  Kernel panic with kernel 3.10.6

  [0.00] Linux version 3.10.6-generic-1.0 (root@STAG-Linux6) (gcc 
version 4.5.0 (GCC) ) #1 SMP PREEMPT Fri Nov 1 07:51:10 UTC 2013
  [0.00] e820: BIOS-provided physical RAM map:
  [0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
  [0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
  [0.00] BIOS-e820: [mem 0x000f-0x000f] reserved
  [0.00] BIOS-e820: [mem 0x0010-0x07ffdfff] usable
  [0.00] BIOS-e820: [mem 0x07ffe000-0x07ff] reserved
  [0.00] BIOS-e820: [mem 0xfffc-0x] reserved
  [0.00] Notice: NX (Execute Disable) protection missing in CPU!
  [0.00] SMBIOS 2.4 present.
  [0.00] e820: last_pfn = 0x7ffe max_arch_pfn = 0x10
  [0.00] found SMP MP-table at [mem 0x000f1850-0x000f185f] mapped at 
[c00f1850]
  [0.00] Scanning 1 areas for low memory corruption
  [0.00] init_memory_mapping: [mem 0x-0x000f]
  [0.00] init_memory_mapping: [mem 0x0780-0x07bf]
  [0.00] init_memory_mapping: [mem 0x0010-0x077f]
  [0.00] init_memory_mapping: [mem 0x07c0-0x07ffdfff]
  [0.00] ACPI: RSDP 000f16f0 00014 (v00 BOCHS )
  [0.00] ACPI: RSDT 07ffe450 00034 (v01 BOCHS  BXPCRSDT 0001 BXPC 
0001)
  [0.00] ACPI: FACP 0780 00074 (v01 BOCHS  BXPCFACP 0001 BXPC 
0001)
  [0.00] ACPI: DSDT 07ffe490 01137 (v01   BXPC   BXDSDT 0001 INTL 
20100528)
  [0.00] ACPI: FACS 0740 00040
  [0.00] ACPI: SSDT 07fff700 00838 (v01 BOCHS  BXPCSSDT 0001 BXPC 
0001)
  [0.00] ACPI: APIC 07fff610 00078 (v01 BOCHS  BXPCAPIC 0001 BXPC 
0001)
  [0.00] ACPI: HPET 07fff5d0 00038 (v01 BOCHS  BXPCHPET 0001 BXPC 
0001)
  [0.00] 0MB HIGHMEM available.
  [0.00] 127MB LOWMEM available.
  [0.00]   mapped low ram: 0 - 07ffe000
  [0.00]   low ram: 0 - 07ffe000
  [0.00] Zone ranges:
  [0.00]   DMA  [mem 0x1000-0x00ff]
  [0.00]   Normal   [mem 0x0100-0x07ffdfff]
  [0.00]   HighMem  empty
  [0.00] Movable zone start for each node
  [0.00] Early memory node ranges
  [0.00]   node   0: [mem 0x1000-0x0009efff]
  [0.00]   node   0: [mem 0x0010-0x07ffdfff]
  [0.00] Using APIC driver default
  [0.00] ACPI: PM-Timer IO Port: 0xb008
  [0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
  [0.00] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
  [0.00] ACPI: IOAPIC (id[0x00] address[0xfec0] gsi_base[0])
  [0.00] IOAPIC[0]: apic_id 0 already used, trying 1
  [0.00] IOAPIC[0]: apic_id 1, version 17, address 0xfec0, GSI 0-23
  [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
  [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
  [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
  [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
  [0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
  [0.00] Using ACPI (MADT) for SMP configuration information
  [0.00] ACPI: HPET id: 0x8086a201 base: 0xfed0
  [0.00] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
  [0.00] e820: [mem 0x0800-0xfffb] available for PCI devices
  [0.00] Booting paravirtualized kernel on bare hardware
  [0.00] setup_percpu: NR_CPUS:4 nr_cpumask_bits:4 nr_cpu_ids:1 
nr_node_ids:1
  [0.00] PERCPU: Embedded 12 pages/cpu @c7eef000 s27200 r0 d21952 u49152
  [0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total 
pages: 32412
  [0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinux-3.10.6 
root=/dev/sda1 ro acpi_enforce_resources=no console=ttyS0
  [0.00] PID hash table entries: 512 (order: -1, 2048 bytes)
  [0.00] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
  [0.00] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
  [0.00] Initializing CPU#0
  [0.00] 

Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexey Kardashevskiy
On 11/04/2013 10:50 PM, Thomas Huth wrote:
 On Mon, 4 Nov 2013 12:28:12 +0100
 Alexander Graf ag...@suse.de wrote:
 

 On 04.11.2013, at 11:55, Benjamin Herrenschmidt b...@kernel.crashing.org 
 wrote:

 On Mon, 2013-11-04 at 11:44 +0100, Alexander Graf wrote:
 On 01.11.2013, at 11:21, Alexey Kardashevskiy a...@ozlabs.ru wrote:

 SLOF gets really confused if RTAS/device-tree and everything else
 what SLOF can use is not in the very first block of the very first
 memory node.

 This makes sure that the RMA area is where SLOF expects it to be.

 Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
 Cc: Nikunj A Dadhania nik...@linux.vnet.ibm.com
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 hw/ppc/spapr.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

 diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
 index 09dc635..09a5d94 100644
 --- a/hw/ppc/spapr.c
 +++ b/hw/ppc/spapr.c
 @@ -1113,7 +1113,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
 *args)
int i;
MemoryRegion *sysmem = get_system_memory();
MemoryRegion *ram = g_new(MemoryRegion, 1);
 -hwaddr rma_alloc_size;
 +hwaddr rma_alloc_size, node0_size;
uint32_t initrd_base = 0;
long kernel_size = 0, initrd_size = 0;
long load_limit, rtas_limit, fw_size;
 @@ -1154,6 +1154,12 @@ static void ppc_spapr_init(QEMUMachineInitArgs 
 *args)
spapr-rma_size = MIN(spapr-rma_size, 0x1000);
}
}
 +/*
 + * SLOF gets confused if RMA resides not in the first block
 + * of the first memory node so let's fix it.
 + */
 +node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 +spapr-rma_size = MIN(spapr-rma_size, node0_size);
 So if I create a NUMA node of 4MB that will be my RMA? That sounds pretty 
 broken, especially on 970.

 Why does SLOF have any issues with NUMA memory nodes? It can just ignore 
 them, no?

 Because the only way SLOF knows about the RMA is by using the first
 reg entry of the first memory node and that's *all* SLOF knows about.

 If we start putting things like the DT, SLOF itself, etc... outside of
 that region, it will crash.
 
 Ok, the question is whether this is a bug in SLOF and should be fixed
 there or whether the RMA should really be limited to the RAM of the
 first node only.


PAPR says in Hypervisor Call Functions:

Logical addresses start at zero. When control is initially passed to the
OS from the platform, the first region is the
single RMA. The first region has logical region identifier of zero. This
first region is specified by the first address -
length pair of the “reg” property of the /memory node of the OF device tree.


Question about english - is the single RMA equal to the only RMA?


 Looking at the function spapr_populate_memory(), it seems there is
 already similar code there, so I assume the RMA should really be
 limited to that size:
 
 /* memory node(s) */
 node0_size = (nb_numa_nodes  1) ? node_mem[0] : ram_size;
 if (spapr-rma_size  node0_size) {
 spapr-rma_size = node0_size;
 }
 
 Maybe this piece of code could just be done earlier instead, before
 setting up the fdt_addr and rtas_addr variables, instead of adding the
 similar piece of code of this patch?
 
 So we constrain things to the rma that way.

 Creating 4M nodes makes no sense anyway

 So why don't we just use the limit VRMA to 256MB code always and error out 
 of node0 is smaller? I don't think SLOF can run with less than 256MB anyway.
 
 It's 128 MB nowadays ... there is even a define called MIN_RMA_SLOF for
 this in the code already.
 
  Thomas
 


-- 
Alexey



Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Peter Maydell
On 4 November 2013 13:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 PAPR says in Hypervisor Call Functions:

 Logical addresses start at zero. When control is initially passed to the
 OS from the platform, the first region is the
 single RMA. The first region has logical region identifier of zero. This
 first region is specified by the first address -
 length pair of the “reg” property of the /memory node of the OF device tree.


 Question about english - is the single RMA equal to the only RMA?

No. the single RMA is weird English and to me implies
that it's a term that's been defined earlier or at least
that there is some surrounding context that would make it
make more sense (eg some contrasting definition of
single RMA vs double RMA).

-- PMM



Re: [Qemu-devel] [PATCH] kvm: Add a new machine property kvm_type

2013-11-04 Thread Jan Kiszka
On 2013-10-07 18:53, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Targets like ppc64 support different typed of KVM, one which use
 hypervisor mode and the other which doesn't. Add a new machine
 property kvm_type that helps in selecting the respective ones
 We also add a new QEMUMachine callback get_vm_type that helps
 in mapping the string representation of kvm type specified.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

...

 diff --git a/vl.c b/vl.c
 index 4e709d5..7ecc581 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
  .name = usb,
  .type = QEMU_OPT_BOOL,
  .help = Set on/off to enable/disable usb,
 +},{
 +.name = kvm_type,
 +.type = QEMU_OPT_STRING,
 +.help = Set to kvm type to be used in create vm ioctl,

This does not sound like an appropriate documentation for an enduser.

OTOH, I do not recall right now how these help strings can be obtained
at all. One could intuitively try -machine sometype,?, but that's
not implemented. Anyone?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] rtc: remove dead SQW IRQ code

2013-11-04 Thread Jan Kiszka
On 2013-08-14 13:29, Jan Kiszka wrote:
 This was once introduced by commit 100d9891d6 but was never used in-tree
 and then got broken by commit 32e0c8260d. Time to clean up.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/timer/mc146818rtc.c |9 +
  1 files changed, 1 insertions(+), 8 deletions(-)
 
 diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c
 index 3c3baac..b3f1baa 100644
 --- a/hw/timer/mc146818rtc.c
 +++ b/hw/timer/mc146818rtc.c
 @@ -70,7 +70,6 @@ typedef struct RTCState {
  uint64_t last_update;
  int64_t offset;
  qemu_irq irq;
 -qemu_irq sqw_irq;
  int it_shift;
  /* periodic timer */
  QEMUTimer *periodic_timer;
 @@ -151,8 +150,7 @@ static void periodic_timer_update(RTCState *s, int64_t 
 current_time)
  
  period_code = s-cmos_data[RTC_REG_A]  0x0f;
  if (period_code != 0
 - ((s-cmos_data[RTC_REG_B]  REG_B_PIE)
 -|| ((s-cmos_data[RTC_REG_B]  REG_B_SQWE)  s-sqw_irq))) {
 + (s-cmos_data[RTC_REG_B]  REG_B_PIE)) {
  if (period_code = 2)
  period_code += 7;
  /* period in 32 Khz cycles */
 @@ -202,11 +200,6 @@ static void rtc_periodic_timer(void *opaque)
  #endif
  qemu_irq_raise(s-irq);
  }
 -if (s-cmos_data[RTC_REG_B]  REG_B_SQWE) {
 -/* Not square wave at all but we don't want 2048Hz interrupts!
 -   Must be seen as a pulse.  */
 -qemu_irq_raise(s-sqw_irq);
 -}
  }
  
  /* handle update-ended timer */
 

ping

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] kvm: Add a new machine property kvm_type

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 14:28, Jan Kiszka jan.kis...@siemens.com wrote:

 On 2013-10-07 18:53, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Targets like ppc64 support different typed of KVM, one which use
 hypervisor mode and the other which doesn't. Add a new machine
 property kvm_type that helps in selecting the respective ones
 We also add a new QEMUMachine callback get_vm_type that helps
 in mapping the string representation of kvm type specified.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 ...
 
 diff --git a/vl.c b/vl.c
 index 4e709d5..7ecc581 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
 .name = usb,
 .type = QEMU_OPT_BOOL,
 .help = Set on/off to enable/disable usb,
 +},{
 +.name = kvm_type,
 +.type = QEMU_OPT_STRING,
 +.help = Set to kvm type to be used in create vm ioctl,
 
 This does not sound like an appropriate documentation for an enduser.
 
 OTOH, I do not recall right now how these help strings can be obtained
 at all. One could intuitively try -machine sometype,?, but that's
 not implemented. Anyone?

They should definitely show up in the man page. But yes, -machine kvm_type=? 
would be helpful as well.

As for -machine ? we are have a problem orthogonal to this patch. I agree that 
we need some way to programmatically list all machine options.


Alex




[Qemu-devel] [PATCH 1.7] vl: allow cont from panicked state

2013-11-04 Thread Paolo Bonzini
After reporting the GUEST_PANICKED monitor event, QEMU stops the VM.
The reason for this is that events are edge-triggered, and can be lost if
management dies at the wrong time.  Stopping a panicked VM lets management
know of a panic even if it has crashed; management can learn about the
panic when it restarts and queries running QEMU processes.  The downside
is of course that the VM will be paused while management is not running,
but that is acceptable if it only happens with explicit -device pvpanic.

Upon learning of a panic, management (if configured to do so) can pick a
variety of behaviors: leave the VM paused, reset it, destroy it.  In
addition to all of these behaviors, it is possible to dump the VM core
from the host.

However, right now, the panicked state is irreversible, and can only be
exited by resetting the machine.  This means that any policy decision
is entirely in the hands of the host.  In particular there is no way to
use the reboot on panic option together with pvpanic.

This patch makes the panicked state reversible (and removes various
workarounds that were there because of the state being irreversible).
With this change, management has a wider set of possible policies: it
can just log the crash and leave policy to the guest, it can leave the
VM paused.  In particular, the log the crash and continue is implemented
simply by sending a cont as soon as management learns about the panic.
Management could also implement the irreversible paused state itself.
And again, all such actions can be coupled with dumping the VM core.

Unfortunately we cannot change the behavior of 1.6.0.  Thus, even if
it uses -device pvpanic, management should check for cont failures.
If cont fails, management can then log that the VM remained paused
and urge the administrator to update QEMU.

Reviewed-by: Laszlo Ersek ler...@redhat.com
Reviewed-by: Luiz Capitulino lcapitul...@redhat.com
Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 gdbstub.c | 3 ---
 vl.c  | 6 ++
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 0e5a3f5..e8ab0b2 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -368,9 +368,6 @@ static inline void gdb_continue(GDBState *s)
 #ifdef CONFIG_USER_ONLY
 s-running_state = 1;
 #else
-if (runstate_check(RUN_STATE_GUEST_PANICKED)) {
-runstate_set(RUN_STATE_DEBUG);
-}
 if (!runstate_needs_reset()) {
 vm_start();
 }
diff --git a/vl.c b/vl.c
index b42ac67..16834bd 100644
--- a/vl.c
+++ b/vl.c
@@ -638,9 +638,8 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 
-{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
+{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-{ RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
 
 { RUN_STATE_MAX, RUN_STATE_MAX },
 };
@@ -686,8 +685,7 @@ int runstate_is_running(void)
 bool runstate_needs_reset(void)
 {
 return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-runstate_check(RUN_STATE_SHUTDOWN) ||
-runstate_check(RUN_STATE_GUEST_PANICKED);
+runstate_check(RUN_STATE_SHUTDOWN);
 }
 
 StatusInfo *qmp_query_status(Error **errp)
-- 
1.8.3.1





[Qemu-devel] [PATCH 1.7] pc: get rid of builtin pvpanic for -M pc-1.5

2013-11-04 Thread Paolo Bonzini
This causes two slight backwards-incompatibilities between -M pc-1.5
and 1.5's -M pc:

(1) a fw_cfg file is removed with this patch.  This is only a problem
if migration stops the virtual machine exactly during fw_cfg enumeration.

(2) after migration, a VM created without an explicit -device pvpanic
will stop reporting panics to management.

The first problem only occurs if migration is done at a very, very
early point (and I'm not sure it can happen in practice for reasonable-size
VMs, since it will likely take more time to send the RAM to destination,
than it will take for BIOS to scan fw_cfg).

The second problem only occurs if the guest panics _and_ has a guest
driver _and_ management knows to look at the crash event, so it is
mostly theoretical at this point in time.

Thus keep the code simple, and pretend it was never broken.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/pc_piix.c| 7 ---
 hw/i386/pc_q35.c | 7 ---
 hw/misc/pvpanic.c| 5 -
 include/hw/i386/pc.h | 3 ---
 4 files changed, 22 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index c6042c7..a041e53 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -57,7 +57,6 @@ static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 };
 static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 };
 static const int ide_irq[MAX_IDE_BUS] = { 14, 15 };
 
-static bool has_pvpanic;
 static bool has_pci_info = true;
 
 /* PC hardware initialisation */
@@ -225,10 +224,6 @@ static void pc_init1(QEMUMachineInitArgs *args,
 if (pci_enabled) {
 pc_pci_device_init(pci_bus);
 }
-
-if (has_pvpanic) {
-pvpanic_init(isa_bus);
-}
 }
 
 static void pc_init_pci(QEMUMachineInitArgs *args)
@@ -245,13 +240,11 @@ static void pc_compat_1_6(QEMUMachineInitArgs *args)
 static void pc_compat_1_5(QEMUMachineInitArgs *args)
 {
 pc_compat_1_6(args);
-has_pvpanic = true;
 }
 
 static void pc_compat_1_4(QEMUMachineInitArgs *args)
 {
 pc_compat_1_5(args);
-has_pvpanic = false;
 x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
 x86_cpu_compat_set_features(Westmere, FEAT_1_ECX, 0, 
CPUID_EXT_PCLMULQDQ);
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index ca84e1c..593ed2a 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -47,7 +47,6 @@
 /* ICH9 AHCI has 6 ports */
 #define MAX_SATA_PORTS 6
 
-static bool has_pvpanic;
 static bool has_pci_info = true;
 
 /* PC hardware initialisation */
@@ -214,10 +213,6 @@ static void pc_q35_init(QEMUMachineInitArgs *args)
 if (pci_enabled) {
 pc_pci_device_init(host_bus);
 }
-
-if (has_pvpanic) {
-pvpanic_init(isa_bus);
-}
 }
 
 static void pc_compat_1_6(QEMUMachineInitArgs *args)
@@ -229,13 +224,11 @@ static void pc_compat_1_6(QEMUMachineInitArgs *args)
 static void pc_compat_1_5(QEMUMachineInitArgs *args)
 {
 pc_compat_1_6(args);
-has_pvpanic = true;
 }
 
 static void pc_compat_1_4(QEMUMachineInitArgs *args)
 {
 pc_compat_1_5(args);
-has_pvpanic = false;
 x86_cpu_compat_set_features(n270, FEAT_1_ECX, 0, CPUID_EXT_MOVBE);
 x86_cpu_compat_set_features(Westmere, FEAT_1_ECX, 0, 
CPUID_EXT_PCLMULQDQ);
 }
diff --git a/hw/misc/pvpanic.c b/hw/misc/pvpanic.c
index b64e3bb..4c85906 100644
--- a/hw/misc/pvpanic.c
+++ b/hw/misc/pvpanic.c
@@ -112,11 +112,6 @@ static void pvpanic_isa_realizefn(DeviceState *dev, Error 
**errp)
 isa_register_ioport(d, s-io, s-ioport);
 }
 
-void pvpanic_init(ISABus *bus)
-{
-isa_create_simple(bus, TYPE_ISA_PVPANIC_DEVICE);
-}
-
 static Property pvpanic_isa_properties[] = {
 DEFINE_PROP_UINT16(ioport, PVPanicState, ioport, 0x505),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 6083839..f6313b2 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -213,9 +213,6 @@ static inline bool isa_ne2000_init(ISABus *bus, int base, 
int irq, NICInfo *nd)
 void pc_system_firmware_init(MemoryRegion *rom_memory,
  bool isapc_ram_fw);
 
-/* pvpanic.c */
-void pvpanic_init(ISABus *bus);
-
 /* e820 types */
 #define E820_RAM1
 #define E820_RESERVED   2
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH] kvm: Add a new machine property kvm_type

2013-11-04 Thread Jan Kiszka
On 2013-11-04 14:30, Alexander Graf wrote:
 
 On 04.11.2013, at 14:28, Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2013-10-07 18:53, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 Targets like ppc64 support different typed of KVM, one which use
 hypervisor mode and the other which doesn't. Add a new machine
 property kvm_type that helps in selecting the respective ones
 We also add a new QEMUMachine callback get_vm_type that helps
 in mapping the string representation of kvm type specified.

 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

 ...

 diff --git a/vl.c b/vl.c
 index 4e709d5..7ecc581 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
 .name = usb,
 .type = QEMU_OPT_BOOL,
 .help = Set on/off to enable/disable usb,
 +},{
 +.name = kvm_type,
 +.type = QEMU_OPT_STRING,
 +.help = Set to kvm type to be used in create vm ioctl,

 This does not sound like an appropriate documentation for an enduser.

 OTOH, I do not recall right now how these help strings can be obtained
 at all. One could intuitively try -machine sometype,?, but that's
 not implemented. Anyone?
 
 They should definitely show up in the man page. But yes, -machine kvm_type=? 
 would be helpful as well.

The man page is not generate from this C code, is it?

 
 As for -machine ? we are have a problem orthogonal to this patch. I agree 
 that we need some way to programmatically list all machine options.

I'm not saying that this is an issue of this patch, just that I wonder
how that .help message can be made visisble at at besides with the help
of an editor.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 2/6] qapi: rename MonitorEvent to QEvent

2013-11-04 Thread Luiz Capitulino
On Mon, 04 Nov 2013 09:59:50 +0800
Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:

 于 2013/11/1 22:02, Luiz Capitulino 写道:
  On Mon, 21 Oct 2013 10:16:01 +0800
  Wenchao Xia xiaw...@linux.vnet.ibm.com wrote:
 
  Signed-off-by: Wenchao Xia xiaw...@linux.vnet.ibm.com
  ---
block.c|2 +-
include/block/block_int.h  |2 +-
include/monitor/monitor.h  |6 +++---
monitor.c  |   12 ++--
stubs/mon-protocol-event.c |2 +-
ui/vnc.c   |2 +-
6 files changed, 13 insertions(+), 13 deletions(-)
 
  diff --git a/block.c b/block.c
  index 2c15e5d..458a4f8 100644
  --- a/block.c
  +++ b/block.c
  @@ -1760,7 +1760,7 @@ void bdrv_set_dev_ops(BlockDriverState *bs, const 
  BlockDevOps *ops,
}
 
void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
  -   MonitorEvent ev,
  +   QEvent ev,
   BlockErrorAction action, bool is_read)
{
QObject *data;
  diff --git a/include/block/block_int.h b/include/block/block_int.h
  index bcc72e2..bfdaf84 100644
  --- a/include/block/block_int.h
  +++ b/include/block/block_int.h
  @@ -337,7 +337,7 @@ AioContext *bdrv_get_aio_context(BlockDriverState *bs);
int is_windows_drive(const char *filename);
#endif
void bdrv_emit_qmp_error_event(const BlockDriverState *bdrv,
  -   MonitorEvent ev,
  +   QEvent ev,
   BlockErrorAction action, bool is_read);
 
/**
  diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
  index 10fa0e3..8b14a6f 100644
  --- a/include/monitor/monitor.h
  +++ b/include/monitor/monitor.h
  @@ -20,7 +20,7 @@ extern Monitor *default_mon;
#define MONITOR_CMD_ASYNC   0x0001
 
/* QMP events */
  -typedef enum MonitorEvent {
  +typedef enum QEvent {
 
  Qt has a QEvent class, so QEvent is not a good name for us if we're
  considering making it public in the schema (which could become an
  external library in the distant future).
 
  I suggest calling it QMPEvent.
 
 
Maybe QMPEventType, since QMPEvent should be used an union?

If we add the 'event' type, like:

{ 'event': 'BLOCK_IO_ERROR', 'data': { ... } }

Then we don't need an union.



Re: [Qemu-devel] [PATCH] spapr: make sure RMA is in first mode of first memory node

2013-11-04 Thread Alexey Kardashevskiy
On 11/05/2013 12:19 AM, Peter Maydell wrote:
 On 4 November 2013 13:11, Alexey Kardashevskiy a...@ozlabs.ru wrote:
 PAPR says in Hypervisor Call Functions:

 Logical addresses start at zero. When control is initially passed to the
 OS from the platform, the first region is the
 single RMA. The first region has logical region identifier of zero. This
 first region is specified by the first address -
 length pair of the “reg” property of the /memory node of the OF device tree.


 Question about english - is the single RMA equal to the only RMA?
 
 No. the single RMA is weird English and to me implies
 that it's a term that's been defined earlier or at least
 that there is some surrounding context that would make it
 make more sense (eg some contrasting definition of
 single RMA vs double RMA).


Oh. Some more context then:

14.1.1 Real Mode Accesses
When the OS controlling an LPAR runs with address translation turned off
(MSRDR or MSRIR bit(s) =0) (real mode)
the LPAR hardware translates the memory addresses to an LPAR unique area
known as the Real Mode Area (RMA).
When control is initially passed to the OS from the platform, the RMA
starts at the LPAR's logical address 0 and is the
first logical memory block reported in the LPAR’s device tree. In general,
the RMA is a subset of the LPAR's logical
address space. Attempting a non relocated access beyond the bounds of the
RMA results in an storage interrupt
(ISI/DSI depending upon instruction or data reference). The RMA hardware
translation scheme is platform dependent.
The options are given below.


-- 
Alexey



Re: [Qemu-devel] [PATCH v3 0/2] smbios nicer defaults for DMI type 1 (System)

2013-11-04 Thread Michael S. Tsirkin
On Wed, Oct 30, 2013 at 01:56:38PM +0100, arm...@redhat.com wrote:
 From: Markus Armbruster arm...@redhat.com
 
 Currently, we get SeaBIOS defaults: manufacturer Bochs, product Bochs,
 no version.  Best SeaBIOS can do, but we can provide better defaults:
 manufacturer QEMU, product  version taken from QEMUMachine desc and
 name.
 
 This series used to be called [PATCH v2 0/7] smbios cleanup  nicer
 defaults for type 1, but its cleanup parts [0-5/7] already went in.
 
 Andreas didn't like [PATCH v2 6/7] vl: Set current_machine early,
 and suggested to put he machine into QEMUMachineInitArgs.  PATCH 1/2
 does exactly that, replacing v2's PATCH 6/7.  PATCH 2/2 is v2's PATCH
 7/7 rebased on top.
 
 Michael didn't like the way I suppress the nicer defaults for old
 machine types via static bool smbios_type1_defaults, and thought it
 would be nicer to have it in QEMUMachine or some device.  I considered
 this, but decided to stick to smbios_type1_defaults, because
 
 1. There is no suitable device.
 2. I'd rather not extend QEMUMachine with target-specific stuff.
 3. A static bool whose default value gets flipped by some QEMUMachine
init() methods is what we commonly do in such cases, so let's stick
to that.
 
 v3: Do it the way Andreas suggested
 v2: Rebase, only last patch had conflicts

I picked this up merging 2/2 and 3/2.
Unfortunately it';s too late for 1.7 but I'll try to
merge it as soon as 1.8 opens.


 Markus Armbruster (2):
   hw: Pass QEMUMachine to its init() method
   smbios: Set system manufacturer, product  version by default
 
  hw/i386/pc_piix.c|  9 +
  hw/i386/pc_q35.c |  7 +++
  hw/i386/smbios.c | 14 ++
  include/hw/boards.h  |  7 +--
  include/hw/i386/smbios.h |  2 ++
  vl.c |  3 ++-
  6 files changed, 39 insertions(+), 3 deletions(-)
 
 -- 
 1.8.1.4



[Qemu-devel] [PATCH uq/master] pci-assign: Remove dead code for direct I/O region access from userspace

2013-11-04 Thread Jan Kiszka
This feature was already deprecated back then in qemu-kvm, ie. before
pci-assign went upstream. assigned_dev_ioport_rw will never be invoked
with resource_fd  0.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i386/kvm/pci-assign.c | 56 +---
 1 file changed, 10 insertions(+), 46 deletions(-)

diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c
index 011764f..4e65110 100644
--- a/hw/i386/kvm/pci-assign.c
+++ b/hw/i386/kvm/pci-assign.c
@@ -154,55 +154,19 @@ static uint64_t assigned_dev_ioport_rw(AssignedDevRegion 
*dev_region,
 uint64_t val = 0;
 int fd = dev_region-region-resource_fd;
 
-if (fd = 0) {
-if (data) {
-DEBUG(pwrite data=% PRIx64 , size=%d, e_phys= TARGET_FMT_plx
-  , addr=TARGET_FMT_plx\n, *data, size, addr, addr);
-if (pwrite(fd, data, size, addr) != size) {
-error_report(%s - pwrite failed %s,
- __func__, strerror(errno));
-}
-} else {
-if (pread(fd, val, size, addr) != size) {
-error_report(%s - pread failed %s,
- __func__, strerror(errno));
-val = (1UL  (size * 8)) - 1;
-}
-DEBUG(pread val=% PRIx64 , size=%d, e_phys= TARGET_FMT_plx
-  , addr= TARGET_FMT_plx \n, val, size, addr, addr);
+if (data) {
+DEBUG(pwrite data=% PRIx64 , size=%d, e_phys= TARGET_FMT_plx
+  , addr=TARGET_FMT_plx\n, *data, size, addr, addr);
+if (pwrite(fd, data, size, addr) != size) {
+error_report(%s - pwrite failed %s, __func__, strerror(errno));
 }
 } else {
-uint32_t port = addr + dev_region-u.r_baseport;
-
-if (data) {
-DEBUG(out data=% PRIx64 , size=%d, e_phys= TARGET_FMT_plx
-  , host=%x\n, *data, size, addr, port);
-switch (size) {
-case 1:
-outb(*data, port);
-break;
-case 2:
-outw(*data, port);
-break;
-case 4:
-outl(*data, port);
-break;
-}
-} else {
-switch (size) {
-case 1:
-val = inb(port);
-break;
-case 2:
-val = inw(port);
-break;
-case 4:
-val = inl(port);
-break;
-}
-DEBUG(in data=% PRIx64 , size=%d, e_phys= TARGET_FMT_plx
-  , host=%x\n, val, size, addr, port);
+if (pread(fd, val, size, addr) != size) {
+error_report(%s - pread failed %s, __func__, strerror(errno));
+val = (1UL  (size * 8)) - 1;
 }
+DEBUG(pread val=% PRIx64 , size=%d, e_phys= TARGET_FMT_plx
+  , addr= TARGET_FMT_plx \n, val, size, addr, addr);
 }
 return val;
 }
-- 
1.8.1.1.298.ge7eed54



[Qemu-devel] [PULL 0/3] pci, pc, pvpanic bug fixes

2013-11-04 Thread Michael S. Tsirkin
Please pull the following for 1.7.

The following changes since commit a126050a103c924b03388a9a64ce9af8c96b0969:

  Merge remote-tracking branch 'kwolf/tags/for-anthony' into staging 
(2013-10-31 17:02:26 +0100)

are available in the git repository at:


  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_anthony

for you to fetch changes up to df39076850958b842ac9e414dc3ab2895f1877bf:

  vl: allow cont from panicked state (2013-11-04 15:39:41 +0200)


pci, pc, pvpanic bug fixes

This fixes strange pvpanic behaviour: you had to
pause to let VM continue (and potentially reboot on panic
if enabled).

This also fixes two bugs reported by Andreas.
One is a long-standing bug exposed by recent pci changes,
the other affects old piix machine types and was caused
by recent acpi changes.

Signed-off-by: Michael S. Tsirkin m...@redhat.com


Michael S. Tsirkin (2):
  pc: disable acpi info for isapc and old pc machine
  exec: limit system memory size

Paolo Bonzini (1):
  vl: allow cont from panicked state

 exec.c| 7 ++-
 gdbstub.c | 3 ---
 hw/i386/pc_piix.c | 2 ++
 vl.c  | 6 ++
 4 files changed, 10 insertions(+), 8 deletions(-)

-- 
MST




[Qemu-devel] [PULL 2/3] exec: limit system memory size

2013-11-04 Thread Michael S. Tsirkin
The page table logic in exec.c assumes
that memory addresses are at most TARGET_PHYS_ADDR_SPACE_BITS.

But pci addresses are full 64 bit so if we try to render them ignoring
the extra bits, we get strange effects with sections overlapping each
other.

To fix, simply limit the system memory size to
 1  TARGET_PHYS_ADDR_SPACE_BITS,
pci addresses will be rendered within that.

Cc: qemu-sta...@nongnu.org
Reported-by: Andreas Färber afaer...@suse.de
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 exec.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index b453713..79610ce 100644
--- a/exec.c
+++ b/exec.c
@@ -1741,7 +1741,12 @@ void address_space_destroy_dispatch(AddressSpace *as)
 static void memory_map_init(void)
 {
 system_memory = g_malloc(sizeof(*system_memory));
-memory_region_init(system_memory, NULL, system, INT64_MAX);
+
+assert(TARGET_PHYS_ADDR_SPACE_BITS = 64);
+
+memory_region_init(system_memory, NULL, system,
+   TARGET_PHYS_ADDR_SPACE_BITS == 64 ?
+   UINT64_MAX : (0x1ULL  TARGET_PHYS_ADDR_SPACE_BITS));
 address_space_init(address_space_memory, system_memory, memory);
 
 system_io = g_malloc(sizeof(*system_io));
-- 
MST




[Qemu-devel] savevm speed [was: Re: Qemu-devel Digest, Vol 128, Issue 24]

2013-11-04 Thread Eric Blake
On 11/03/2013 07:41 AM, 宫文超 wrote:
 
 

Top-posted replies to an untrimmed digest message and with an irrelevant
subject line are considered poor netiquette.

 
 
 When i use the savevm command to create a online snapshot i found it's very 
 very slow,i found the online snapshot save four device information,and now i 
 only want to save the disk information so i delete the code to save the other 
 three device information,however,i found
 when the disk capacity is big it's also very slow ,could you offer some ideas 
 to solve this problem?THX by Wenchao Gong

What qemu version are you using?  There have been patches that will be
in the eventual qemu 1.7 to address some of the speed problems, and
further patches planned for 1.8 to make internal snapshots more flexible
to use.  Perhaps you can evaluate those patches and provide reviews.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PULL 1/3] pc: disable acpi info for isapc and old pc machine

2013-11-04 Thread Michael S. Tsirkin
Disable acpi build for isapc and no_kvmclock machine
types (used by xen), since acpi build currently expects pci.

Reported-by: Andreas Färber afaer...@suse.de
Signed-off-by: Michael S. Tsirkin m...@redhat.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
---
 hw/i386/pc_piix.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 24a98cb..4fdb7b6 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -309,6 +309,7 @@ static void pc_init_pci_1_2(QEMUMachineInitArgs *args)
 static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args)
 {
 has_pci_info = false;
+has_acpi_build = false;
 disable_kvm_pv_eoi();
 enable_compat_apic_id_mode();
 pc_init1(args, 1, 0);
@@ -317,6 +318,7 @@ static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs 
*args)
 static void pc_init_isa(QEMUMachineInitArgs *args)
 {
 has_pci_info = false;
+has_acpi_build = false;
 if (!args-cpu_model) {
 args-cpu_model = 486;
 }
-- 
MST




[Qemu-devel] [PULL 3/3] vl: allow cont from panicked state

2013-11-04 Thread Michael S. Tsirkin
From: Paolo Bonzini pbonz...@redhat.com

After reporting the GUEST_PANICKED monitor event, QEMU stops the VM.
The reason for this is that events are edge-triggered, and can be lost if
management dies at the wrong time.  Stopping a panicked VM lets management
know of a panic even if it has crashed; management can learn about the
panic when it restarts and queries running QEMU processes.  The downside
is of course that the VM will be paused while management is not running,
but that is acceptable if it only happens with explicit -device pvpanic.

Upon learning of a panic, management (if configured to do so) can pick a
variety of behaviors: leave the VM paused, reset it, destroy it.  In
addition to all of these behaviors, it is possible to dump the VM core
from the host.

However, right now, the panicked state is irreversible, and can only be
exited by resetting the machine.  This means that any policy decision
is entirely in the hands of the host.  In particular there is no way to
use the reboot on panic option together with pvpanic.

This patch makes the panicked state reversible (and removes various
workarounds that were there because of the state being irreversible).
With this change, management has a wider set of possible policies: it
can just log the crash and leave policy to the guest, it can leave the
VM paused.  In particular, the log the crash and continue is implemented
simply by sending a cont as soon as management learns about the panic.
Management could also implement the irreversible paused state itself.
And again, all such actions can be coupled with dumping the VM core.

Unfortunately we cannot change the behavior of 1.6.0.  Thus, even if
it uses -device pvpanic, management should check for cont failures.
If cont fails, management can then log that the VM remained paused
and urge the administrator to update QEMU.

Reviewed-by: Laszlo Ersek ler...@redhat.com
Reviewed-by: Luiz Capitulino lcapitul...@redhat.com
Acked-by: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 gdbstub.c | 3 ---
 vl.c  | 6 ++
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 0e5a3f5..e8ab0b2 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -368,9 +368,6 @@ static inline void gdb_continue(GDBState *s)
 #ifdef CONFIG_USER_ONLY
 s-running_state = 1;
 #else
-if (runstate_check(RUN_STATE_GUEST_PANICKED)) {
-runstate_set(RUN_STATE_DEBUG);
-}
 if (!runstate_needs_reset()) {
 vm_start();
 }
diff --git a/vl.c b/vl.c
index efbff65..4ad15b8 100644
--- a/vl.c
+++ b/vl.c
@@ -638,9 +638,8 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
 
-{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED },
+{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING },
 { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE },
-{ RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG },
 
 { RUN_STATE_MAX, RUN_STATE_MAX },
 };
@@ -686,8 +685,7 @@ int runstate_is_running(void)
 bool runstate_needs_reset(void)
 {
 return runstate_check(RUN_STATE_INTERNAL_ERROR) ||
-runstate_check(RUN_STATE_SHUTDOWN) ||
-runstate_check(RUN_STATE_GUEST_PANICKED);
+runstate_check(RUN_STATE_SHUTDOWN);
 }
 
 StatusInfo *qmp_query_status(Error **errp)
-- 
MST




Re: [Qemu-devel] How to introduce bs-node_name ?

2013-11-04 Thread Luiz Capitulino
On Mon, 4 Nov 2013 12:13:59 +0100
Kevin Wolf kw...@redhat.com wrote:

 Am 01.11.2013 um 16:12 hat Luiz Capitulino geschrieben:
  On Fri, 01 Nov 2013 08:59:20 -0600
  Eric Blake ebl...@redhat.com wrote:
  
   On 11/01/2013 08:51 AM, Luiz Capitulino wrote:
On Wed, 30 Oct 2013 13:25:35 -0600
Eric Blake ebl...@redhat.com wrote:

On 10/30/2013 07:49 AM, Markus Armbruster wrote:
   
   
The first proposal is to add another parameter, say id.  Users can
then refer either to an arbitrary BDS by id, or (for backward
compatibility) to the root BDS by device.  When the code sees
device, it'll look up the BB, then fetch its root BDS.
   
CON: Existing parameter device becomes compatibility cruft.
   
PRO: Clean and obvious semantics (in my opinion).
   
I like this one as well.

Does this proposal makes device optional for existing commands? If it
does then I'm afraid it breaks compatibility because if you don't
specify a device you'll get an error today.
   
   Changing from error to success is not backwards-incompatible.  Old
   applications will ALWAYS supply device (because it used to be
   mandatory).  That is, a management application that was intentionally
   omitting 'device' and expecting an error is so unlikely to exist that we
   can consider such a management app as buggy.
  
  Doing such changes makes me nervous nevertheless. In my mind a stable
  API doesn't change. Of course that there might exceptions, but 99.9%
  of those exceptions should be bug fixes not deliberate API extensions.
 
 Stable API means that whatever was working before keeps working. Nothing
 less, but also nothing more.
 
 If an extension that changes from error to success is breaking the API,
 adding a new QMP command is breaking the API as well. Because sending an
 unknown command means returning an error...

Let's not turn this into a surreal discussion, I'm sure we all want the
best for our users.

There's another problem, btw. We're not doing command extensions for
input arguments before heaving a way to discover them. Even if there's
consensus that extending the command is okay, we need the query-qmp-schema
command merged first. IMO, this a blocker requirement (although it
shouldn't be difficult to finalize what has been posted already).

  A more compelling argument against it is the quality of the resulting
  command. I'm sure it's going to be anything but a simple, clean API.
 
 I consider one command with an argument made optional a higher quality
 API than having two different commands that do almost the same, except
 that the older one has a mandatory argument where the newer one has an
 optional argument.

I wouldn't expect the new command to be just a duplication of the old
one with a new argument. I'm expecting it to have a one-cover all
argument or to have a dict or something that makes more sense for
the new capabilities. But yes, we need a schema example to see how it would
look like.



Re: [Qemu-devel] [PATCH] kvm: Add a new machine property kvm_type

2013-11-04 Thread Alexander Graf

On 04.11.2013, at 14:32, Jan Kiszka jan.kis...@siemens.com wrote:

 On 2013-11-04 14:30, Alexander Graf wrote:
 
 On 04.11.2013, at 14:28, Jan Kiszka jan.kis...@siemens.com wrote:
 
 On 2013-10-07 18:53, Aneesh Kumar K.V wrote:
 From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 Targets like ppc64 support different typed of KVM, one which use
 hypervisor mode and the other which doesn't. Add a new machine
 property kvm_type that helps in selecting the respective ones
 We also add a new QEMUMachine callback get_vm_type that helps
 in mapping the string representation of kvm type specified.
 
 Signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
 
 ...
 
 diff --git a/vl.c b/vl.c
 index 4e709d5..7ecc581 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
.name = usb,
.type = QEMU_OPT_BOOL,
.help = Set on/off to enable/disable usb,
 +},{
 +.name = kvm_type,
 +.type = QEMU_OPT_STRING,
 +.help = Set to kvm type to be used in create vm ioctl,
 
 This does not sound like an appropriate documentation for an enduser.
 
 OTOH, I do not recall right now how these help strings can be obtained
 at all. One could intuitively try -machine sometype,?, but that's
 not implemented. Anyone?
 
 They should definitely show up in the man page. But yes, -machine kvm_type=? 
 would be helpful as well.
 
 The man page is not generate from this C code, is it?

No, it has to be part of the patch :).

 
 
 As for -machine ? we are have a problem orthogonal to this patch. I agree 
 that we need some way to programmatically list all machine options.
 
 I'm not saying that this is an issue of this patch, just that I wonder
 how that .help message can be made visisble at at besides with the help
 of an editor.

I think we need both. We also need help on both levels: machine options and 
arguments to machine options.


Alex




Re: [Qemu-devel] [PATCH] qemu: Broken -smb with latest SAMBA package. (Unsupported security=share option)

2013-11-04 Thread Michael Tokarev

04.11.2013 00:06, Jan Kiszka wrote:

On 2013-11-01 11:10, Michael Tokarev wrote:

[]

If Jan picks it up, that's fine.  If not, I think it can go
to the trivial patches queue.


Works fine, applied to queues/slirp.


Okay, thank you Jan.


But this is not a trivial patch as the fix is not obvious for a reader
(unless you know smb.conf semantics by heart).


It's trivial for my understanding.  If we require that every
change going to -trivial should be obvious to everyone, we
should just close it right away.

And this area does not have an active maintainer anyway, at least
according to MAINTAINERS and ./scripts/get_maintainer.pl.

Thanks,

/mjt



Re: [Qemu-devel] [PATCH] migration: avoid starting a new migration task while the previous one still exist

2013-11-04 Thread Eric Blake
On 11/04/2013 04:26 AM, Zhanghaoyu (A) wrote:
 Avoid starting a new migration task while the previous one still exist.
 
 Signed-off-by: Zeng Junliang zengjunli...@huawei.com
 ---

It's best to put comments here...

  migration.c |   34 ++
  1 files changed, 22 insertions(+), 12 deletions(-)
...

  s-downtime = end_time - start_time;
 -- 1.7.3.1.msysgit.0 BTW, while error happened during migration, need
 the erroring state to avoid starting a new migration task while
 current migration task still exist? And, do the new added migration
 states need to be reported to libvirt?

...rather than here, since most mail clients will strip anything after
the '-- ' separator output by git as if it were the signature, making it
very difficult to reply to (as you can see, my mailer corrupted the
formatting of your question due to how I forced it to reply to your
signature).

As to whether libvirt needs to know about the state, I'm not sure.
Libvirt already has its own code to prevent concurrent attempts at
migration, so theoretically libvirt should never trip the situation that
you appear to be coding for.  Perhaps providing more details in the
commit message about how to actually get into the situation will make it
easier to evaluate.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [Bug 1247796] [NEW] USB Passthrough not working for Windows 7 Embedded guest

2013-11-04 Thread Jens Frederich
Public bug reported:

USB Passthrough from host to guest is not working for a 32-bit Windows 7
guest.

The device appears in the device manager of Windows 7, but with Error
code 10: device cannot start. I have tried this with numerous USB
thumbdrives and a USB wireless NIC, all with the same result. The device
name and functionality is recognized, so at least some USB negotiation
is taking place.

qemu-system-x86_64 -machine accel=kvm:tcg -name VRTP1_win -S -M pc-
i440fx-1.4 -cpu SandyBridge -m 3072 -smp 2,sockets=1,cores=2,threads=1
-uuid 8ee5add7-f1a9-d697-9c18-2c1b4967c00e -no-user-config -nodefaults
-chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/VRTP1_win.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device ahci,id=ahci0,bus=pci.0,addr=0x6 -drive
file=/var/lib/libvirt/images/VN8912_Development_0.9.2.bin,if=none,id
=drive-sata0-0-0,format=raw -device ide-hd,bus=ahci0.0,drive=drive-
sata0-0-0,id=sata0-0-0,bootindex=1 -netdev
tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-
pci,netdev=hostnet0,id=net0,mac=52:54:00:71:f5:45,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device isa-
serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc
127.0.0.1:0 -vga std -device intel-hda,id=sound0,bus=pci.0,addr=0x4
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device usb-
host,hostbus=3,hostaddr=18,id=hostdev0 -device virtio-balloon-
pci,id=balloon0,bus=pci.0,addr=0x5

** Affects: qemu
 Importance: Undecided
 Status: New


** Tags: 13.04

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1247796

Title:
  USB Passthrough not working for Windows 7 Embedded guest

Status in QEMU:
  New

Bug description:
  USB Passthrough from host to guest is not working for a 32-bit Windows
  7 guest.

  The device appears in the device manager of Windows 7, but with Error
  code 10: device cannot start. I have tried this with numerous USB
  thumbdrives and a USB wireless NIC, all with the same result. The
  device name and functionality is recognized, so at least some USB
  negotiation is taking place.

  qemu-system-x86_64 -machine accel=kvm:tcg -name VRTP1_win -S -M pc-
  i440fx-1.4 -cpu SandyBridge -m 3072 -smp 2,sockets=1,cores=2,threads=1
  -uuid 8ee5add7-f1a9-d697-9c18-2c1b4967c00e -no-user-config -nodefaults
  -chardev
  
socket,id=charmonitor,path=/var/lib/libvirt/qemu/VRTP1_win.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
  -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -device ahci,id=ahci0,bus=pci.0,addr=0x6 -drive
  file=/var/lib/libvirt/images/VN8912_Development_0.9.2.bin,if=none,id
  =drive-sata0-0-0,format=raw -device ide-hd,bus=ahci0.0,drive=drive-
  sata0-0-0,id=sata0-0-0,bootindex=1 -netdev
  tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-
  pci,netdev=hostnet0,id=net0,mac=52:54:00:71:f5:45,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0 -device isa-
  serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
  -vnc 127.0.0.1:0 -vga std -device intel-
  hda,id=sound0,bus=pci.0,addr=0x4 -device hda-
  duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device usb-
  host,hostbus=3,hostaddr=18,id=hostdev0 -device virtio-balloon-
  pci,id=balloon0,bus=pci.0,addr=0x5

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1247796/+subscriptions



Re: [Qemu-devel] [PATCH] qemu: Broken -smb with latest SAMBA package. (Unsupported security=share option)

2013-11-04 Thread Jan Kiszka
On 2013-11-04 14:55, Michael Tokarev wrote:
 04.11.2013 00:06, Jan Kiszka wrote:
 On 2013-11-01 11:10, Michael Tokarev wrote:
 []
 If Jan picks it up, that's fine.  If not, I think it can go
 to the trivial patches queue.

 Works fine, applied to queues/slirp.
 
 Okay, thank you Jan.
 
 But this is not a trivial patch as the fix is not obvious for a reader
 (unless you know smb.conf semantics by heart).
 
 It's trivial for my understanding.  If we require that every
 change going to -trivial should be obvious to everyone, we
 should just close it right away.

Then we may need -less-trivial, because - to my understanding - -trivial
was once set up according to the rule that (most) QEMU hackers should be
able to understand that a trivial change is at least mostly harmless.

 And this area does not have an active maintainer anyway, at least
 according to MAINTAINERS and ./scripts/get_maintainer.pl.

Yeah, I think we have some holes there. That smb configuration
conceptually belongs to slirp is right, just not clear documented in our
script.

Jan




signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] configure: Enable pie for powerpc and arm Linux

2013-11-04 Thread Dinar Valeev
From: Dinar Valeev dval...@suse.de

This patch enables pie for PowerPC and ARM architectures

Signed-off-by: Dinar Valeev dval...@suse.com
---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 91372f9..0130e7e 100755
--- a/configure
+++ b/configure
@@ -1297,7 +1297,7 @@ fi
 
 if test $pie = ; then
   case $cpu-$targetos in
-i386-Linux|x86_64-Linux|x32-Linux|i386-OpenBSD|x86_64-OpenBSD)
+
i386-Linux|x86_64-Linux|x32-Linux|ppc*-Linux|arm*-Linux|aarch64*-Linux|i386-OpenBSD|x86_64-OpenBSD)
   ;;
 *)
   pie=no
-- 
1.7.12.4




Re: [Qemu-devel] [PATCH] qemu-iotests: Filter out actual image size in 067

2013-11-04 Thread Stefan Hajnoczi
On Sat, Nov 02, 2013 at 02:52:11PM +0100, Max Reitz wrote:
 The actual size of the image file may differ depending on the Linux
 kernel currently running on the host. Filtering out this value makes
 this test pass in such cases.
 
 Signed-off-by: Max Reitz mre...@redhat.com
 ---
  tests/qemu-iotests/067 |  2 +-
  tests/qemu-iotests/067.out | 10 +-
  2 files changed, 6 insertions(+), 6 deletions(-)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH] pc: disable acpi info for isapc and old pc machine

2013-11-04 Thread Paolo Bonzini
Il 04/11/2013 11:46, Michael S. Tsirkin ha scritto:
 Disable acpi build for isapc and no_kvmclock machine
 types (used by xen), since acpi build currently expects pci.
 
 Reported-by: Andreas Färber afaer...@suse.de
 Signed-off-by: Michael S. Tsirkin m...@redhat.com

Wait, Xen is not using pc_init_pci_no_kvmclock.  But that's not a
problem, we can expose ACPI info via fw_cfg and then Xen hvmloader can
decide whether to use it or not.  Right now it doesn't, maybe later on
it should.

If Xen is broken, we need to patch pc_xen_hvm_init, but Andreas only
tested isapc not xenfv.  So unless someone reports otherwise, there is
no need to exclude ACPI info from -M xenpv.

So my first review was wrong, but I still agree with the patch.

Paolo



[Qemu-devel] [Bug 1247796] Re: USB Passthrough not working for Windows 7 Embedded guest

2013-11-04 Thread Jens Frederich
The bug occured on Ubuntu 13.04

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1247796

Title:
  USB Passthrough not working for Windows 7 Embedded guest

Status in QEMU:
  New

Bug description:
  USB Passthrough from host to guest is not working for a 32-bit Windows
  7 guest.

  The device appears in the device manager of Windows 7, but with Error
  code 10: device cannot start. I have tried this with numerous USB
  thumbdrives and a USB wireless NIC, all with the same result. The
  device name and functionality is recognized, so at least some USB
  negotiation is taking place.

  qemu-system-x86_64 -machine accel=kvm:tcg -name VRTP1_win -S -M pc-
  i440fx-1.4 -cpu SandyBridge -m 3072 -smp 2,sockets=1,cores=2,threads=1
  -uuid 8ee5add7-f1a9-d697-9c18-2c1b4967c00e -no-user-config -nodefaults
  -chardev
  
socket,id=charmonitor,path=/var/lib/libvirt/qemu/VRTP1_win.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
  -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -device ahci,id=ahci0,bus=pci.0,addr=0x6 -drive
  file=/var/lib/libvirt/images/VN8912_Development_0.9.2.bin,if=none,id
  =drive-sata0-0-0,format=raw -device ide-hd,bus=ahci0.0,drive=drive-
  sata0-0-0,id=sata0-0-0,bootindex=1 -netdev
  tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-
  pci,netdev=hostnet0,id=net0,mac=52:54:00:71:f5:45,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0 -device isa-
  serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
  -vnc 127.0.0.1:0 -vga std -device intel-
  hda,id=sound0,bus=pci.0,addr=0x4 -device hda-
  duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device usb-
  host,hostbus=3,hostaddr=18,id=hostdev0 -device virtio-balloon-
  pci,id=balloon0,bus=pci.0,addr=0x5

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1247796/+subscriptions



Re: [Qemu-devel] [PATCH] pc: disable acpi info for isapc and old pc machine

2013-11-04 Thread Michael S. Tsirkin
On Mon, Nov 04, 2013 at 03:13:50PM +0100, Paolo Bonzini wrote:
 Il 04/11/2013 11:46, Michael S. Tsirkin ha scritto:
  Disable acpi build for isapc and no_kvmclock machine
  types (used by xen), since acpi build currently expects pci.
  
  Reported-by: Andreas Färber afaer...@suse.de
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
 
 Wait, Xen is not using pc_init_pci_no_kvmclock.  But that's not a
 problem, we can expose ACPI info via fw_cfg and then Xen hvmloader can
 decide whether to use it or not.  Right now it doesn't, maybe later on
 it should.
 
 If Xen is broken, we need to patch pc_xen_hvm_init, but Andreas only
 tested isapc not xenfv.  So unless someone reports otherwise, there is
 no need to exclude ACPI info from -M xenpv.
 
 So my first review was wrong, but I still agree with the patch.
 
 Paolo

I really went by the comment
/* PC init function for pc-0.10 to pc-0.13, and reused by xenfv */
static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args)

Clearly pc-0.13 and older should not have acpi :)

-- 
MST



Re: [Qemu-devel] [PATCH 0/2 v2] pc: inform SeaBIOS where 64-bit PCI hole begins

2013-11-04 Thread Igor Mammedov
On Mon, 04 Nov 2013 13:48:03 +0100
Gerd Hoffmann kra...@redhat.com wrote:

   Hi,
 
   So maybe design that with memory hotplug in mind?  Such as adding a new
   qemu-specific type QEMU_RAM_HOTPLUG?  Which seabios could use to reserve
   the memory (but not add it to the e820 table for the guest)?
  It will do job too. But extending semantics of standard table would be
  confusing. Yes, Seabios will filter it out but it doesn't make table
  less confusing.
 
 Was just thinking that it might be easier that way if we need e820
 entries for hotplug memory address space _anyway_.
I don't think that we need e820 entries for hotplug memory reserved space as
e820 should. In case present at boot hotpluggable DIMMs would be needed in E820,
we can add them as usual E820_RAM entries.

But regardless of what we do here it might be good keep option of adding non
standard entries in future, by filtering out unknown types in SeaBIOS.

 
  I'd prefer having a dedicated interface for it as a more clean solution.
 
 Agree.

So back to naming question, would you agree to renaming fw_cfg to the last
Michael's suggestion reserved-memory-end?
 
 cheers,
   Gerd
 
 
 




Re: [Qemu-devel] [PATCH v2] block: per caller dirty bitmap

2013-11-04 Thread Benoît Canet
Le Monday 04 Nov 2013 à 17:30:10 (+0800), Fam Zheng a écrit :
 Previously a BlockDriverState has only one dirty bitmap, so only one
 caller (e.g. a block job) can keep track of writing. This changes the
 dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the
 lifecycle is managed with these new functions:
 
 bdrv_create_dirty_bitmap
 bdrv_release_dirty_bitmap
 
 Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap.
 
 In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument
 is added to these functions, since each caller has its own dirty bitmap:
 
 bdrv_get_dirty
 bdrv_dirty_iter_init
 bdrv_get_dirty_count
 
 bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will
 internally walk the list of all dirty bitmaps and set them one by one.
 
 Signed-off-by: Fam Zheng f...@redhat.com
 
 ---
 v2: Added BdrvDirtyBitmap [Paolo]
 
 Signed-off-by: Fam Zheng f...@redhat.com
 ---
  block-migration.c | 22 +
  block.c   | 81 
 ---
  block/mirror.c| 23 --
  block/qapi.c  |  8 -
  include/block/block.h | 11 ---
  include/block/block_int.h |  2 +-
  6 files changed, 85 insertions(+), 62 deletions(-)
 
 diff --git a/block-migration.c b/block-migration.c
 index daf9ec1..8f4e826 100644
 --- a/block-migration.c
 +++ b/block-migration.c
 @@ -58,6 +58,7 @@ typedef struct BlkMigDevState {
  /* Protected by block migration lock.  */
  unsigned long *aio_bitmap;
  int64_t completed_sectors;
 +BdrvDirtyBitmap *dirty_bitmap;
  } BlkMigDevState;
  
  typedef struct BlkMigBlock {
 @@ -309,12 +310,21 @@ static int mig_save_device_bulk(QEMUFile *f, 
 BlkMigDevState *bmds)
  
  /* Called with iothread lock taken.  */
  
 -static void set_dirty_tracking(int enable)
 +static void set_dirty_tracking(void)
  {
  BlkMigDevState *bmds;
  
  QSIMPLEQ_FOREACH(bmds, block_mig_state.bmds_list, entry) {
 -bdrv_set_dirty_tracking(bmds-bs, enable ? BLOCK_SIZE : 0);
 +bmds-dirty_bitmap = bdrv_create_dirty_bitmap(bmds-bs, BLOCK_SIZE);
 +}
 +}
 +
 +static void unset_dirty_tracking(void)
 +{
 +BlkMigDevState *bmds;
 +
 +QSIMPLEQ_FOREACH(bmds, block_mig_state.bmds_list, entry) {
 +bdrv_release_dirty_bitmap(bmds-bs, bmds-dirty_bitmap);
  }
  }
  
 @@ -432,7 +442,7 @@ static int mig_save_device_dirty(QEMUFile *f, 
 BlkMigDevState *bmds,
  } else {
  blk_mig_unlock();
  }
 -if (bdrv_get_dirty(bmds-bs, sector)) {
 +if (bdrv_get_dirty(bmds-bs, bmds-dirty_bitmap, sector)) {
  
  if (total_sectors - sector  BDRV_SECTORS_PER_DIRTY_CHUNK) {
  nr_sectors = total_sectors - sector;
 @@ -554,7 +564,7 @@ static int64_t get_remaining_dirty(void)
  int64_t dirty = 0;
  
  QSIMPLEQ_FOREACH(bmds, block_mig_state.bmds_list, entry) {
 -dirty += bdrv_get_dirty_count(bmds-bs);
 +dirty += bdrv_get_dirty_count(bmds-bs, bmds-dirty_bitmap);
  }
  
  return dirty  BDRV_SECTOR_BITS;
 @@ -569,7 +579,7 @@ static void blk_mig_cleanup(void)
  
  bdrv_drain_all();
  
 -set_dirty_tracking(0);
 +unset_dirty_tracking();
  
  blk_mig_lock();
  while ((bmds = QSIMPLEQ_FIRST(block_mig_state.bmds_list)) != NULL) {
 @@ -604,7 +614,7 @@ static int block_save_setup(QEMUFile *f, void *opaque)
  init_blk_migration(f);
  
  /* start track dirty blocks */
 -set_dirty_tracking(1);
 +set_dirty_tracking();
  qemu_mutex_unlock_iothread();
  
  ret = flush_blks(f);
 diff --git a/block.c b/block.c
 index 58efb5b..ef7a55f 100644
 --- a/block.c
 +++ b/block.c
 @@ -49,6 +49,11 @@
  #include windows.h
  #endif
  
 +typedef struct BdrvDirtyBitmap {
 +HBitmap *bitmap;
 +QLIST_ENTRY(BdrvDirtyBitmap) list;
 +} BdrvDirtyBitmap;
 +
  #define NOT_DONE 0x7fff /* used while emulated sync operation in 
 progress */
  
  typedef enum {
 @@ -323,6 +328,7 @@ BlockDriverState *bdrv_new(const char *device_name)
  BlockDriverState *bs;
  
  bs = g_malloc0(sizeof(BlockDriverState));
 +QLIST_INIT(bs-dirty_bitmaps);
  pstrcpy(bs-device_name, sizeof(bs-device_name), device_name);
  if (device_name[0] != '\0') {
  QTAILQ_INSERT_TAIL(bdrv_states, bs, list);
 @@ -1615,7 +1621,7 @@ static void bdrv_move_feature_fields(BlockDriverState 
 *bs_dest,
  bs_dest-iostatus   = bs_src-iostatus;
  
  /* dirty bitmap */
 -bs_dest-dirty_bitmap   = bs_src-dirty_bitmap;
 +bs_dest-dirty_bitmaps  = bs_src-dirty_bitmaps;
  
  /* reference count */
  bs_dest-refcnt = bs_src-refcnt;
 @@ -1648,7 +1654,7 @@ void bdrv_swap(BlockDriverState *bs_new, 
 BlockDriverState *bs_old)
  
  /* bs_new must be anonymous and shouldn't have anything fancy enabled */
  assert(bs_new-device_name[0] == '\0');
 -assert(bs_new-dirty_bitmap == NULL);
 +

Re: [Qemu-devel] [PATCH 0/2 v2] pc: inform SeaBIOS where 64-bit PCI hole begins

2013-11-04 Thread Gerd Hoffmann
On Mo, 2013-11-04 at 15:35 +0100, Igor Mammedov wrote:
 On Mon, 04 Nov 2013 13:48:03 +0100
 Gerd Hoffmann kra...@redhat.com wrote:
 
Hi,
  
So maybe design that with memory hotplug in mind?  Such as adding a new
qemu-specific type QEMU_RAM_HOTPLUG?  Which seabios could use to reserve
the memory (but not add it to the e820 table for the guest)?
   It will do job too. But extending semantics of standard table would be
   confusing. Yes, Seabios will filter it out but it doesn't make table
   less confusing.
  
  Was just thinking that it might be easier that way if we need e820
  entries for hotplug memory address space _anyway_.
 I don't think that we need e820 entries for hotplug memory reserved space as
 e820 should. In case present at boot hotpluggable DIMMs would be needed in 
 E820,
 we can add them as usual E820_RAM entries.
 
 But regardless of what we do here it might be good keep option of adding non
 standard entries in future, by filtering out unknown types in SeaBIOS.
 
  
   I'd prefer having a dedicated interface for it as a more clean solution.
  
  Agree.
 
 So back to naming question, would you agree to renaming fw_cfg to the last
 Michael's suggestion reserved-memory-end?

Fine with me.

cheers,
  Gerd






Re: [Qemu-devel] [Bug 1247796] [NEW] USB Passthrough not working for Windows 7 Embedded guest

2013-11-04 Thread Serge Hallyn
*** This bug is a duplicate of bug 685096 ***
https://bugs.launchpad.net/bugs/685096

 duplicate: 685096


** This bug has been marked a duplicate of bug 685096
   USB Passthrough not working for Windows 7 guest

** This bug has been marked a duplicate of bug 685096
   USB Passthrough not working for Windows 7 guest

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1247796

Title:
  USB Passthrough not working for Windows 7 Embedded guest

Status in QEMU:
  New

Bug description:
  USB Passthrough from host to guest is not working for a 32-bit Windows
  7 guest.

  The device appears in the device manager of Windows 7, but with Error
  code 10: device cannot start. I have tried this with numerous USB
  thumbdrives and a USB wireless NIC, all with the same result. The
  device name and functionality is recognized, so at least some USB
  negotiation is taking place.

  qemu-system-x86_64 -machine accel=kvm:tcg -name VRTP1_win -S -M pc-
  i440fx-1.4 -cpu SandyBridge -m 3072 -smp 2,sockets=1,cores=2,threads=1
  -uuid 8ee5add7-f1a9-d697-9c18-2c1b4967c00e -no-user-config -nodefaults
  -chardev
  
socket,id=charmonitor,path=/var/lib/libvirt/qemu/VRTP1_win.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
  -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -device ahci,id=ahci0,bus=pci.0,addr=0x6 -drive
  file=/var/lib/libvirt/images/VN8912_Development_0.9.2.bin,if=none,id
  =drive-sata0-0-0,format=raw -device ide-hd,bus=ahci0.0,drive=drive-
  sata0-0-0,id=sata0-0-0,bootindex=1 -netdev
  tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 -device virtio-net-
  pci,netdev=hostnet0,id=net0,mac=52:54:00:71:f5:45,bus=pci.0,addr=0x3
  -chardev pty,id=charserial0 -device isa-
  serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0
  -vnc 127.0.0.1:0 -vga std -device intel-
  hda,id=sound0,bus=pci.0,addr=0x4 -device hda-
  duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device usb-
  host,hostbus=3,hostaddr=18,id=hostdev0 -device virtio-balloon-
  pci,id=balloon0,bus=pci.0,addr=0x5

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1247796/+subscriptions



Re: [Qemu-devel] [PATCH uq/master] KVM: x86: fix typo in KVM_GET_XCRS

2013-11-04 Thread Paolo Bonzini
Il 17/10/2013 16:47, Paolo Bonzini ha scritto:
 Only the first item of the array was ever looked at.  No
 practical effect, but still worth fixing.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 ---
  target-i386/kvm.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 749aa09..27071e3 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -1314,8 +1314,8 @@ static int kvm_get_xcrs(X86CPU *cpu)
  
  for (i = 0; i  xcrs.nr_xcrs; i++) {
  /* Only support xcr0 now */
 -if (xcrs.xcrs[0].xcr == 0) {
 -env-xcr0 = xcrs.xcrs[0].value;
 +if (xcrs.xcrs[i].xcr == 0) {
 +env-xcr0 = xcrs.xcrs[i].value;
  break;
  }
  }
 

Ping?




Re: [Qemu-devel] [PATCH v3] net: Adding netmap network backend

2013-11-04 Thread Stefan Hajnoczi
On Tue, Oct 29, 2013 at 11:12:25AM +0100, Vincenzo Maffione wrote:

Looks pretty good, I think we can merge the next revision.

 This patch only contains the simplest netmap backend for QEMU.
 In particular, this backend implementation is still not
 able to make use of batching on the TX side (frontend - backend),
 which is where most of the TX performance gain comes from.
 As you can see from the code, there is an ioctl(NIOCTXSYNC) for each
 packet, instead of an ioctl(NIOCTXSYNC) for a batch of packets.
 In order to make TX batching possible, we would need to do some
 modifications to the generic net/net.c code, adding to the
 frontend/backend datapath interface a way to send a batch (this can
 be done using a QEMU_NET_PACKET_FLAG_MORE, without changing too
 much the existing interface).
 We will propose these features in future patches.

QEMU_NET_PACKET_FLAG_MORE sounds like a good idea.

 +#include net/net.h
 +#include clients.h
 +#include sysemu/sysemu.h
 +#include qemu/error-report.h
 +
 +#include sys/ioctl.h
 +#include net/if.h
 +#include sys/mman.h
 +#include net/netmap.h
 +#include net/netmap_user.h
 +#include qemu/iov.h

Please include system headers first, then QEMU headers.  This way QEMU
headers cannot interfere with system headers if they define macros.

Also, qemu/iov.h should be qemu/iov.h:

#include sys/ioctl.h
#include net/if.h
#include sys/mman.h
#include net/netmap.h
#include net/netmap_user.h

#include net/net.h
#include clients.h
#include sysemu/sysemu.h
#include qemu/error-report.h
#include qemu/iov.h

 +/*
 + * Open a netmap device. We assume there is only one queue
 + * (which is the case for the VALE bridge).
 + */
 +static int netmap_open(NetmapPriv *me)
 +{
 +int fd;
 +int err;
 +size_t l;
 +struct nmreq req;
 +
 +me-fd = fd = open(me-fdname, O_RDWR);
 +if (fd  0) {
 +error_report(Unable to open netmap device '%s', me-fdname);
 +return -1;

This error message is not detailed enough, please include the errstr(3).

 +}
 +bzero(req, sizeof(req));

Please use memset(req, 0, sizeof(req)).  Some compilers/tools warn that
bzero() is deprecated.  For more info, see:
http://pubs.opengroup.org/onlinepubs/009695399/functions/bzero.html

 +pstrcpy(req.nr_name, sizeof(req.nr_name), me-ifname);
 +req.nr_ringid = NETMAP_NO_TX_POLL;
 +req.nr_version = NETMAP_API;
 +err = ioctl(fd, NIOCREGIF, req);
 +if (err) {
 +error_report(Unable to register %s, me-ifname);

Lacking errno information here too.

 +goto error;
 +}
 +l = me-memsize = req.nr_memsize;
 +
 +me-mem = mmap(0, l, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
 +if (me-mem == MAP_FAILED) {
 +error_report(Unable to mmap);

Lacking errno information here too.

 +static ssize_t netmap_receive_iov(NetClientState *nc,
 +const struct iovec *iov, int iovcnt)
 +{
 +NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
 +struct netmap_ring *ring = s-me.tx;
 +
 +if (ring) {

Feel free to reduce indentation by returning early:

if (!ring) {
return iov_size(iov, iovcnt); /* drop */
}

 +uint32_t i = 0;
 +uint32_t idx;
 +uint8_t *dst;
 +int j;
 +uint32_t cur = ring-cur;
 +uint32_t avail = ring-avail;
 +int iov_frag_size;
 +int nm_frag_size;
 +int offset;
 +
 +if (avail  iovcnt) {
 +/* Not enough netmap slots. */
 +netmap_write_poll(s, true);
 +return 0;
 +}
 +
 +for (j = 0; j  iovcnt; j++) {
 +iov_frag_size = iov[j].iov_len;
 +offset = 0;
 +
 +/* Split each iovec fragment over more netmap slots, if
 +   necessary (without performing data copy). */

I don't understand this comment, we always perform data copy.

 +while (iov_frag_size) {
 +nm_frag_size = MIN(iov_frag_size, ring-nr_buf_size);
 +
 +if (unlikely(avail == 0)) {
 +/* We run out of netmap slots while splitting the
 +   iovec fragments. */
 +return 0;

netmap_write_poll(s, true) missing?



Re: [Qemu-devel] [PATCH v5 0/5] Fix UST backend for LTTng 2.x

2013-11-04 Thread Alex Bennée

mohamad.ge...@gmail.com writes:

 Version 5

 * Use pkg-config for lttng-ust and urcu-bp libraries in configure (hard-coded
   libraries as a fallback)
 * s/Qemu/QEMU in docs/tracing.txt
 * Better dependencies for ust-generated files in trace/Makefile.obj
snip

Ping Stefan.

Any more need doing to this or can it be merged into your maintainer tree?

-- 
Alex Bennée




Re: [Qemu-devel] [PATCH v3 1/2] linux-user: create target_structsheader to place ipc_perm and shmid_dss

2013-11-04 Thread Alex Bennée

petar.jovano...@rt-rk.com writes:

 From: Petar Jovanovic petar.jovano...@imgtec.com

 Creating target_structs header in linux-user/$arch/ and making
 target_ipc_perm and target_shmid_ds its first inhabitants.
 The struct defintions may/should be further fine-tuned by arch maintainers.

 Signed-off-by: Petar Jovanovic petar.jovano...@imgtec.com
 ---
 v3:
 - add GNU licence to the new header files.

 v2:
 - target_struct headers have been created and the patch has been split into
   two separate patches.

  linux-user/aarch64/target_structs.h|   58 
  linux-user/alpha/target_structs.h  |   48 
  linux-user/arm/target_structs.h|   52 ++
  linux-user/cris/target_structs.h   |   58 
  linux-user/i386/target_structs.h   |   58 
  linux-user/m68k/target_structs.h   |   58 
  linux-user/microblaze/target_structs.h |   58 
  linux-user/mips/target_structs.h   |   48 
  linux-user/mips64/target_cpu.h |   18 
  linux-user/mips64/target_structs.h |2 +
  linux-user/openrisc/target_structs.h   |   58 
  linux-user/ppc/target_structs.h|   60 +
  linux-user/qemu.h  |1 +
  linux-user/s390x/target_structs.h  |   63 ++
  linux-user/sh4/target_structs.h|   58 
  linux-user/sparc/target_structs.h  |   63 ++
  linux-user/sparc64/target_structs.h|   58 
  linux-user/syscall.c   |   76 
 
  linux-user/unicore32/target_structs.h  |   58 
  linux-user/x86_64/target_structs.h |   58 
  20 files changed, 963 insertions(+), 48 deletions(-)
snip

There is an awful lot of similarity between a lot of the structures
while not being totally identical. Given the syscall munging is common
is there not an argument for having a common header for this case?

-- 
Alex Bennée




Re: [Qemu-devel] [patch] Fix FreeBSD compilation block/raw-posix.c

2013-11-04 Thread Stefan Hajnoczi
On Thu, Oct 31, 2013 at 10:41:46PM +0100, Andreas Tobler wrote:
 Hi all,
 
 the below patch is needed to compile qemu trunk on FreeBSD with gcc48,
 clang will fail ;). Host x84_64-freebsd.
 
 Installation will fail due to hardcoded /usr/bin/python in
 scripts/acpi_extract.py
 
 Thanks,
 Andreas
 
 Signed-off-by: Andreas Tobler do-I-put-my-address-here?
 

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan



Re: [Qemu-devel] [PATCH 1/2] kvm/apic: use QOM style

2013-11-04 Thread Andreas Färber
Hi,

Am 04.11.2013 07:41, schrieb 赵小强:
 于 09/27/2013 01:30 PM, xiaoqiang zhao 写道:
 From: xiaoqiang.zhao zxq_yx_...@163.com

 Change apic and kvm/apic to use QOM interface.

 Includes:
 1. APICCommonState now use QOM realizefn
 2. Remove DO_UPCAST() for APICCommonState
 3. Use type constant
 4. Change DeviceState pointers from 'd' to 'dev', sounds better?

 Signed-off-by: xiaoqiang zhao zxq_yx_...@163.com
 ---
   hw/i386/kvm/apic.c  |   40 ++
   hw/intc/apic.c  |   70
 +--
   hw/intc/apic_common.c   |   70
 +++
   include/hw/i386/apic_internal.h |1 -
   4 files changed, 113 insertions(+), 68 deletions(-)
[...]
 ping.
 patch link:
 http://patchwork.ozlabs.org/patch/278481
 http://patchwork.ozlabs.org/patch/278476

Thanks for reminding. A couple of issues with these two patches:

When sending more than one patch, please include a small cover letter.
That gives a better overview and gives you a central place to ping or us
to provide review comments covering both patches.

The DO_UPCAST() is a good cleanup, please put it in a separate patch to
decouple it from the rest.

As for d vs. dev, I've usually only changed it for the init/realize
functions I touched or when there was a new name conflict between
generic and specific device. But I'm fine with all your d changes. Just
please put them in a cleanup patch together with DO_UPCAST(), same for
the register_types prefix, then the actual dangerous patch becomes
smaller and thus easier to review for us. (Instead of 2 you'd have 4.)

The main issue though is that Anthony has requested to change the
pattern for overriding virtual methods: Can you please update the patch
so that void (*init)(APICCommonState *dev); simply gets replaced by
DeviceRealize realize;? I.e. drop the two new ...Class'es and
parent_realize, leave the call logic as it is now and just update the
function signature. Same for the ioapic.

For the second patch, I'd appreciate a second reviewer for the static
variable being refactored.
Note that you can pull the DEVICE(s) - dev conversion into a preceding
cleanup patch by introducing a local DeviceState *dev = DEVICE(s).
Also there's some minor whitespace issues in ioapic_common_realize()
that we'll need to fix.

Thanks in advance and sorry for the delay,

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



  1   2   >