Re: [Qemu-devel] [PATCH] PPC: Fail configure when libfdt is not available

2011-10-18 Thread Paolo Bonzini

On 10/18/2011 02:18 AM, Alexander Graf wrote:

We have several targets in the PPC tree now that basically require libfdt
to function properly, namely the pseries and e500 targets. This dependency
is more likely to grow than to shrink in the future, so I want to make sure
that people building shiny new 1.0 actually have libfdt installed to get
rid of a few ifdefs in the code.

Warning: This patch will likely make configure fail for people who don't
select their own --target-list, but don't have libfdt development packages
installed. However, we really need this new dependency to move on.

Signed-off-by: Alexander Graf ag...@suse.de


F15 has them, and they also install on older Fedoras if you pick them
from Koji.


Acked-by: Paolo Bonzini pbonz...@redhat.com

Paolo




Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-10-18 Thread Zhi Yong Wu
On Mon, Oct 17, 2011 at 6:17 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 10/17/2011 12:17 PM, Kevin Wolf wrote:

 +
 +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
 +{
 +    int ret;
 +    BlockDriverAIOCB *res;
 +
 +    res = request->handler(request->common.bs, request->sector_num,
 +                           request->qiov, request->nb_sectors,
 +                           qemu_block_queue_callback, request);
 +    if (res) {
 +        request->real_acb = res;
 +    }
 +
 +    ret = (res == NULL) ? 0 : 1;
 +
 +    return ret;
  
    You mean return (res != NULL); and want to have bool as the return
   value
    of this function.
 
   Yeah, thanks. I will modify it as below:
   ret = (res == NULL) ? false : true;

 ret = (res != NULL) is really more readable.

 return (res != NULL); is even nicer! :)
Great, thanks

 Paolo

 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html




-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Wen Congyang
Hi, Jan Kiszka

At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:

 
 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

I have implemented the dump command by reusing the migration code, but I ran
into a problem when testing it.

My qemu-kvm tree had not been updated for about 2 months, because kernel.org
was down and I forgot to pull from github.

After I pull it from github, I find the following changes:
@@ -1523,9 +1523,7 @@ static void 
assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
 
 static const VMStateDescription vmstate_assigned_device = {
 .name = "pci-assign",
-.fields = (VMStateField []) {
-VMSTATE_END_OF_LIST()
-}
+.unmigratable = 1,
 };
 
 static void reset_assigned_device(DeviceState *dev)

Why do you remove .fields from vmstate_assigned_device?
It is useful for dump, because dump does not check unmigratable. If
vmstate_assigned_device does not contain .fields, qemu will crash in
vmstate_save_state().

Thanks
Wen Congyang

 
 Jan
 




[Qemu-devel] [Bug 877155] [NEW] VT-D NIC doesn't work in windows guest

2011-10-18 Thread Yongjie Ren
Public bug reported:

[This should be a kvm kernel bug. I met this issue when using qemu and kvm for
virtualization. Since https://bugzilla.kernel.org is not available yet, I am
filing this bug here just for tracking.]
Environment:

Host OS (ia32/ia32e/IA64):All
Guest OS (ia32/ia32e/IA64):ia32e
Guest OS Type (Linux/Windows):Linux
kvm.git Commit:211978fb6c666f824ae8221baea7ee28efc1
qemu.git Commit:edbb7c0de56692868e6126c7ff7e8bf917f8d0e7
Host Kernel Version:3.1.0-rc9+
Hardware: Westmere-EP and SandyBridge platforms

Bug detailed description:
--
Using qemu and kvm upstream, a VT-d NIC and an SR-IOV VF don't work in a
Windows guest. I tried Windows XP, Vista, Win7, and Win2k8 as guests, and I
used an 82572 NIC and an 82576 Virtual Function to assign to the Windows
guest, with the same result.

Reproduce steps:

1. Start a Windows guest with a NIC assigned:
qemu-system-x86_64 -m 1024 -smp 2 -device pci-assign,host=08:00.0 -net none
-hda Windows.img
2. Check the network in the Windows guest. (You'll find the assigned NIC
doesn't work in the guest.)

Current result:

NIC doesn't work in Windows guest

Expected result:

NIC will work.

** Affects: qemu
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/877155

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/877155/+subscriptions



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka
 
 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:
 

 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.
 
 I have implemented the command dump and reuse migration's code. But I meet a 
 problem
 when I test it.

Using migration code for dump is most probably the wrong approach as you
saw through that conflict. All you need are the register states and the
RAM. Reuse gdbstub services for this.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Wen Congyang
At 10/18/2011 03:52 PM, Jan Kiszka Write:
 On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I meet a 
 problem
 when I test it.
 
 Using migration code for dump is most probably the wrong approach as you
 saw through that conflict. All you need are the register states and the
 RAM. Reuse gdbstub services for this.

Hmm, if the migration code cannot be reused, I think we should define a new
qemu vmcore format and add some code to crash to support that format.

I will read gdbstub services's code now.

Thanks
Wen Congyang

 
 Jan
 




Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Daniel P. Berrange
On Tue, Oct 18, 2011 at 03:15:29PM +0800, Wen Congyang wrote:
 Hi, Jan Kiszka
 
 At 10/10/2011 05:34 PM, Jan Kiszka Write:
  On 2011-10-10 11:02, Daniel P. Berrange wrote:
  On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:
 
  
  Run gdb with set debug remote 1 and watch the communication, it is not
  that complex. But a dump command is probably simpler for those
  scenarios, I agree.
 
 I have implemented the command dump and reuse migration's code. But I meet a 
 problem
 when I test it.
 
 My qemu-kvm's tree is not updated about 2 months ago, because kernel.org is 
 down, and
 I forgot to pull from github.
 
 After I pull it from github, I find the following changes:
 @@ -1523,9 +1523,7 @@ static void 
 assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  
  static const VMStateDescription vmstate_assigned_device = {
  .name = pci-assign,
 -.fields = (VMStateField []) {
 -VMSTATE_END_OF_LIST()
 -}
 +.unmigratable = 1,
  };
  
  static void reset_assigned_device(DeviceState *dev)
 
 Why do you remove fields from vmstate_assigned_device?
 It is useful for dump because it does not check unmigratable. If 
 vmstate_assigned_device
 does not contain .fields, qemu will crash in vmstate_save_state().

Given that '.fields' is allowed to be NULL for some devices, I'd say
even for normal migration, QEMU should be checking for NULL in the
vmstate_save_state() code. This would prevent QEMU crashes in the case
where someone removed the .unmigratable member, but forgot to add back
a .fields member.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] GPLv3 troubles

2011-10-18 Thread Markus Armbruster
Avi Kivity a...@redhat.com writes:

 On 10/17/2011 07:46 PM, Stefan Weil wrote:

 So let's start. For any of my contributions, I agree to GPL v2 or later.
 Later generations should have the possibility to replace GPL v2 by
 something which matches future requirements.

 I expect Red Hat contributions can be relicensed to v2+ as well.

Plenty of precedence for that.



Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-10-18 Thread Zhi Yong Wu
On Mon, Oct 17, 2011 at 6:17 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 26.09.2011 10:01, schrieb Zhi Yong Wu:
 On Fri, Sep 23, 2011 at 11:32 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  Makefile.objs     |    2 +-
  block/blk-queue.c |  201 
 +
  block/blk-queue.h |   59 
  block_int.h       |   27 +++
  4 files changed, 288 insertions(+), 1 deletions(-)
  create mode 100644 block/blk-queue.c
  create mode 100644 block/blk-queue.h

 diff --git a/Makefile.objs b/Makefile.objs
 index 26b885b..5dcf456 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o 
 cloop.o dmg.o bochs.o vpc.o vv
  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o 
 qcow2-snapshot.o qcow2-cache.o
  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o 
 qed-cluster.o
  block-nested-y += qed-check.o
 -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o 
 blk-queue.o
  block-nested-$(CONFIG_WIN32) += raw-win32.o
  block-nested-$(CONFIG_POSIX) += raw-posix.o
  block-nested-$(CONFIG_CURL) += curl.o
 diff --git a/block/blk-queue.c b/block/blk-queue.c
 new file mode 100644
 index 000..adef497
 --- /dev/null
 +++ b/block/blk-queue.c
 @@ -0,0 +1,201 @@
 +/*
 + * QEMU System Emulator queue definition for block layer
 + *
 + * Copyright (c) IBM, Corp. 2011
 + *
 + * Authors:
 + *  Zhi Yong Wu  wu...@linux.vnet.ibm.com
 + *  Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining 
 a copy
 + * of this software and associated documentation files (the "Software"), 
 to deal
 + * in the Software without restriction, including without limitation the 
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
 sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be 
 included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
 OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 
 ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 
 IN
 + * THE SOFTWARE.
 + */
 +
 +#include "block_int.h"
 +#include "block/blk-queue.h"
 +#include "qemu-common.h"
 +
 +/* The APIs for block request queue on qemu block layer.
 + */
 +
 +struct BlockQueueAIOCB {
 +    BlockDriverAIOCB common;
 +    QTAILQ_ENTRY(BlockQueueAIOCB) entry;
 +    BlockRequestHandler *handler;
 +    BlockDriverAIOCB *real_acb;
 +
 +    int64_t sector_num;
 +    QEMUIOVector *qiov;
 +    int nb_sectors;
 +};

 The idea is that each request is first queued on the QTAILQ, and at some
 point it's removed from the queue and gets a real_acb. But it never has
 both at the same time. Correct?
 No. If block I/O throttling is enabled and the I/O rate at runtime exceeds
 the limits, the request will be enqueued.
 It represents the whole lifecycle of one enqueued request.

 What are the conditions under which the request will still be enqueued,
 but has a real_acb at the same time?
When the request has not been serviced yet and still needs to stay enqueued,
a real_acb can exist at the same time.
Thanks for your good catch. :)
When the request is allocated for the first time, we should save the returned
acb to real_acb in qemu_block_queue_enqueue():
  request->real_acb = acb;


 +
 +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
 +
 +struct BlockQueue {
 +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
 +    bool req_failed;
 +    bool flushing;
 +};

 I find req_failed pretty confusing. Needs documentation at least, but
 most probably also a better name.
 OK. request_has_failed?

 No, that doesn't describe what it's really doing.

 You set req_failed = true by default and then on some obscure condition
 clear it or not. It's tracking something, but I'm not sure what meaning
 it has during the whole process.
In qemu_block_queue_flush(), when bdrv_aio_readv/writev returns NULL:
if req_failed has been changed to false, it indicates that the request
exceeded the limits again; if req_failed is still true, it indicates that
an I/O error took place, and at that moment qemu_block_queue_callback(request,
-EIO) needs to be called to report this to the upper layer.


 +
 +static void qemu_block_queue_dequeue(BlockQueue *queue,
 +                                     

Re: [Qemu-devel] [PATCH v8 2/4] block: add the command line support

2011-10-18 Thread Zhi Yong Wu
On Mon, Oct 17, 2011 at 6:19 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 26.09.2011 08:15, schrieb Zhi Yong Wu:
 On Fri, Sep 23, 2011 at 11:54 PM, Kevin Wolf kw...@redhat.com wrote:
 +}
 +
 +static void bdrv_block_timer(void *opaque)
 +{
 +    BlockDriverState *bs = opaque;
 +    BlockQueue *queue    = bs->block_queue;
 +
 +    qemu_block_queue_flush(queue);

 Hm, didn't really notice it while reading patch 1, but
 qemu_block_queue_flush() is misleading. It's really something like
 Why do you say this is misleading?
 qemu_block_queue_submit().
 Right. It will resubmit all enqueued I/O requests.

 For me, flush sounds as if it waits for completion of all requests.
The current code makes an exception for one I/O error. But I think we should
not take that action, right? I'm not sure whether we should keep all the
enqueued requests in order.


 Kevin




-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 09:58, Daniel P. Berrange wrote:
 On Tue, Oct 18, 2011 at 03:15:29PM +0800, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I meet a 
 problem
 when I test it.

 My qemu-kvm's tree is not updated about 2 months ago, because kernel.org is 
 down, and
 I forgot to pull from github.

 After I pull it from github, I find the following changes:
 @@ -1523,9 +1523,7 @@ static void 
 assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  
  static const VMStateDescription vmstate_assigned_device = {
  .name = pci-assign,
 -.fields = (VMStateField []) {
 -VMSTATE_END_OF_LIST()
 -}
 +.unmigratable = 1,
  };
  
  static void reset_assigned_device(DeviceState *dev)

 Why do you remove fields from vmstate_assigned_device?
 It is useful for dump because it does not check unmigratable. If 
 vmstate_assigned_device
 does not contain .fields, qemu will crash in vmstate_save_state().
 
 Given that '.fields' is allowed to be NULL for some devices, I'd say
 even for normal migration, QEMU should be checking for NULL in the
 vmstate_save_state() code. This would prevent QEMU crashes in the case
 where someone removed the .unmigratable member, but forgot to add back
 a .fields member.

That crash wouldn't be bad, because removing unmigratable without adding
proper fields is almost always a bug.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 09:58, Wen Congyang wrote:
 At 10/18/2011 03:52 PM, Jan Kiszka Write:
 On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I meet 
 a problem
 when I test it.

 Using migration code for dump is most probably the wrong approach as you
 saw through that conflict. All you need are the register states and the
 RAM. Reuse gdbstub services for this.
 
 Hmm, if the migration code can not be reused, I think we should define a new
 qemu's vmcore format, and add some codes into crash to support such format.

Please try to avoid defining something new. Unless there is a striking
reason, standard gdb core files should be generated so that you can load
the dump directly into gdb for analysis.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Qemu-devel] hw/nand hw/onenand and read-only

2011-10-18 Thread Markus Armbruster
These guys have been converted to qdev relatively recently.

I wonder what happens when they use a drive defined with -drive
if=none,readonly.

If that's not a valid configuration, we better reject read-only drives,
like ide_init_drive() does.



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Wen Congyang
At 10/18/2011 04:19 PM, Jan Kiszka Write:
 On 2011-10-18 09:58, Wen Congyang wrote:
 At 10/18/2011 03:52 PM, Jan Kiszka Write:
 On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I meet 
 a problem
 when I test it.

 Using migration code for dump is most probably the wrong approach as you
 saw through that conflict. All you need are the register states and the
 RAM. Reuse gdbstub services for this.

 Hmm, if the migration code can not be reused, I think we should define a new
 qemu's vmcore format, and add some codes into crash to support such format.
 
 Please try to avoid defining something new. Unless there is a striking
 reason, standard gdb core files should be generated so that you can load
 the dump directly into gdb for analysis.

I am not sure whether the standard gdb core files can be analyzed by crash.
If not, I think we should define something new, because it's easier to use
crash than gdb to analyze the core files.

Thanks
Wen Congyang

 
 Jan
 




Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 10:17, Jan Kiszka wrote:
 On 2011-10-18 09:58, Daniel P. Berrange wrote:
 On Tue, Oct 18, 2011 at 03:15:29PM +0800, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I meet 
 a problem
 when I test it.

 My qemu-kvm's tree is not updated about 2 months ago, because kernel.org is 
 down, and
 I forgot to pull from github.

 After I pull it from github, I find the following changes:
 @@ -1523,9 +1523,7 @@ static void 
 assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  
  static const VMStateDescription vmstate_assigned_device = {
  .name = pci-assign,
 -.fields = (VMStateField []) {
 -VMSTATE_END_OF_LIST()
 -}
 +.unmigratable = 1,
  };
  
  static void reset_assigned_device(DeviceState *dev)

 Why do you remove fields from vmstate_assigned_device?
 It is useful for dump because it does not check unmigratable. If 
 vmstate_assigned_device
 does not contain .fields, qemu will crash in vmstate_save_state().

 Given that '.fields' is allowed to be NULL for some devices, I'd say
 even for normal migration, QEMU should be checking for NULL in the
 vmstate_save_state() code. This would prevent QEMU crashes in the case
 where someone removed the .unmigratable member, but forgot to add back
 a .fields member.
 
 That crash wouldn't be bad because removinb unmigratable without adding
 proper fields is almost always a bug.

Err, s/almost//.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Daniel P. Berrange
On Tue, Oct 18, 2011 at 10:17:27AM +0200, Jan Kiszka wrote:
 On 2011-10-18 09:58, Daniel P. Berrange wrote:
  On Tue, Oct 18, 2011 at 03:15:29PM +0800, Wen Congyang wrote:
  Hi, Jan Kiszka
 
  At 10/10/2011 05:34 PM, Jan Kiszka Write:
  On 2011-10-10 11:02, Daniel P. Berrange wrote:
  On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:
 
 
  Run gdb with set debug remote 1 and watch the communication, it is not
  that complex. But a dump command is probably simpler for those
  scenarios, I agree.
 
  I have implemented the command dump and reuse migration's code. But I meet 
  a problem
  when I test it.
 
  My qemu-kvm's tree is not updated about 2 months ago, because kernel.org 
  is down, and
  I forgot to pull from github.
 
  After I pull it from github, I find the following changes:
  @@ -1523,9 +1523,7 @@ static void 
  assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
   
   static const VMStateDescription vmstate_assigned_device = {
   .name = pci-assign,
  -.fields = (VMStateField []) {
  -VMSTATE_END_OF_LIST()
  -}
  +.unmigratable = 1,
   };
   
   static void reset_assigned_device(DeviceState *dev)
 
  Why do you remove fields from vmstate_assigned_device?
  It is useful for dump because it does not check unmigratable. If 
  vmstate_assigned_device
  does not contain .fields, qemu will crash in vmstate_save_state().
  
  Given that '.fields' is allowed to be NULL for some devices, I'd say
  even for normal migration, QEMU should be checking for NULL in the
  vmstate_save_state() code. This would prevent QEMU crashes in the case
  where someone removed the .unmigratable member, but forgot to add back
  a .fields member.
 
 That crash wouldn't be bad because removinb unmigratable without adding
 proper fields is almost always a bug.

This is exactly the kind of example that shows why QEMU is too crashy in
general. We should be defensive against such bugs by checking that
preconditions are valid in the code, even if we don't expect violations to
happen often.


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 10:25, Wen Congyang wrote:
 At 10/18/2011 04:19 PM, Jan Kiszka Write:
 On 2011-10-18 09:58, Wen Congyang wrote:
 At 10/18/2011 03:52 PM, Jan Kiszka Write:
 On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I 
 meet a problem
 when I test it.

 Using migration code for dump is most probably the wrong approach as you
 saw through that conflict. All you need are the register states and the
 RAM. Reuse gdbstub services for this.

 Hmm, if the migration code can not be reused, I think we should define a new
 qemu's vmcore format, and add some codes into crash to support such format.

 Please try to avoid defining something new. Unless there is a striking
 reason, standard gdb core files should be generated so that you can load
 the dump directly into gdb for analysis.
 
 I am not sure whehter the standard gdb core files can not be analyzed by 
 crash.
 If not, I think we should define something new because it's easier to use
 crash than gdb to analyze the core files.

gdb allows you to walk up the frames and print variables (globals and
locals) etc.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-10-18 Thread Zhi Yong Wu
On Mon, Oct 17, 2011 at 11:54 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 On Mon, Oct 17, 2011 at 11:26 AM, Kevin Wolf kw...@redhat.com wrote:
 Am 26.09.2011 09:24, schrieb Zhi Yong Wu:
 On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf kw...@redhat.com wrote:
 Am 08.09.2011 12:11, schrieb Zhi Yong Wu:
 Note:
      1.) When the bps/iops limits are set to a small value such as 511
 bytes/s, the VM will hang. We are considering how to handle this
 scenario.
      2.) When a dd command is issued in the guest with its bs option set
 to a large value such as bs=1024K, the resulting speed will slightly
 exceed the limits.

 For these problems, if you have nice thought, pls let us know.:)

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  block.c |  259 
 ---
  block.h |    1 -
  2 files changed, 248 insertions(+), 12 deletions(-)

 One general comment: What about synchronous and/or coroutine I/O
 operations? Do you think they are just not important enough to consider
 here or were they forgotten?
  For sync ops, we assume that they will be converted into async mode at
  some point in the future, right?
  For coroutine I/O, it is introduced in the image driver layer, behind
  bdrv_aio_readv/writev. I think that we need not consider them, right?

 Meanwhile the block layer has been changed to handle all requests in
 terms of coroutines. So you would best move your intercepting code into
 the coroutine functions.

 Some additional info: the advantage of handling all requests in
 coroutines is that there is now a single place where you can put I/O
 throttling.  It will work for bdrv_read(), bdrv_co_readv(), and
 bdrv_aio_readv().  There is no code duplication, just put the I/O
 throttling logic in bdrv_co_do_readv().
got it. thanks.

 Stefan




-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Wen Congyang
At 10/18/2011 04:26 PM, Jan Kiszka Write:
 On 2011-10-18 10:25, Wen Congyang wrote:
 At 10/18/2011 04:19 PM, Jan Kiszka Write:
 On 2011-10-18 09:58, Wen Congyang wrote:
 At 10/18/2011 03:52 PM, Jan Kiszka Write:
 On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I 
 meet a problem
 when I test it.

 Using migration code for dump is most probably the wrong approach as you
 saw through that conflict. All you need are the register states and the
 RAM. Reuse gdbstub services for this.

 Hmm, if the migration code can not be reused, I think we should define a 
 new
 qemu's vmcore format, and add some codes into crash to support such format.

 Please try to avoid defining something new. Unless there is a striking
 reason, standard gdb core files should be generated so that you can load
 the dump directly into gdb for analysis.

 I am not sure whehter the standard gdb core files can not be analyzed by 
 crash.
 If not, I think we should define something new because it's easier to use
 crash than gdb to analyze the core files.
 
 gdb allows you to walk up the frame and print variables (globals 
 local) etc.

Crash uses gdb to provide common functionality, and you can also use all the
gdb commands in crash.

Thanks
Wen Congyang

 
 Jan
 




Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 10:25, Daniel P. Berrange wrote:
 On Tue, Oct 18, 2011 at 10:17:27AM +0200, Jan Kiszka wrote:
 On 2011-10-18 09:58, Daniel P. Berrange wrote:
 On Tue, Oct 18, 2011 at 03:15:29PM +0800, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I meet 
 a problem
 when I test it.

 My qemu-kvm's tree is not updated about 2 months ago, because kernel.org 
 is down, and
 I forgot to pull from github.

 After I pull it from github, I find the following changes:
 @@ -1523,9 +1523,7 @@ static void 
 assigned_dev_unregister_msix_mmio(AssignedDevice *dev)
  
  static const VMStateDescription vmstate_assigned_device = {
  .name = "pci-assign",
 -.fields = (VMStateField []) {
 -VMSTATE_END_OF_LIST()
 -}
 +.unmigratable = 1,
  };
  
  static void reset_assigned_device(DeviceState *dev)

 Why do you remove fields from vmstate_assigned_device?
 It is useful for dump because it does not check unmigratable. If 
 vmstate_assigned_device
 does not contain .fields, qemu will crash in vmstate_save_state().

 Given that '.fields' is allowed to be NULL for some devices, I'd say
 even for normal migration, QEMU should be checking for NULL in the
 vmstate_save_state() code. This would prevent QEMU crashes in the case
 where someone removed the .unmigratable member, but forgot to add back
 a .fields member.

 That crash wouldn't be bad because removing unmigratable without adding
 proper fields is almost always a bug.
 
 This is the kind of example of why QEMU is too crashy in general. We
 should be defensive against such bugs, by checking that preconditions
 are valid in the code, even if we don't expect them to happen often.

A serious bug in a device model is tricky to handle gracefully. Given
that this case is fairly clear (specifically for the code reviewer), I'm
not sure what defensive measures you want to apply and what benefit they
would provide.
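Daniel's proposed precondition check could look roughly like this — a cut-down sketch with invented type names, not QEMU's real vmstate code (the real VMStateDescription has many more members):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the vmstate types under discussion. */
typedef struct VMStateFieldSketch { const char *name; } VMStateFieldSketch;

typedef struct {
    const char *name;
    int unmigratable;
    const VMStateFieldSketch *fields;
} VMStateDescriptionSketch;

/* Defensive variant of the entry check: refuse to save instead of
 * crashing when a device is unmigratable or lacks a field list. */
static bool vmstate_can_save(const VMStateDescriptionSketch *vmsd)
{
    return vmsd != NULL && !vmsd->unmigratable && vmsd->fields != NULL;
}
```

With such a check, a description like the quoted `pci-assign` one (unmigratable, no fields) is rejected up front rather than dereferenced.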

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 10:31, Wen Congyang wrote:
 At 10/18/2011 04:26 PM, Jan Kiszka Write:
 On 2011-10-18 10:25, Wen Congyang wrote:
 At 10/18/2011 04:19 PM, Jan Kiszka Write:
 On 2011-10-18 09:58, Wen Congyang wrote:
 At 10/18/2011 03:52 PM, Jan Kiszka Write:
 On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is 
 not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the command dump and reuse migration's code. But I 
 meet a problem
 when I test it.

 Using migration code for dump is most probably the wrong approach as you
 saw through that conflict. All you need are the register states and the
 RAM. Reuse gdbstub services for this.

 Hmm, if the migration code cannot be reused, I think we should define a new
 qemu vmcore format and add some code to crash to support such a format.

 Please try to avoid defining something new. Unless there is a striking
 reason, standard gdb core files should be generated so that you can load
 the dump directly into gdb for analysis.

 I am not sure whether the standard gdb core files can be analyzed by 
 crash.
 If not, I think we should define something new, because it's easier to use
 crash than gdb to analyze the core files.

 gdb allows you to walk up the frame and print variables (globals & 
 local) etc.
 
 Crash uses gdb to provide common functionality, and you can also use all the
 gdb commands in crash.

So what's the added value here when I can use gdb directly?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-10-18 Thread Kevin Wolf
On 18.10.2011 10:07, Zhi Yong Wu wrote:
 On Mon, Oct 17, 2011 at 6:17 PM, Kevin Wolf kw...@redhat.com wrote:
 +
 +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
 +
 +struct BlockQueue {
 +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
 +    bool req_failed;
 +    bool flushing;
 +};

 I find req_failed pretty confusing. Needs documentation at least, but
 most probably also a better name.
 OK. request_has_failed?

 No, that doesn't describe what it's really doing.

 You set req_failed = true by default and then on some obscure condition
 clear it or not. It's tracking something, but I'm not sure what meaning
 it has during the whole process.
 In qemu_block_queue_flush(),
 when bdrv_aio_readv/writev return NULL: if req_failed has been changed to
 false, it indicates that the request exceeded the limits again; if
 req_failed is still true, it indicates that an I/O error took place, and
 at that moment qemu_block_queue_callback(request, -EIO) needs to be
 called to report this to the upper layer.

Okay, this makes some more sense now.

How about reversing the logic and maintaining a limit_exceeded flag
instead of a req_failed? I think this would be clearer.
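A sketch of what the reversed flag might look like — hypothetical, with invented helper names; the patch's real logic is only partially quoted in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reworked BlockQueue: instead of req_failed (preset to
 * true and cleared on an obscure condition), track the positive
 * condition directly. */
typedef struct BlockQueue {
    bool limit_exceeded;   /* a flushed request hit the I/O limit again */
    bool flushing;
} BlockQueue;

/* Flush-time decision: handler_result is NULL when the driver refused
 * the request.  With limit_exceeded, the NULL case states its meaning
 * directly instead of inferring it from a flag preset to "failed". */
static int classify_flush_result(BlockQueue *queue, void *handler_result)
{
    if (handler_result != NULL) {
        return 0;               /* submitted, keep draining the queue */
    }
    if (queue->limit_exceeded) {
        return 1;               /* re-queue: throttled again, not an error */
    }
    return -5;                  /* -EIO: real I/O error, complete the ACB */
}
```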

 +void qemu_del_block_queue(BlockQueue *queue)
 +{
 +    BlockQueueAIOCB *request, *next;
 +
 +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
 +        QTAILQ_REMOVE(&queue->requests, request, entry);
 +        qemu_aio_release(request);
 +    }
 +
 +    g_free(queue);
 +}

 Can we be sure that no AIO requests are in flight that still use the now
 released AIOCB?
 Yeah, since qemu core code is serially performed, I think that when
 qemu_del_block_queue is performed, no requests are in flight. Right?

 Patch 2 has this code:

 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);
 +        bs->block_queue = NULL;
 +    }

 Does this mean that you can't disable I/O limits while the VM is running?
 No, you can, even while the VM is running.

Okay, in general qemu_block_queue_flush() empties the queue so that
there are no requests left that qemu_del_block_queue() could drop from
the queue. So in the common case it doesn't even enter the FOREACH loop.

I think problems start when requests have failed or exceeded the limit
again, then you have requests queued even after
qemu_block_queue_flush(). You must be aware of this, otherwise the code
in qemu_del_block_queue() wouldn't exist.

But you can't release the ACBs without having called their callback,
otherwise the caller would still assume that its ACB pointer is valid.
Maybe calling the callback before releasing the ACB would be enough.

However, for failed requests see below.
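What "calling the callback before releasing the ACB" could mean in code — a stand-alone sketch with stubbed types, not the actual qemu_del_block_queue():

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the QEMU types involved -- assumptions for
 * illustration, not the real definitions. */
typedef void BlockDriverCompletionFunc(void *opaque, int ret);

typedef struct BlockQueueAIOCB {
    BlockDriverCompletionFunc *cb;
    void *opaque;
    struct BlockQueueAIOCB *next;
} BlockQueueAIOCB;

static void qemu_aio_release_stub(BlockQueueAIOCB *acb) { (void)acb; }

/* Complete leftover requests with -EIO *before* releasing them, so no
 * caller is left holding a dangling ACB pointer. */
static void del_queue_sketch(BlockQueueAIOCB *head)
{
    while (head) {
        BlockQueueAIOCB *next = head->next;
        head->cb(head->opaque, -5 /* -EIO */);  /* complete first */
        qemu_aio_release_stub(head);            /* then release */
        head = next;
    }
}

/* Tiny self-test helper: two queued requests, count completions. */
static int completed_count;
static void counting_cb(void *opaque, int ret)
{
    (void)opaque;
    if (ret == -5) completed_count++;
}

static int run_del_queue_demo(void)
{
    BlockQueueAIOCB a = { counting_cb, NULL, NULL };
    BlockQueueAIOCB b = { counting_cb, NULL, &a };
    completed_count = 0;
    del_queue_sketch(&b);
    return completed_count;
}
```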


 +
 +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
 +                        BlockDriverState *bs,
 +                        BlockRequestHandler *handler,
 +                        int64_t sector_num,
 +                        QEMUIOVector *qiov,
 +                        int nb_sectors,
 +                        BlockDriverCompletionFunc *cb,
 +                        void *opaque)
 +{
 +    BlockDriverAIOCB *acb;
 +    BlockQueueAIOCB *request;
 +
 +    if (queue->flushing) {
 +        queue->req_failed = false;
 +        return NULL;
 +    } else {
 +        acb = qemu_aio_get(&block_queue_pool, bs,
 +                           cb, opaque);
 +        request = container_of(acb, BlockQueueAIOCB, common);
 +        request->handler       = handler;
 +        request->sector_num    = sector_num;
 +        request->qiov          = qiov;
 +        request->nb_sectors    = nb_sectors;
 +        request->real_acb      = NULL;
 +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
 +    }
 +
 +    return acb;
 +}
 +
 +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
 +{
 +    int ret;
 +    BlockDriverAIOCB *res;
 +
 +    res = request->handler(request->common.bs, request->sector_num,
 +                           request->qiov, request->nb_sectors,
 +                           qemu_block_queue_callback, request);
 +    if (res) {
 +        request->real_acb = res;
 +    }
 +
 +    ret = (res == NULL) ? 0 : 1;
 +
 +    return ret;

 You mean return (res != NULL); and want to have bool as the return value
 of this function.
 Yeah, thanks. I will modify it as below:
 ret = (res == NULL) ? false : true;

 ret = (res != NULL) is really more readable.
 I have adopted Paolo's suggestion.
 

 and
 static bool qemu_block_queue_handler()


 +}
 +
 +void qemu_block_queue_flush(BlockQueue *queue)
 +{
 +    queue->flushing = true;
 +    while (!QTAILQ_EMPTY(&queue->requests)) {
 +        BlockQueueAIOCB *request = NULL;
 +        int ret = 0;
 +
 +        request = QTAILQ_FIRST(&queue->requests);
 +        QTAILQ_REMOVE(&queue->requests, request, entry);
 +
 +        queue->req_failed = true;
 +        ret = qemu_block_queue_handler(request);
 +        if (ret == 0) {
 +   

Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Richard W.M. Jones
On Tue, Oct 18, 2011 at 10:31:10AM +0200, Jan Kiszka wrote:
 On 2011-10-18 10:31, Wen Congyang wrote:
  At 10/18/2011 04:26 PM, Jan Kiszka Write:
  On 2011-10-18 10:25, Wen Congyang wrote:
  At 10/18/2011 04:19 PM, Jan Kiszka Write:
  On 2011-10-18 09:58, Wen Congyang wrote:
  At 10/18/2011 03:52 PM, Jan Kiszka Write:
  On 2011-10-18 09:15, Wen Congyang wrote:
  Hi, Jan Kiszka
 
  At 10/10/2011 05:34 PM, Jan Kiszka Write:
  On 2011-10-10 11:02, Daniel P. Berrange wrote:
  On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:
 
 
  Run gdb with set debug remote 1 and watch the communication, it is 
  not
  that complex. But a dump command is probably simpler for those
  scenarios, I agree.
 
  I have implemented the dump command and reused the migration code, but I 
  met a problem when testing it.
 
  Using migration code for dump is most probably the wrong approach as 
  you
  saw through that conflict. All you need are the register states and the
  RAM. Reuse gdbstub services for this.
 
  Hmm, if the migration code cannot be reused, I think we should define a new
  qemu vmcore format and add some code to crash to support such a format.
 
  Please try to avoid defining something new. Unless there is a striking
  reason, standard gdb core files should be generated so that you can load
  the dump directly into gdb for analysis.
 
  I am not sure whether the standard gdb core files can be analyzed by 
  crash.
  If not, I think we should define something new, because it's easier to use
  crash than gdb to analyze the core files.
 
  gdb allows you to walk up the frame and print variables (globals & 
  local) etc.
  
  Crash uses gdb to provide common functionality, and you can also use all the
  gdb commands in crash.
 
 So what's the added value here when I can use gdb directly?

Crash can decode kernel structures, providing for example process
listings and debugging into userspace processes.  Crash actually
reuses gdb's code too, so it's kind of a superset.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 10:34, Richard W.M. Jones wrote:
 On Tue, Oct 18, 2011 at 10:31:10AM +0200, Jan Kiszka wrote:
 On 2011-10-18 10:31, Wen Congyang wrote:
 At 10/18/2011 04:26 PM, Jan Kiszka Write:
 On 2011-10-18 10:25, Wen Congyang wrote:
 At 10/18/2011 04:19 PM, Jan Kiszka Write:
 On 2011-10-18 09:58, Wen Congyang wrote:
 At 10/18/2011 03:52 PM, Jan Kiszka Write:
 On 2011-10-18 09:15, Wen Congyang wrote:
 Hi, Jan Kiszka

 At 10/10/2011 05:34 PM, Jan Kiszka Write:
 On 2011-10-10 11:02, Daniel P. Berrange wrote:
 On Mon, Oct 10, 2011 at 08:52:08AM +0200, Jan Kiszka wrote:


 Run gdb with set debug remote 1 and watch the communication, it is 
 not
 that complex. But a dump command is probably simpler for those
 scenarios, I agree.

 I have implemented the dump command and reused the migration code, but I 
 met a problem when testing it.

 Using migration code for dump is most probably the wrong approach as 
 you
 saw through that conflict. All you need are the register states and the
 RAM. Reuse gdbstub services for this.

 Hmm, if the migration code cannot be reused, I think we should define a new
 qemu vmcore format and add some code to crash to support such a format.

 Please try to avoid defining something new. Unless there is a striking
 reason, standard gdb core files should be generated so that you can load
 the dump directly into gdb for analysis.

 I am not sure whether the standard gdb core files can be analyzed by 
 crash.
 If not, I think we should define something new, because it's easier to use
 crash than gdb to analyze the core files.

 gdb allows you to walk up the frame and print variables (globals & 
 local) etc.

 Crash uses gdb to provide common functionality, and you can also use all the
 gdb commands in crash.

 So what's the added value here when I can use gdb directly?
 
 Crash can decode kernel structures, providing for example process
 listings and debugging into userspace processes.  Crash actually
 reuses gdb's code too, so it's kind of a superset.

Yeah, I see. Could also be solved via gdb scripts, but crash is already
there.

But let's see if the formats actually differ. In the end, crash is just
parsing the same information that also gdb sees.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 10:36, Jan Kiszka wrote:
 Crash can decode kernel structures, providing for example process
 listings and debugging into userspace processes.  Crash actually
 reuses gdb's code too, so it's kind of a superset.
 
 Yeah, I see. Could also be solved via gdb scripts, but crash is already
 there.

[ BTW, crash is for the dead. But having those features in the form of gdb
scripts would also allow live system analysis. Looks like it's worth
obsoleting crash in the long run. ]

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH v8 3/4] block: add block timer and throttling algorithm

2011-10-18 Thread Zhi Yong Wu
On Mon, Oct 17, 2011 at 6:26 PM, Kevin Wolf kw...@redhat.com wrote:
 On 26.09.2011 09:24, Zhi Yong Wu wrote:
 On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf kw...@redhat.com wrote:
 On 08.09.2011 12:11, Zhi Yong Wu wrote:
 Note:
      1.) When bps/iops limits are specified as a small value such as 511 
 bytes/s, this VM will hang up. We are considering how to handle this 
 scenario.
      2.) When a dd command is issued in the guest, if its option bs is set to 
 a large value such as bs=1024K, the resulting speed will be slightly bigger 
 than the limits.

 For these problems, if you have a nice idea, pls let us know.:)

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  block.c |  259 
 ---
  block.h |    1 -
  2 files changed, 248 insertions(+), 12 deletions(-)

 One general comment: What about synchronous and/or coroutine I/O
 operations? Do you think they are just not important enough to consider
 here or were they forgotten?
 For sync ops, we assume that they will be converted into async mode at
 some point in the future, right?
 For coroutine I/O, it is introduced in the image driver layer, behind
 bdrv_aio_readv/writev. I think that we need not consider them, right?

 Meanwhile the block layer has been changed to handle all requests in
 terms of coroutines. So you would best move your intercepting code into
 the coroutine functions.
OK. I will.

 Also, do I understand correctly that you're always submitting the whole
 Right, when the block timer fires, it will flush the whole request queue.
 queue at once? Does this effectively enforce the limit all the time or
 will it lead to some peaks and then no requests at all for a while until
 In fact, it only tries to submit the enqueued requests one by one. If
 one fails to pass the limit, that request will be enqueued again.

 Right, I missed this. Makes sense.

 the average is right again?
 Yeah, it is possible. Do you have a better idea?

 Maybe some documentation on how it all works from a high level
 perspective would be helpful.
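As a high-level model of the throttling decision being discussed — not the patch's actual bdrv_exceed_io_limits(), whose body isn't quoted here; the slice length and all names are assumptions, and overflow is ignored in this toy version:

```c
#include <assert.h>
#include <stdint.h>

#define SLICE_NS 100000000LL   /* 100 ms slice, an assumption */

/* Given a bytes-per-second limit, the bytes already dispatched in the
 * current slice, a new request's size, and the time elapsed in the
 * slice, return 0 if the request fits the budget or the number of ns
 * to arm the block timer before retrying. */
static int64_t exceed_bps_limit(int64_t bps_limit, int64_t bytes_disp,
                                int64_t req_bytes, int64_t elapsed_ns)
{
    /* bytes allowed so far in this slice at the configured rate */
    int64_t allowed = bps_limit * elapsed_ns / 1000000000LL;

    if (bytes_disp + req_bytes <= allowed) {
        return 0;  /* within limit: dispatch immediately */
    }
    /* wait until enough budget has accumulated for the queued bytes */
    return (bytes_disp + req_bytes) * 1000000000LL / bps_limit - elapsed_ns;
}
```

The peaks-then-silence question above is about exactly this wait value: requests drained from the queue either pass this check or are re-queued with the timer re-armed.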

 +    /* throttling disk read I/O */
 +    if (bs->io_limits_enabled) {
 +        if (bdrv_exceed_io_limits(bs, nb_sectors, false, &wait_time)) {
 +            ret = qemu_block_queue_enqueue(bs->block_queue, bs,
 +                           bdrv_aio_readv,
 +                           sector_num, qiov, nb_sectors, cb, opaque);
 +            printf("wait_time=%ld\n", wait_time);
 +            if (wait_time != -1) {
 +                printf("reset block timer\n");
 +                qemu_mod_timer(bs->block_timer,
 +                               wait_time + qemu_get_clock_ns(vm_clock));
 +            }
 +
 +            if (ret) {
 +                printf("ori ret is not null\n");
 +            } else {
 +                printf("ori ret is null\n");
 +            }
 +
 +            return ret;
 +        }
 +    }

 -    return drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
 +    ret = drv->bdrv_aio_readv(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
 +    if (ret) {
 +        if (bs->io_limits_enabled) {
 +            bs->io_disps.bytes[BLOCK_IO_LIMIT_READ] +=
 +                              (unsigned) nb_sectors * BDRV_SECTOR_SIZE;
 +            bs->io_disps.ios[BLOCK_IO_LIMIT_READ]++;
 +        }

 I wonder if you can't reuse bs->nr_bytes/nr_ops instead of introducing a
 second counting mechanism. Would have the advantage that numbers are
 No, our counting variables will be reset to zero when the current slice
 time (0.1 ms) is used up.

 Instead of setting the counter to zero you could remember the base value
 and calculate the difference when you need it. The advantage is that we
 can share infrastructure instead of introducing several subtly different
 ways of I/O accounting.
Good idea, let me try it.
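Kevin's base-value idea in miniature — the field names here are invented; the real counters live in BlockDriverState:

```c
#include <assert.h>
#include <stdint.h>

/* Instead of zeroing a private counter at each slice start, snapshot
 * the shared, ever-growing nr_bytes counter as a base and compute the
 * per-slice amount as a difference.  The shared counter (used by
 * "info blockstats") is never disturbed. */
typedef struct {
    uint64_t nr_bytes;        /* shared counter, only ever incremented */
    uint64_t slice_base;      /* snapshot taken when the slice started */
} IoAccount;

static void slice_start(IoAccount *acc)
{
    acc->slice_base = acc->nr_bytes;   /* no reset of the shared counter */
}

static uint64_t bytes_this_slice(const IoAccount *acc)
{
    return acc->nr_bytes - acc->slice_base;
}
```

This is why the two metrics stay consistent: throttling and blockstats read the very same counter, just over different windows.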


 actually consistent (your metric counts slightly differently than the
 existing info blockstats one).
 Yeah, I noticed this, and I don't think there's anything wrong with it. And you?

 It's not really user friendly if a number that is called the same means
 one thing in one place and another thing in another place.
OK

 Kevin




-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] hw/nand hw/onenand and read-only

2011-10-18 Thread Juha.Riihimaki
On 18.10.11 11:23, ext Markus Armbruster arm...@redhat.com wrote:

These guys have been converted to qdev relatively recently.

I wonder what happens when they use a drive defined with -drive
if=none,readonly.

If that's not a valid configuration, we better reject read-only drives,
like ide_init_drive() does.

I'm not an expert with QEMU command line options; how do you pass such a
drive to a NAND device? With if=mtd I get the following, which I guess is
the expected result:

$ qemu-system-arm -M beagle -drive if=mtd,file=nand.img,readonly
qemu: readonly flag not supported for drive with this interface


Regards,
Juha
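The rejection Markus suggests, modelled loosely on what ide_init_drive() does, could take this shape — all types and names here are stand-ins for illustration, not the real QEMU ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for BlockDriverState and bdrv_is_read_only(); the real QEMU
 * plumbing differs -- this only sketches the shape of the check. */
typedef struct BlockDriverState { bool read_only; } BlockDriverState;

static bool bdrv_is_read_only(BlockDriverState *bs) { return bs->read_only; }

/* Return -1 to reject a read-only backing drive at device init time,
 * instead of silently accepting writes that can never reach the image. */
static int nand_init_check_sketch(BlockDriverState *bs)
{
    if (bs && bdrv_is_read_only(bs)) {
        fprintf(stderr, "nand: can't use a read-only drive\n");
        return -1;
    }
    return 0;
}
```

The if=mtd case above is caught earlier, at the -drive layer; a device-level check like this would additionally cover drives passed via if=none.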




Re: [Qemu-devel] [PATCH] PPC: Fail configure when libfdt is not available

2011-10-18 Thread Andreas Färber
On 18.10.2011 02:18, Alexander Graf wrote:
 We have several targets in the PPC tree now that basically require libfdt
 to function properly, namely the pseries and the e500 targets. This dependency
 will rather increase than decrease in the future, so I want to make sure
 that people building shiny new 1.0 actually have libfdt installed to get
 rid of a few ifdefs in the code.
 
 Warning: This patch will likely make configure fail for people who don't
 select their own --target-list, but don't have libfdt development packages
 installed. However, we really need this new dependency to move on.
 
 Signed-off-by: Alexander Graf ag...@suse.de

openSUSE 12.1 has libfdt1-devel, but you should set up a submodule and
working build rules for Darwin, Haiku, etc. `make` doesn't fully work so
I used custom scripts to build the right parts and to manually install
the resulting binary and headers.

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746, AG Nürnberg



Re: [Qemu-devel] [PATCH] PPC: Fail configure when libfdt is not available

2011-10-18 Thread Alexander Graf

On 18.10.2011 at 10:55, Andreas Färber afaer...@suse.de wrote:

 On 18.10.2011 02:18, Alexander Graf wrote:
 We have several targets in the PPC tree now that basically require libfdt
 to function properly, namely the pseries and the e500 targets. This 
 dependency
 will rather increase than decrease in the future, so I want to make sure
 that people building shiny new 1.0 actually have libfdt installed to get
 rid of a few ifdefs in the code.
 
 Warning: This patch will likely make configure fail for people who don't
 select their own --target-list, but don't have libfdt development packages
 installed. However, we really need this new dependency to move on.
 
 Signed-off-by: Alexander Graf ag...@suse.de
 
 openSUSE 12.1 has libfdt1-devel, but you should set up a submodule and
 working build rules for Darwin, Haiku, etc. `make` doesn't fully work so
 I used custom scripts to build the right parts and to manually install
 the resulting binary and headers.

I don't fully understand. It's a build dependency, so whoever maintains libfdt 
/ is interested in running ppc targets on those OSs needs to fix libfdt to 
build there.

It's really the same as a dependency on glib or sdl or ... :). It's just less 
well known (and less active as a project).


Alex

 



Re: [Qemu-devel] Patch for ui/cocoa.m

2011-10-18 Thread Andreas Färber
Hello Juan,

On 18.10.2011 03:22, Juan Pineda wrote:
 This is my first time submitting a patch, so please let me know if I'm not 
 following the correct protocol.

Please see http://wiki.qemu.org/Contribute/SubmitAPatch

Using the git-send-email tool ensures that the patch format is right.

Then, this patch despite its small size is not trivial so instead of
qemu-triv...@nongnu.org please cc andreas.faer...@web.de for Cocoa. Thanks.

 Under OSX Lion the boot volume dialog is not closed and it permanently 
 obscures the emulator window, since under Lion the dialog cannot be 
 repositioned. The fix only adds a single line to close the dialog.

I don't have access to v10.7 so please describe the problem in more
detail: What command line do you use? If using the right arguments you
shouldn't see a window at all. Are you maybe using -drive instead of
-hda and that is not yet handled correctly?

Apart from this issue, is it working correctly for you?

Regards,
Andreas

 Signed-off-by: Juan Pineda j...@logician.com
 
 Thanks!
 -Juan
 
 
 git diff ui/cocoa.m
 diff --git a/ui/cocoa.m b/ui/cocoa.m
 index d9e4e3d..4b42462 100644
 --- a/ui/cocoa.m
 +++ b/ui/cocoa.m
 @@ -811,6 +811,8 @@ QemuCocoaView *cocoaView;
  
  char **argv = (char**)malloc( sizeof(char*)*3 );
  
 +   [sheet close];
 +
    asprintf(&argv[0], "%s", bin);
    asprintf(&argv[1], "-hda");
    asprintf(&argv[2], "%s", img);
 
 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746, AG Nürnberg



Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-10-18 Thread Zhi Yong Wu
On Tue, Oct 18, 2011 at 4:36 PM, Kevin Wolf kw...@redhat.com wrote:
 On 18.10.2011 10:07, Zhi Yong Wu wrote:
 On Mon, Oct 17, 2011 at 6:17 PM, Kevin Wolf kw...@redhat.com wrote:
 +
 +typedef struct BlockQueueAIOCB BlockQueueAIOCB;
 +
 +struct BlockQueue {
 +    QTAILQ_HEAD(requests, BlockQueueAIOCB) requests;
 +    bool req_failed;
 +    bool flushing;
 +};

 I find req_failed pretty confusing. Needs documentation at least, but
 most probably also a better name.
 OK. request_has_failed?

 No, that doesn't describe what it's really doing.

 You set req_failed = true by default and then on some obscure condition
 clear it or not. It's tracking something, but I'm not sure what meaning
 it has during the whole process.
 In qemu_block_queue_flush(),
 when bdrv_aio_readv/writev return NULL: if req_failed has been changed to
 false, it indicates that the request exceeded the limits again; if
 req_failed is still true, it indicates that an I/O error took place, and
 at that moment qemu_block_queue_callback(request, -EIO) needs to be
 called to report this to the upper layer.

 Okay, this makes some more sense now.

 How about reversing the logic and maintaining a limit_exceeded flag
 instead of a req_failed? I think this would be clearer.
OK

 +void qemu_del_block_queue(BlockQueue *queue)
 +{
 +    BlockQueueAIOCB *request, *next;
 +
 +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
 +        QTAILQ_REMOVE(&queue->requests, request, entry);
 +        qemu_aio_release(request);
 +    }
 +
 +    g_free(queue);
 +}

 Can we be sure that no AIO requests are in flight that still use the now
 released AIOCB?
 Yeah, since qemu core code is serially performed, I think that when
 qemu_del_block_queue is performed, no requests are in flight. Right?

 Patch 2 has this code:

 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);
 +        bs->block_queue = NULL;
 +    }

 Does this mean that you can't disable I/O limits while the VM is running?
 No, you can, even while the VM is running.

 Okay, in general qemu_block_queue_flush() empties the queue so that
 there are no requests left that qemu_del_block_queue() could drop from
 the queue. So in the common case it doesn't even enter the FOREACH loop.
I think that we should use !QTAILQ_EMPTY(&queue->requests), not
QTAILQ_FOREACH_SAFE, in qemu_del_block_queue(),
right?

 I think problems start when requests have failed or exceeded the limit
 again, then you have requests queued even after
 qemu_block_queue_flush(). You must be aware of this, otherwise the code
 in qemu_del_block_queue() wouldn't exist.

 But you can't release the ACBs without having called their callback,
 otherwise the caller would still assume that its ACB pointer is valid.
 Maybe calling the callback before releasing the ACB would be enough.
Good, thanks.

 However, for failed requests see below.


 +
 +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue,
 +                        BlockDriverState *bs,
 +                        BlockRequestHandler *handler,
 +                        int64_t sector_num,
 +                        QEMUIOVector *qiov,
 +                        int nb_sectors,
 +                        BlockDriverCompletionFunc *cb,
 +                        void *opaque)
 +{
 +    BlockDriverAIOCB *acb;
 +    BlockQueueAIOCB *request;
 +
 +    if (queue->flushing) {
 +        queue->req_failed = false;
 +        return NULL;
 +    } else {
 +        acb = qemu_aio_get(&block_queue_pool, bs,
 +                           cb, opaque);
 +        request = container_of(acb, BlockQueueAIOCB, common);
 +        request->handler       = handler;
 +        request->sector_num    = sector_num;
 +        request->qiov          = qiov;
 +        request->nb_sectors    = nb_sectors;
 +        request->real_acb      = NULL;
 +        QTAILQ_INSERT_TAIL(&queue->requests, request, entry);
 +    }
 +
 +    return acb;
 +}
 +
 +static int qemu_block_queue_handler(BlockQueueAIOCB *request)
 +{
 +    int ret;
 +    BlockDriverAIOCB *res;
 +
 +    res = request->handler(request->common.bs, request->sector_num,
 +                           request->qiov, request->nb_sectors,
 +                           qemu_block_queue_callback, request);
 +    if (res) {
 +        request->real_acb = res;
 +    }
 +
 +    ret = (res == NULL) ? 0 : 1;
 +
 +    return ret;

 You mean return (res != NULL); and want to have bool as the return value
 of this function.
 Yeah, thanks. I will modify it as below:
 ret = (res == NULL) ? false : true;

 ret = (res != NULL) is really more readable.
 I have adopted Paolo's suggestion.


 and
 static bool qemu_block_queue_handler()


 +}
 +
 +void qemu_block_queue_flush(BlockQueue *queue)
 +{
 +    queue->flushing = true;
 +    while (!QTAILQ_EMPTY(&queue->requests)) {
 +        BlockQueueAIOCB *request = NULL;
 +        int ret = 0;
 +
 +        

Re: [Qemu-devel] [PATCH] PPC: Bump qemu-system-ppc to 64-bit physical address space

2011-10-18 Thread Andreas Färber
On 18.10.2011 01:52, Alexander Graf wrote:
 Some 32-bit PPC CPUs can use up to 36 bit of physicall address space.
 Treat them accordingly in the qemu-system-ppc binary type.

physical

 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  configure|2 +-
  target-ppc/cpu.h |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)
 
 diff --git a/configure b/configure
 index 9b4fe34..3bdb556 100755
 --- a/configure
 +++ b/configure
 @@ -3276,7 +3276,7 @@ case $target_arch2 in
;;
ppc)
  gdb_xml_files=power-core.xml power-fpu.xml power-altivec.xml 
 power-spe.xml
 -target_phys_bits=32
 +target_phys_bits=64
  target_nptl=yes
  target_libs_softmmu=$fdt_libs
;;

I've found this very unintuitive for the new targets I'm preparing.
Could we use the real target_phys_bits=36 here and later on ceil this to
32/64 before using it for libhw${target_phys_bits}?

Andreas

 diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
 index 8e5c85c..f36f375 100644
 --- a/target-ppc/cpu.h
 +++ b/target-ppc/cpu.h
 @@ -66,7 +66,7 @@
  #define TARGET_PAGE_BITS 12
  #endif /* defined(TARGET_PPCEMB) */
  
 -#define TARGET_PHYS_ADDR_SPACE_BITS 32
 +#define TARGET_PHYS_ADDR_SPACE_BITS 36
  #define TARGET_VIRT_ADDR_SPACE_BITS 32
  
  #endif /* defined (TARGET_PPC64) */


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746, AG Nürnberg



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Wen Congyang
At 10/18/2011 04:36 PM, Jan Kiszka Write:
 On 2011-10-18 10:34, Richard W.M. Jones wrote:
 
 Yeah, I see. Could also be solved via gdb scripts, but crash is already
 there.
 
 But let's see if the formats actually differ. In the end, crash is just
 parsing the same information that also gdb sees.

I think the format can be similar to diskdump/kdump/netdump:

dump_header:     1 block
sub header:      n blocks (n is stored in dump_header)
bitmap:          m blocks (2m is stored in dump_header)
dumpable bitmap: m blocks
memory data (we can know whether a page is stored in the core from the bitmap 
and dumpable bitmap)

The format of the dump header (it's like kdump/diskdump):
struct disk_dump_header {
    char                signature[SIG_LEN];   /* = "QEMU" */
    int                 header_version;       /* Dump header version */
    struct new_utsname  utsname;              /* copy of system_utsname */
    struct timeval      timestamp;            /* Time stamp */
    unsigned int        status;
    int                 block_size;           /* Size of a block in bytes */
    int                 sub_hdr_size;         /* Size of arch dependent
                                                 header in blocks */
    unsigned int        bitmap_blocks;        /* Size of memory bitmap in
                                                 blocks */
    unsigned int        max_mapnr;            /* = max_mapnr */
    unsigned int        total_ram_blocks;     /* Number of blocks that should
                                                 be written */
    unsigned int        device_blocks;        /* Number of total blocks in
                                                 the dump device */
    unsigned int        written_blocks;       /* Number of written blocks */
    unsigned int        current_cpu;          /* CPU# which handles dump */
    int                 nr_cpus;              /* Number of CPUs */
};

The sub header can contain all register values for each vcpu, and other 
information, for example:
struct qemu_sub_header {
    unsigned long   start_pfn;
    unsigned long   end_pfn;
    off_t           offset_note;
    unsigned long   size_note;
};
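Assuming the layout described above (1 header block, n sub-header blocks, then two bitmaps of m blocks each, then memory data, with `bitmap_blocks` holding 2m), the file offsets would fall out like this — a restatement of the proposal for illustration, not code from any patch:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int block_size;
    int sub_hdr_size;            /* n */
    unsigned int bitmap_blocks;  /* 2m: both bitmaps together */
} LayoutParams;

/* Byte offset of the first bitmap: after the header and sub header. */
static int64_t offset_bitmap(const LayoutParams *p)
{
    return (int64_t)(1 + p->sub_hdr_size) * p->block_size;
}

/* The dumpable bitmap follows the first bitmap (m blocks later). */
static int64_t offset_dumpable_bitmap(const LayoutParams *p)
{
    return offset_bitmap(p) + (int64_t)(p->bitmap_blocks / 2) * p->block_size;
}

/* Memory data starts after both bitmaps. */
static int64_t offset_memory_data(const LayoutParams *p)
{
    return offset_bitmap(p) + (int64_t)p->bitmap_blocks * p->block_size;
}
```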

Thanks
Wen Congyang

 
 Jan
 




Re: [Qemu-devel] [PATCH] PPC: Fail configure when libfdt is not available

2011-10-18 Thread Peter Maydell
On 18 October 2011 01:18, Alexander Graf ag...@suse.de wrote:
 +if test $fdt != yes -a \( $target_arch2 = ppc -o \
 +        $target_arch2 = ppc64 -o $target_arch2 = ppcemb \); then
 +  echo
 +  echo Error: libfdt missing
 +  echo The PowerPC target requires libfdt to work properly.
 +  echo Please make sure to have it and its development packages installed
 +  echo
 +  exit 1
 +fi
  if test $target_arch2 = ppc64 -a $fdt = yes; then
   echo CONFIG_PSERIES=y  $config_target_mak
  fi

The test -a, -o, ( and ) operators are deprecated (see the 'application
usage section' of the POSIX 'test' documentation:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html
) -- better not to use them in new code.

Since target_arch2 == ppc64 and fdt != yes is now an impossible
combination, you could remove the fdt check from the condition
below which sets CONFIG_PSERIES.

Also, missing '.' after 'installed'.

-- PMM



Re: [Qemu-devel] [PATCH] Memory API bugfix - abolish addrrange_end()

2011-10-18 Thread Avi Kivity
On 10/18/2011 03:38 AM, David Gibson wrote:
 On Mon, Oct 17, 2011 at 12:34:19PM +0200, Avi Kivity wrote:
  On 10/17/2011 07:31 AM, David Gibson wrote:

In terms of how the code looks, it's seriously more ugly (see the
patches I sent out).  Conceptually it's cleaner, since we're not dodging
the issue that we need to deal with a full 64-bit domain.
  
   We don't have to dodge that issue.  I know how to remove the
   requirement for intermediate negative values, I just haven't made up a
   patch yet.  With that we can change to uint64 and cover the full 64
   bit range.  In fact I think I can make it so that size==0 represents
   size=2^64 and even handle the full 64-bit, inclusive range properly.
  
  That means you can't do a real size == 0.

 Yeah... a memory range with size 0 has no effect by definition, I
 think we can do without it.

How do we make sure all callers know this?

But my main concern is maintainability.  The 64-bit blanket is too short,
if we keep pulling it in various directions we'll just expose ourselves
in new ways.
  
   Nonsense, dealing with full X-bit range calculations in X-bit types is
   a fairly standard problem.  The kernel does it in VMA handling for
   one.  It just requires thinking about overflow cases.
  
  We discovered three bugs already (you found two, and I had one during
  development).  Even if it can probably be done with extreme care, is
  it worth spending all that development time on?
  
  I'm not sure there is a parallel with vmas, since we're offsetting in
  both the positive and negative directions.

 I think the so-called negative offsetting is just an artifact of our
 implementation.  I don't see that it's any different from having a VMA
 whose file offset is larger than its memory address.


Consider the vga window at 0xa0000 pointing into the framebuffer at
alias_offset 0x1a0000.  To the system, it looks like a copy of the
framebuffer starts at -0x100000, becomes visible 0x1a0000 bytes later
(at 0xa0000), then becomes invisible again at 0xa8000.

Yes, it's an artifact, but I don't want to spend too much time worrying
about it, if I can throw a few more bits at the problem.  The API is too
central to make a case by case analysis of where things can go wrong, it
needs to be robust.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.




Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-10-18 Thread Kevin Wolf
Am 18.10.2011 11:29, schrieb Zhi Yong Wu:
 +void qemu_del_block_queue(BlockQueue *queue)
 +{
 +BlockQueueAIOCB *request, *next;
 +
 +QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
 +QTAILQ_REMOVE(&queue->requests, request, entry);
 +qemu_aio_release(request);
 +}
 +
 +g_free(queue);
 +}

 Can we be sure that no AIO requests are in flight that still use the now
 released AIOCB?
 Yeah, since the qemu core code runs serially, I think that when
 qemu_del_block_queue is performed, no requests are in flight. Right?

 Patch 2 has this code:

 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +bs->io_limits_enabled = false;
 +
 +if (bs->block_queue) {
 +qemu_block_queue_flush(bs->block_queue);
 +qemu_del_block_queue(bs->block_queue);
 +bs->block_queue = NULL;
 +}

 Does this mean that you can't disable I/O limits while the VM is running?
 No, you can, even while the VM is running.

 Okay, in general qemu_block_queue_flush() empties the queue so that
 there are no requests left that qemu_del_block_queue() could drop from
 the queue. So in the common case it doesn't even enter the FOREACH loop.
 I think that we should adopt !QTAILQ_EMPTY(&queue->requests), not
 QTAILQ_FOREACH_SAFE in qemu_del_block_queue(),
 right?

I think QTAILQ_FOREACH_SAFE is fine.


 I think problems start when requests have failed or exceeded the limit
 again, then you have requests queued even after
 qemu_block_queue_flush(). You must be aware of this, otherwise the code
 in qemu_del_block_queue() wouldn't exist.

 But you can't release the ACBs without having called their callback,
 otherwise the caller would still assume that its ACB pointer is valid.
 Maybe calling the callback before releasing the ACB would be enough.
 Good, thanks.

 +}
 +}
 +}
 +
 +queue->req_failed = true;
 +queue->flushing   = false;
 +}
 +
 +bool qemu_block_queue_has_pending(BlockQueue *queue)
 +{
 +return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
 +}

 Why doesn't the queue have pending requests in the middle of a flush
 operation? (That is, the flush hasn't completed yet)
 It is possible for the queue to have pending requests. if yes, how about?

 Sorry, can't parse this.

 I don't understand why the !queue->flushing part is correct.

 What about this?
 When bdrv_aio_readv/writev handles a request, it determines whether the
 block queue is non-NULL and not being flushed; if so, it assumes
 that the request is a new one from the upper layer, so it won't
 check whether the I/O rate at runtime has exceeded the limits, but
 immediately inserts it into the block queue.

Hm, I think I understand what you're saying, but only after looking at
patch 3. This is not really implementing a has_pending(), but
has_pending_and_caller_wasnt_called_during_flush(). I think it would be
better to handle the queue-flushing condition in the caller where its
use is more obvious.

Kevin



Re: [Qemu-devel] [RFC][PATCH 00/45] qemu-kvm: MSI layer rework for in-kernel irqchip support

2011-10-18 Thread Jan Kiszka
On 2011-10-17 17:57, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:27:34AM +0200, Jan Kiszka wrote:
 As previously indicated, I was working for quite a while on a major
 refactoring of the MSI additions we have in qemu-kvm to support
 in-kernel irqchip, vhost and device assignment. This is now the outcome.

 I'm quite happy with it, things are still working (apparently), and the
 invasiveness of KVM hooks into the MSI layer is significantly reduced.
 Moreover, I was able to port the device assignment code over generic MSI
 support, reducing the size of that file a bit further.

 Some further highlights:
  - fix for HPET MSI support with in-kernel irqchip
  - fully configurable MSI-X (allows 1:1 mapping for assigned devices)
  - refactored KVM core API for device assignment and IRQ routing

 I'm sending the whole series in one chunk so that you can see what the
 result will be. It's RFC as I bet that there are regressions included
 and maybe still room left for improvements. Once all is fine (can be
 broken up into multiple chunks for the merge), I would suggest patching
 qemu-kvm first and then start with porting things over to upstream.

 Comments & review welcome.

 CC: Alexander Graf ag...@suse.de
 CC: Gerd Hoffmann kra...@redhat.com
 CC: Isaku Yamahata yamah...@valinux.co.jp
 
 
 So the scheme where we lazily update kvm gsi table
 on interrupts is interesting. There seems to be
 very little point in it for virtio, and it does
 seem to make it impossible to detect lack of resources
 (at the moment we let the guest know if we run out of GSIs
 and linux guests can fall back to regular interrupts).

Are we really at that limit already? Then I think it's rather time to
lift it at the kernel side.

 
 I am guessing the idea is to use it for device assignment
 where it does make sense as there is no standard way
 to track which vectors are actually used?
 But how does it work there? kvm does not
 propagate unmapped interrupts from an assigned device to qemu, does it?

No, device assignment and irqfd (vhost, vfio, whatever-will-come) belong
to the static group. They need to hand out a static gsi to the irq
source, thus they cannot participate in lazy updating.

But they can benefit from the generic config notifiers: whenever the
guest changes some route (or disables it), they just need to inform the
core about this. So there is no need to track used vectors separately
anymore.

Jan





Re: [Qemu-devel] [V2 PATCH] rtl8139: check the buffer availiability

2011-10-18 Thread Mark Wu

Hi Jason,
Could you please elaborate on what problem you are trying to resolve with
this patch? And do you think we need to notify the I/O thread to re-poll
the tap fd when a receive descriptor becomes available? Otherwise, tap
read polling will be disabled until the I/O handlers are updated for
other reasons. Am I right?



On 10/17/2011 10:55 AM, Jason Wang wrote:

Reduce spurious packet drops on RX ring empty when in c+ mode by verifying
that we have at least one buffer ahead of time.

Change from v1:
Fix style comments from Stefan.

Signed-off-by: Jason Wangjasow...@redhat.com
---
  hw/rtl8139.c |   44 ++--
  1 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index 3753950..bcbc5e3 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -84,6 +84,19 @@
  #define VLAN_TCI_LEN 2
  #define VLAN_HLEN (ETHER_TYPE_LEN + VLAN_TCI_LEN)

+/* w0 ownership flag */
+#define CP_RX_OWN (1<<31)
+/* w0 end of ring flag */
+#define CP_RX_EOR (1<<30)
+/* w0 bits 0...12 : buffer size */
+#define CP_RX_BUFFER_SIZE_MASK ((1<<13) - 1)
+/* w1 tag available flag */
+#define CP_RX_TAVA (1<<16)
+/* w1 bits 0...15 : VLAN tag */
+#define CP_RX_VLAN_TAG_MASK ((1<<16) - 1)
+/* w2 low  32bit of Rx buffer ptr */
+/* w3 high 32bit of Rx buffer ptr */
+
  #if defined (DEBUG_RTL8139)
  #  define DPRINTF(fmt, ...) \
  do { fprintf(stderr, "RTL8139: " fmt, ## __VA_ARGS__); } while (0)
@@ -805,6 +818,22 @@ static inline target_phys_addr_t rtl8139_addr64(uint32_t 
low, uint32_t high)
  #endif
  }

+/* Verify that we have at least one available rx buffer */
+static int rtl8139_cp_has_rxbuf(RTL8139State *s)
+{
+uint32_t val, rxdw0;
+target_phys_addr_t cplus_rx_ring_desc = rtl8139_addr64(s->RxRingAddrLO,
+   s->RxRingAddrHI);
+cplus_rx_ring_desc += 16 * s->currCPlusRxDesc;
+cpu_physical_memory_read(cplus_rx_ring_desc, &val, 4);
+rxdw0 = le32_to_cpu(val);
+if (rxdw0 & CP_RX_OWN) {
+return 1;
+} else {
+return 0;
+}
+}
+
  static int rtl8139_can_receive(VLANClientState *nc)
  {
  RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
@@ -819,7 +848,7 @@ static int rtl8139_can_receive(VLANClientState *nc)
  if (rtl8139_cp_receiver_enabled(s)) {
  /* ??? Flow control not implemented in c+ mode.
 This is a hack to work around slirp deficiencies anyway.  */
-return 1;
+return rtl8139_cp_has_rxbuf(s);
  } else {
  avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr,
   s->RxBufferSize);
@@ -965,19 +994,6 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, 
const uint8_t *buf, size_

  /* begin C+ receiver mode */

 -/* w0 ownership flag */
 -#define CP_RX_OWN (1<<31)
 -/* w0 end of ring flag */
 -#define CP_RX_EOR (1<<30)
 -/* w0 bits 0...12 : buffer size */
 -#define CP_RX_BUFFER_SIZE_MASK ((1<<13) - 1)
 -/* w1 tag available flag */
 -#define CP_RX_TAVA (1<<16)
 -/* w1 bits 0...15 : VLAN tag */
 -#define CP_RX_VLAN_TAG_MASK ((1<<16) - 1)
 -/* w2 low  32bit of Rx buffer ptr */
 -/* w3 high 32bit of Rx buffer ptr */
-
  int descriptor = s->currCPlusRxDesc;
  target_phys_addr_t cplus_rx_ring_desc;








Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 11:43, Wen Congyang wrote:
 At 10/18/2011 04:36 PM, Jan Kiszka Write:
 On 2011-10-18 10:34, Richard W.M. Jones wrote:

 Yeah, I see. Could also be solved via gdb scripts, but crash is already
 there.

 But let's see if the formats actually differ. In the end, crash is just
 parsing the same information that also gdb sees.
 
 I think the format can be similar with diskdump/kdump/netdump:
 
 dump_header: 1 block
 sub header:  n blocks(n is stored in dump_header)
 bitmap:  m blocks(2m is stored in dump_header)
 dumpable bitmap: m blocks
 memory data(We can know whether a page is stored in the core by bitmap and 
 dumpable bitmap)
 
 The format of dump header(It's like kdump/diskdump):
  struct disk_dump_header {
    char            signature[SIG_LEN]; /* = "QEMU" */
    int             header_version; /* Dump header version */
    struct new_utsname  utsname;    /* copy of system_utsname */
    struct timeval  timestamp;      /* Time stamp */
    unsigned int    status;
    int             block_size;     /* Size of a block in byte */
    int             sub_hdr_size;   /* Size of arch dependent
                                       header in blocks */
    unsigned int    bitmap_blocks;  /* Size of Memory bitmap in
                                       block */
    unsigned int    max_mapnr;      /* = max_mapnr */
    unsigned int    total_ram_blocks;/* Number of blocks should be
                                        written */
    unsigned int    device_blocks;  /* Number of total blocks in
                                     * the dump device */
    unsigned int    written_blocks; /* Number of written blocks */
    unsigned int    current_cpu;    /* CPU# which handles dump */
    int             nr_cpus;        /* Number of CPUs */
  };
 
 The sub header can contain all registers' values on each vcpu, and other
 information, for example:
 struct qemu_sub_header {
   unsigned long   start_pfn;
   unsigned long   end_pfn;
   off_t   offset_note;
   unsigned long   size_note;
 };

So is this a standard format or only similar to something? Which tools
will understand it out-of-the-box? If it's not standard, why?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Qemu-devel] [Bug 874038] Re: ARM thumb2 does not propogate carry flag properly

2011-10-18 Thread Alexey Starikovskiy
Sorry for the noise; it appears I tested against an unclean 0.15.0 tree.

** Changed in: qemu
   Status: New => Invalid

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/874038

Title:
  ARM thumb2 does not propogate carry flag properly

Status in QEMU:
  Invalid

Bug description:
  information on the carry flag is lost if gen_set_CF_bit31(t1) is called
  after a logic operation.




Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Paolo Bonzini

On 10/18/2011 10:39 AM, Jan Kiszka wrote:


  Yeah, I see. Could also be solved via gdb scripts, but crash is already
  there.

[ BTW, crash is for the dead. But having those features in form of gdb
scripts would also allow live system analysis. Looks like it's worth
obsoleting crash on the long run. ]


Crash can also do live system analysis. :)

Paolo



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Christoph Hellwig
On Tue, Oct 18, 2011 at 12:41:10PM +0200, Paolo Bonzini wrote:
 On 10/18/2011 10:39 AM, Jan Kiszka wrote:
 
   Yeah, I see. Could also be solved via gdb scripts, but crash is already
   there.
 [ BTW, crash is for the dead. But having those features in form of gdb
 scripts would also allow live system analysis. Looks like it's worth
 obsoleting crash on the long run. ]

 Crash can also do live system analysis. :)

crash in fact is gdb - with a lot of additional commands.




Re: [Qemu-devel] [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache

2011-10-18 Thread Jan Kiszka
On 2011-10-17 17:37, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
 On 2011-10-17 13:06, Avi Kivity wrote:
 On 10/17/2011 11:27 AM, Jan Kiszka wrote:
 This cache will help us implementing KVM in-kernel irqchip support
 without spreading hooks all over the place.

 KVM requires us to register it first and then deliver it by raising a
 pseudo IRQ line returned on registration. While this could be changed
 for QEMU-originated MSI messages by adding direct MSI injection, we will
 still need this translation for irqfd-originated messages. The
 MSIRoutingCache will allow us to track those registrations and update them
 lazily before the actual delivery. This avoids having to track MSI
 vectors at device level (like qemu-kvm currently does).


 +typedef enum {
 +MSI_ROUTE_NONE = 0,
 +MSI_ROUTE_STATIC,
 +} MSIRouteType;
 +
 +struct MSIRoutingCache {
 +MSIMessage msg;
 +MSIRouteType type;
 +int kvm_gsi;
 +int kvm_irqfd;
 +};
 +
 diff --git a/hw/pci.h b/hw/pci.h
 index 329ab32..5b5d2fd 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -197,6 +197,10 @@ struct PCIDevice {
  MemoryRegion rom;
  uint32_t rom_bar;
  
 +/* MSI routing caches */
 +MSIRoutingCache *msi_cache;
 +MSIRoutingCache *msix_cache;
 +
  /* MSI entries */
  int msi_entries_nr;
  struct KVMMsiMessage *msi_irq_entries;

 IMO this needlessly leaks kvm information into core qemu.  The cache
 should be completely hidden in kvm code.

 I think msi_deliver() can hide the use of the cache completely.  For
 pre-registered events like kvm's irqfd, you can use something like

   qemu_irq qemu_msi_irq(MSIMessage msg)

 for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
 for kvm, it allocates an irqfd and a permanent entry in the cache and
 returns a qemu_irq that triggers the irqfd.

 See my previously mail: you want to track the life-cycle of an MSI
 source to avoid generating routes for identical sources. A messages is
 not a source. Two identical messages can come from different sources.
 
 Since MSI messages are edge triggered, I don't see how this
 would work without losing interrupts. And AFAIK,
 existing guests do not use the same message for
 different sources.

Just like we handle shared edge-triggered line-base IRQs, shared MSIs
are in principle feasible as well.

Jan





Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 12:41, Paolo Bonzini wrote:
 On 10/18/2011 10:39 AM, Jan Kiszka wrote:

  Yeah, I see. Could also be solved via gdb scripts, but crash is already
  there.
 [ BTW, crash is for the dead. But having those features in form of gdb
 scripts would also allow live system analysis. Looks like it's worth
 obsoleting crash on the long run. ]
 
 Crash can also do live system analysis. :)

Does it work via remote interface? And it's still the wrong way around:
You have to append gdb to the majority of commands when doing debugging
instead of issuing additional analysis commands from the gdb shell.
That's understandable given the limited scripting power of old gdbs. But
that's now history thanks to its python interface.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 42/45] msix: Introduce msix_init_simple

2011-10-18 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 09:21:34PM +0200, Jan Kiszka wrote:
 On 2011-10-17 16:28, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 01:27:31PM +0200, Jan Kiszka wrote:
  On 2011-10-17 13:22, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
   Device models are usually not interested in specifying MSI-X
  configuration details beyond the number of vectors to provide and the
  BAR number to use. Layout of an exclusively used BAR and its
  registration can also be handled centrally.
 
  This is the purpose of msix_init_simple. It provides handy services to
  the existing users. Future users like device assignment may require more
  detailed setup specification. For them we will (re-)introduce msix_init
   with the full list of configuration options (in contrast to the current
  code).
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
  Well, this seems a bit of a code churn then, doesn't it?
  We are also discussing using memory BAR for virtio-pci for other
  stuff besides MSI-X, so the last user of the _simple variant
  will be ivshmem then?
 
   We will surely see more MSI-X users over time. Not sure if they all
   mix their MSI-X BARs with other stuff. But e.g. the e1000 variant I
  have here does not. So there should be users in the future.
 
  Jan
  
  Question is, how hard is to pass in the BAR and the offset?
 
 That is trivial. But have a look at the final simple implementation. It
 also manages the container memory region for table and PBA and
 registers/unregisters that container as BAR. So there is measurable
 added-value.
 
 Jan
 

Yes, I agree. In particular it's not very nice that the user has to know
the size of the bar to create. But the API is very unfortunate IMO.

I am also more interested in solutions that help all devices
and not just those that have a dedicated bar for msix + pba.

We should probably pass in the size of the memory region allocated to
the msix table, and verify that the table fits there.
We can also avoid passing in bar number, like this:

diff --git a/hw/pci.c b/hw/pci.c
index 749e8d8..d0d893e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -903,6 +903,17 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 : pci_dev->bus->address_space_mem;
 }
 
+int pci_get_bar_nr(PCIDevice *pci_dev, MemoryRegion *bar)
+{
+int region_num;
+for (region_num = 0; region_num < PCI_NUM_REGIONS; ++region_num) {
+if (pci_dev->io_regions[region_num].memory == bar) {
+return region_num;
+}
+}
+return -1;
+}
+
 pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
 {
 return pci_dev->io_regions[region_num].addr;
-- 
MST



Re: [Qemu-devel] [RFC][PATCH 42/45] msix: Introduce msix_init_simple

2011-10-18 Thread Jan Kiszka
On 2011-10-18 12:52, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 09:21:34PM +0200, Jan Kiszka wrote:
 On 2011-10-17 16:28, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 01:27:31PM +0200, Jan Kiszka wrote:
 On 2011-10-17 13:22, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
 Device models are usually not interested in specifying MSI-X
 configuration details beyond the number of vectors to provide and the
 BAR number to use. Layout of an exclusively used BAR and its
 registration can also be handled centrally.

 This is the purpose of msix_init_simple. It provides handy services to
 the existing users. Future users like device assignment may require more
 detailed setup specification. For them we will (re-)introduce msix_init
 with the full list of configuration options (in contrast to the current
 code).

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 Well, this seems a bit of a code churn then, doesn't it?
 We are also discussing using memory BAR for virtio-pci for other
 stuff besides MSI-X, so the last user of the _simple variant
 will be ivshmem then?

 We will surely see more MSI-X users over time. Not sure if they all
 mix their MSI-X BARs with other stuff. But e.g. the e1000 variant I
 have here does not. So there should be users in the future.

 Jan

 Question is, how hard is to pass in the BAR and the offset?

 That is trivial. But have a look at the final simple implementation. It
 also manages the container memory region for table and PBA and
 registers/unregisters that container as BAR. So there is measurable
 added-value.

 Jan

 
 Yes, I agree. In particular it's not very nice that the user has to know
 the size of the bar to create. But the API is very unfortunate IMO.

I'm open to see the prototypes of a different one.

 
 I am also more interested in solutions that help all devices
 and not just those that have a dedicated bar for msix + pba.

That's what patch 43 is for.

 
 We should probably pass in the size of the memory region allocated to
 the msix table, and verify that the table fits there.

Well, if you specify table and PBA offset explicitly, I think it is fair
to expect the caller having done the math.

There are some checks built into the core already, e.g. against table
and PBA overlap. We could add one for checking BAR limits. I'm not sure,
though, if requesting table and PBA sizes solves the issue that the user
may have done the calculation wrong.

 We can also avoid passing in bar number, like this:
 
 diff --git a/hw/pci.c b/hw/pci.c
 index 749e8d8..d0d893e 100644
 --- a/hw/pci.c
 +++ b/hw/pci.c
 @@ -903,6 +903,17 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
  : pci_dev->bus->address_space_mem;
  }
  
 +int pci_get_bar_nr(PCIDevice *pci_dev, MemoryRegion *bar)
 +{
 +int region_num;
 +for (region_num = 0; region_num < PCI_NUM_REGIONS; ++region_num) {
 +if (pci_dev->io_regions[region_num].memory == bar) {
 +return region_num;
 +}
 +}
 +return -1;
 +}
 +
  pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num)
  {
  return pci_dev->io_regions[region_num].addr;

That enforces a specific registration order. If you call msix_init
before doing the BAR registration, the function above will not help.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 6/7] stellaris: convert adc to memory API

2011-10-18 Thread Peter Maydell
2011/10/17 Benoît Canet benoit.ca...@gmail.com:
 Signed-off-by: Benoit Canet benoit.ca...@gmail.com
 ---
  hw/stellaris.c |   30 --
  1 files changed, 12 insertions(+), 18 deletions(-)

Reviewed-by: Peter Maydell peter.mayd...@linaro.org



Re: [Qemu-devel] [PATCH 4/7] stellaris: convert sys to memory API

2011-10-18 Thread Peter Maydell
2011/10/17 Benoît Canet benoit.ca...@gmail.com:
 Signed-off-by: Benoit Canet benoit.ca...@gmail.com
 ---
  hw/stellaris.c |   28 +++-
  1 files changed, 11 insertions(+), 17 deletions(-)

Reviewed-by: Peter Maydell peter.mayd...@linaro.org



Re: [Qemu-devel] [PATCH 2/7] integratorcp: convert icp pic to memory API

2011-10-18 Thread Peter Maydell
2011/10/17 Benoît Canet benoit.ca...@gmail.com:
 Signed-off-by: Benoit Canet benoit.ca...@gmail.com
 ---
  hw/integratorcp.c |   27 ++-
  1 files changed, 10 insertions(+), 17 deletions(-)

Reviewed-by: Peter Maydell peter.mayd...@linaro.org



Re: [Qemu-devel] [RFC][PATCH 11/45] msi: Factor out delivery hook

2011-10-18 Thread Jan Kiszka
On 2011-10-17 15:48, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 03:41:44PM +0200, Avi Kivity wrote:
 On 10/17/2011 03:41 PM, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 01:15:56PM +0200, Jan Kiszka wrote:
 On 2011-10-17 12:56, Avi Kivity wrote:
 On 10/17/2011 11:27 AM, Jan Kiszka wrote:
 So far we deliver MSI messages by writing them into the target MMIO
 area. This reflects what happens on hardware, but imposes some
 limitations on the emulation when introducing KVM in-kernel irqchip
 models. For those we will need to track the message origin.

 Why do we need to track the message origin?  Emulated interrupt remapping?

 The origin holds the routing cache which we need to track if the message
 already has a route (and that without searching long lists) and to
 update that route instead of add another one.

 Hmm, yes, but if the device does stl_phys or something like this,
 it won't work with irqchip, will it? And it should, ideally.

 Why not?  it will fall back to the apic path, and use the local routing
 cache entry there.
 
 Does it still work with irqchip enabled? I didn't realize ...

Yep, as MSI requests that land in the APIC MMIO page are also fed into
msi_deliver and will take the normal path from there on.

Jan





Re: [Qemu-devel] [PATCH 3/7] integratorcp: convert control to memory API

2011-10-18 Thread Peter Maydell
2011/10/17 Benoît Canet benoit.ca...@gmail.com:
 -static void icp_control_init(uint32_t base)
 +static void icp_control_init(target_phys_addr_t base)
  {
 -    int iomemtype;
 +    MemoryRegion *io;

 -    iomemtype = cpu_register_io_memory(icp_control_readfn,
 -                                       icp_control_writefn, NULL,
 -                                       DEVICE_NATIVE_ENDIAN);
  -    cpu_register_physical_memory(base, 0x00800000, iomemtype);
  +    io = (MemoryRegion *)g_malloc0(sizeof(MemoryRegion));
  +    memory_region_init_io(io, &icp_control_ops, NULL,
  +                          "control", 0x00800000);
 +    memory_region_add_subregion(get_system_memory(), base, io);
     /* ??? Save/restore.  */
  }

I didn't spot this the first time round, but this is wrong.
We shouldn't be g_malloc0()ing the MemoryRegion -- it should
be an element in some suitable device struct.

I think the right thing to do here is probably first to do the
(fairly trivial) conversion of the icp_control code to be a
sysbus device, then do the memory region conversion on top of that.

-- PMM



Re: [Qemu-devel] [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache

2011-10-18 Thread Jan Kiszka
On 2011-10-17 17:43, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:27:46AM +0200, Jan Kiszka wrote:
 This cache will help us implementing KVM in-kernel irqchip support
 without spreading hooks all over the place.

 KVM requires us to register it first and then deliver it by raising a
 pseudo IRQ line returned on registration. While this could be changed
 for QEMU-originated MSI messages by adding direct MSI injection, we will
 still need this translation for irqfd-originated messages. The
 MSIRoutingCache will allow us to track those registrations and update them
 lazily before the actual delivery. This avoids having to track MSI
 vectors at device level (like qemu-kvm currently does).

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 So if many devices are added, exhausting the number of GSIs supported,
 we get terrible performance instead of simply failing outright.
 
 To me, this looks more like a bug than a feature ...

If that ever turns out to be a bottleneck, failing looks like the worst
we can do. Reporting excessive cache flushes would make some sense and
could still be added.

Jan





Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
 On 2011-10-17 17:48, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
  This optimization was only required to keep KVM route usage low. Now
  that we solve that problem via lazy updates, we can drop the field. We
  still need interfaces to clear pending vectors, though (and we have to
  make use of them more broadly - but that's unrelated to this patch).
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  
  Lazy updates should be an implementation detail.
  IMO resource tracking of vectors makes sense
  as an API. Making devices deal with pending
  vectors as a concept, IMO, does not.
 
 There is really no use for tracking the vector lifecycle once we have
 lazy updates (except for static routes). It's a way too invasive
 concept, and it's not needed for anything but KVM.

I think it's needed. The PCI spec states that when the device
does not need an interrupt anymore, it should clear the pending
bit. The use/unuse is IMO a decent API for this,
because it uses a familiar resource tracking concept.
Exposing this knowledge of msix to devices seems
like a worse API.

 
 If you want an example, check
 http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
 compare it to the changes done to hpet in this series.
 
 Jan
 

This seems to be a general argument that lazy updates are good?
I have no real problem with them, besides the fact that
we need an API to reserve space in the routing
table so that device setup can fail upfront.

-- 
MST



Re: [Qemu-devel] [PATCH v2 00/13] allow tools to use the QEMU main loop

2011-10-18 Thread Paolo Bonzini

On 10/17/2011 10:19 PM, Paolo Bonzini wrote:

On 10/17/2011 10:11 PM, Anthony Liguori wrote:

Leftover from RFC. Right now, this is just a chainsaw cleanup.
However, Kevin
has plans to use the main loop everywhere (making qemu-io and qemu-img
one huge
coroutine).


Can you add something to docs that explains how to use the new main loop?


Yes, of course.


Is it okay to just move more functions to main-loop.h (bottom halves, 
lock/unlock iothread, child watch) and add doc comments there?


If so, I can send the 2 updated patches (main-loop.h/main-loop.c) for 
review, and push again to github.


Paolo




Re: [Qemu-devel] [RFC][PATCH 11/45] msi: Factor out delivery hook

2011-10-18 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
 On 2011-10-17 15:43, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
  diff --git a/hw/msi.c b/hw/msi.c
  index 3c7ebc3..9055155 100644
  --- a/hw/msi.c
  +++ b/hw/msi.c
  @@ -40,6 +40,14 @@
   /* Flag for interrupt controller to declare MSI/MSI-X support */
   bool msi_supported;
   
  +static void msi_unsupported(MSIMessage *msg)
  +{
  +/* If we get here, the board failed to register a delivery handler. */
  +abort();
  +}
  +
  +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
  +
  
  How about we set this to NULL, and check it instead of the bool
  flag?
  
 
 Yeah. I will introduce
 
 bool msi_supported(void)
 {
 return msi_deliver != msi_unsupported;
 }
 
 OK?
 
 Jan
 

Looks a bit weird ...
NULL is a pretty standard value for an invalid pointer, isn't it?

-- 
MST
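
The pattern under discussion, a sentinel handler versus a bare NULL pointer, can be sketched outside QEMU like this. The names mirror the patch, but the message layout and the `board_deliver`/`board_init` pieces are illustrative assumptions, not QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative message layout; QEMU's real MSIMessage may differ. */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Sentinel variant from the patch: delivery without a registered
 * handler is a board bug, so the default handler aborts. */
static void msi_unsupported(MSIMessage *msg)
{
    (void)msg;
    abort();
}

static void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;

/* Jan's proposal: "supported" means the hook has been replaced. */
static bool msi_supported(void)
{
    return msi_deliver != msi_unsupported;
}

/* A board would install its handler roughly like this (hypothetical). */
static uint32_t last_data;

static void board_deliver(MSIMessage *msg)
{
    last_data = msg->data;
}

static void board_init(void)
{
    msi_deliver = board_deliver;
}
```

Compared with the NULL alternative, the sentinel trades one pointer comparison in msi_supported() for a guaranteed abort() on any stray delivery, independent of whether the platform faults on a NULL read.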



Re: [Qemu-devel] [RFC][PATCH 08/45] Introduce MSIMessage structure

2011-10-18 Thread Jan Kiszka
On 2011-10-17 15:01, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 02:09:46PM +0200, Jan Kiszka wrote:
 On 2011-10-17 14:04, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 01:51:00PM +0200, Jan Kiszka wrote:
 On 2011-10-17 13:46, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:27:42AM +0200, Jan Kiszka wrote:
 Will be used for generating and distributing MSI messages, both in
 emulation mode and under KVM.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 I would add

 uint64_t msix_get_address(dev, vector)
 uint64_t msix_get_data(dev, vector)

 and same for msi.

 this would minimise the changes while still making it
 possible to avoid code duplication in kvm.

 I'm introducing msi[x]_message_from_vector for that purpose later on. Or
 what do you mean?

 Jan

 It does not look like everyone actually wants the structure;
 users seem to put it on the stack and then immediately
 unwrap it to get at the address/data.
 So two accessors, get_data + get_address, instead of one will
 remove the need to rework all code to use the structure.

 The idea of this patch is to start handling MSI messages as a single
 blob. There should be no need to ask a device for parts of that blob
 this way.
 
 There should be no need to look at the message at all.
 devices really only care about vector numbers.
 So we are left with msix.c msi.c and kvm as the only users.
 kvm has a cache of messages so it needs a struct of these,
 msix/msi don't.

MSIMessage is primarily designed for the path from the device to the
interrupt controller. And there are already multiple stops in the path
after this patch set. Interrupt remapping, e.g., would add another stop.

 
 If you see use cases in this series, though, let me know.

 Jan
 
 Yes, I see them. msix_notify is one example. msi_notify is another.
 
 E.g. msi_notify would IMO look nicer as:
 stl_le_phys(msi_get_address(dev, vector), msi_get_data(dev, vector));

This line does not exist anymore at the end of this series. See
pc_msi_deliver.

Jan
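
The "single blob" Jan argues for can be pictured as a plain address/data pair that travels through every stop of the delivery path. The sketch below is illustrative only: the struct fields match the spirit of the patch but may not be QEMU's exact definition, and `stl_le_phys_stub` is a stand-in for the real guest-memory store.

```c
#include <assert.h>
#include <stdint.h>

/* Address and data travel together through every stop
 * (device -> remapping -> interrupt controller). */
typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

/* Tiny stand-in for the guest-physical store so the sketch is runnable. */
static uint32_t fake_apic_reg;

static void stl_le_phys_stub(uint64_t addr, uint32_t val)
{
    (void)addr;           /* real code would route by address */
    fake_apic_reg = val;  /* little-endian store into the APIC window */
}

/* With the blob, the emulated delivery path needs no per-field accessors: */
static void msi_deliver_emulated(MSIMessage *msg)
{
    stl_le_phys_stub(msg->address, msg->data);
}
```

This is why the series prefers one structure over `msi_get_address`/`msi_get_data` pairs: each hop consumes the message whole.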





Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Jan Kiszka
On 2011-10-18 13:58, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
 On 2011-10-17 17:48, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
 This optimization was only required to keep KVM route usage low. Now
 that we solve that problem via lazy updates, we can drop the field. We
 still need interfaces to clear pending vectors, though (and we have to
 make use of them more broadly - but that's unrelated to this patch).

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 Lazy updates should be an implementation detail.
 IMO resource tracking of vectors makes sense
 as an API. Making devices deal with pending
 vectors as a concept, IMO, does not.

 There is really no use for tracking the vector lifecycle once we have
 lazy updates (except for static routes). It's a way too invasive
 concept, and it's not needed for anything but KVM.
 
 I think it's needed. The PCI spec states that when the device
 does not need an interrupt anymore, it should clear the pending
 bit.

That should be done explicitly if it is required outside existing
clearing points. We already have that service, it's called
msix_clear_vector. That alone does not justify msix_vector_use and all
the state and logic behind it IMHO.

 The use/unuse is IMO a decent API for this,
 because it uses a familiar resource tracking concept.
 Exposing this knowledge of msix to devices seems
 like a worse API.
 

 If you want an example, check
 http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
 compare it to the changes done to hpet in this series.

 Jan

 
 This seems to be a general argument that lazy updates are good?
 I have no real problem with them, besides the fact that
 we need an API to reserve space in the routing
 table so that device setup can fail upfront.

That's not possible, even with used vectors, as devices change their
vector usage depending on how the guest configures the devices. If you
(pre-)allocate all possible vectors, you may run out of resources
earlier than actually needed. That's also why we do those data == 0
checks to skip used but unconfigured vectors.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache

2011-10-18 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 09:19:34PM +0200, Jan Kiszka wrote:
 On 2011-10-17 17:37, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
  On 2011-10-17 13:06, Avi Kivity wrote:
  On 10/17/2011 11:27 AM, Jan Kiszka wrote:
   This cache will help us implement KVM in-kernel irqchip support
  without spreading hooks all over the place.
 
  KVM requires us to register it first and then deliver it by raising a
  pseudo IRQ line returned on registration. While this could be changed
  for QEMU-originated MSI messages by adding direct MSI injection, we will
  still need this translation for irqfd-originated messages. The
   MSIRoutingCache will allow us to track those registrations and update them
   lazily before the actual delivery. This avoids having to track MSI
  vectors at device level (like qemu-kvm currently does).
 
 
  +typedef enum {
  +MSI_ROUTE_NONE = 0,
  +MSI_ROUTE_STATIC,
  +} MSIRouteType;
  +
  +struct MSIRoutingCache {
  +MSIMessage msg;
  +MSIRouteType type;
  +int kvm_gsi;
  +int kvm_irqfd;
  +};
  +
  diff --git a/hw/pci.h b/hw/pci.h
  index 329ab32..5b5d2fd 100644
  --- a/hw/pci.h
  +++ b/hw/pci.h
  @@ -197,6 +197,10 @@ struct PCIDevice {
   MemoryRegion rom;
   uint32_t rom_bar;
   
  +/* MSI routing caches */
  +MSIRoutingCache *msi_cache;
  +MSIRoutingCache *msix_cache;
  +
   /* MSI entries */
   int msi_entries_nr;
   struct KVMMsiMessage *msi_irq_entries;
 
  IMO this needlessly leaks kvm information into core qemu.  The cache
  should be completely hidden in kvm code.
 
  I think msi_deliver() can hide the use of the cache completely.  For
  pre-registered events like kvm's irqfd, you can use something like
 
qemu_irq qemu_msi_irq(MSIMessage msg)
 
  for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
  for kvm, it allocates an irqfd and a permanent entry in the cache and
  returns a qemu_irq that triggers the irqfd.
 
  See my previous mail: you want to track the life-cycle of an MSI
  source to avoid generating routes for identical sources. A messages is
  not a source. Two identical messages can come from different sources.
  
  Since MSI messages are edge triggered, I don't see how this
  would work without losing interrupts. And AFAIK,
  existing guests do not use the same message for
  different sources.
 
 Just like we handle shared edge-triggered line-based IRQs, shared MSIs
 are in principle feasible as well.
 
 Jan
 

For this case it seems quite harmless to use multiple
routes for identical sources. Yes it would use more resources
but it never happens in practice.
So what Avi said originally is still true.

-- 
MST



Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: Fail configure when libfdt is not available

2011-10-18 Thread David Gibson
On Tue, Oct 18, 2011 at 10:55:01AM +0200, Andreas Färber wrote:
 Am 18.10.2011 02:18, schrieb Alexander Graf:
  We have several targets in the PPC tree now that basically require libfdt
  to function properly, namely the pseries and the e500 targets. This 
  dependency
  will rather increase than decrease in the future, so I want to make sure
  that people building shiny new 1.0 actually have libfdt installed to get
  rid of a few ifdefs in the code.
  
  Warning: This patch will likely make configure fail for people who don't
  select their own --target-list, but don't have libfdt development packages
  installed. However, we really need this new dependency to move on.
  
  Signed-off-by: Alexander Graf ag...@suse.de
 
 openSUSE 12.1 has libfdt1-devel, but you should set up a submodule and
 working build rules for Darwin, Haiku, etc. `make` doesn't fully work so
 I used custom scripts to build the right parts and to manually install
 the resulting binary and headers.

If there are build problems with libfdt on any platform let me know
about them.  I would like it to build clean as widely as possible, but
I don't have that great a diversity of build environments, so I have
to rely on bug reports.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [RFC][PATCH 11/45] msi: Factor out delivery hook

2011-10-18 Thread Jan Kiszka
On 2011-10-18 14:05, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
 On 2011-10-17 15:43, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
 diff --git a/hw/msi.c b/hw/msi.c
 index 3c7ebc3..9055155 100644
 --- a/hw/msi.c
 +++ b/hw/msi.c
 @@ -40,6 +40,14 @@
  /* Flag for interrupt controller to declare MSI/MSI-X support */
  bool msi_supported;
  
 +static void msi_unsupported(MSIMessage *msg)
 +{
 +/* If we get here, the board failed to register a delivery handler. */
 +abort();
 +}
 +
 +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
 +

 How about we set this to NULL, and check it instead of the bool
 flag?


 Yeah. I will introduce

 bool msi_supported(void)
 {
 return msi_deliver != msi_unsupported;
 }

 OK?

 Jan

 
 Looks a bit weird ...
 NULL is a pretty standard value for an invalid pointer, isn't it?

Saves us the runtime check and is equally expressive and readable IMHO.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 12/45] msi: Introduce MSIRoutingCache

2011-10-18 Thread Jan Kiszka
On 2011-10-18 14:17, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 09:19:34PM +0200, Jan Kiszka wrote:
 On 2011-10-17 17:37, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 01:19:56PM +0200, Jan Kiszka wrote:
 On 2011-10-17 13:06, Avi Kivity wrote:
 On 10/17/2011 11:27 AM, Jan Kiszka wrote:
  This cache will help us implement KVM in-kernel irqchip support
 without spreading hooks all over the place.

 KVM requires us to register it first and then deliver it by raising a
 pseudo IRQ line returned on registration. While this could be changed
 for QEMU-originated MSI messages by adding direct MSI injection, we will
 still need this translation for irqfd-originated messages. The
  MSIRoutingCache will allow us to track those registrations and update them
  lazily before the actual delivery. This avoids having to track MSI
 vectors at device level (like qemu-kvm currently does).


 +typedef enum {
 +MSI_ROUTE_NONE = 0,
 +MSI_ROUTE_STATIC,
 +} MSIRouteType;
 +
 +struct MSIRoutingCache {
 +MSIMessage msg;
 +MSIRouteType type;
 +int kvm_gsi;
 +int kvm_irqfd;
 +};
 +
 diff --git a/hw/pci.h b/hw/pci.h
 index 329ab32..5b5d2fd 100644
 --- a/hw/pci.h
 +++ b/hw/pci.h
 @@ -197,6 +197,10 @@ struct PCIDevice {
  MemoryRegion rom;
  uint32_t rom_bar;
  
 +/* MSI routing caches */
 +MSIRoutingCache *msi_cache;
 +MSIRoutingCache *msix_cache;
 +
  /* MSI entries */
  int msi_entries_nr;
  struct KVMMsiMessage *msi_irq_entries;

 IMO this needlessly leaks kvm information into core qemu.  The cache
 should be completely hidden in kvm code.

 I think msi_deliver() can hide the use of the cache completely.  For
 pre-registered events like kvm's irqfd, you can use something like

   qemu_irq qemu_msi_irq(MSIMessage msg)

 for non-kvm, it simply returns a qemu_irq that triggers a stl_phys();
 for kvm, it allocates an irqfd and a permanent entry in the cache and
 returns a qemu_irq that triggers the irqfd.

 See my previous mail: you want to track the life-cycle of an MSI
 source to avoid generating routes for identical sources. A messages is
 not a source. Two identical messages can come from different sources.

 Since MSI messages are edge triggered, I don't see how this
 would work without losing interrupts. And AFAIK,
 existing guests do not use the same message for
 different sources.

 Just like we handle shared edge-triggered line-based IRQs, shared MSIs
 are in principle feasible as well.

 Jan

 
 For this case it seems quite harmless to use multiple
 routes for identical sources.

Unless we track the source (via the MSIRoutingCache abstraction), there
can be no multiple routes. The core cannot differentiate between
identical messages, thus will not create multiple routes.

But that's actually a corner case, and we could probably live with it.
The real question is if we want to search for MSI routes on each message
delivery.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
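
The cost Jan is weighing, finding a KVM route by message content on every delivery, can be pictured as a linear scan over registered routes. Everything below is a hypothetical user-space sketch (KVM's real routing table lives in the kernel and is programmed via ioctl), meant only to show why a per-source cache avoids both the scan and premature route allocation.

```c
#include <assert.h>
#include <stdint.h>

typedef struct MSIMessage {
    uint64_t address;
    uint32_t data;
} MSIMessage;

#define MAX_ROUTES 8

static MSIMessage route_msg[MAX_ROUTES];
static int route_gsi[MAX_ROUTES];
static int nr_routes;

/* Returns the GSI of an existing route for this message, or -1 when a
 * new route (a scarce resource) would have to be allocated. */
static int msi_route_lookup(MSIMessage m)
{
    for (int i = 0; i < nr_routes; i++) {
        if (route_msg[i].address == m.address &&
            route_msg[i].data == m.data) {
            return route_gsi[i];
        }
    }
    return -1;
}

static int msi_route_add(MSIMessage m, int gsi)
{
    if (nr_routes == MAX_ROUTES) {
        return -1;   /* the resource-shortage case under discussion */
    }
    route_msg[nr_routes] = m;
    route_gsi[nr_routes] = gsi;
    return nr_routes++;
}
```

With a per-source MSIRoutingCache the scan disappears because each source carries its own route handle; without it, identical messages from different sources collapse onto one route, which is exactly the corner case discussed above.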



Re: [Qemu-devel] [PATCH 3/4] qapi-types.py: Fail gracefully if out dir is not specified

2011-10-18 Thread Luiz Capitulino
On Mon, 17 Oct 2011 16:27:34 -0500
Michael Roth mdr...@linux.vnet.ibm.com wrote:

 On Mon, 17 Oct 2011 13:29:36 -0200, Luiz Capitulino lcapitul...@redhat.com 
 wrote:
  Otherwise we get:
  
  Traceback (most recent call last):
File "./scripts/qapi-types.py", line 183, in <module>
os.makedirs(output_dir)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
mkdir(name, mode)
OSError: [Errno 2] No such file or directory: ''
  
  Signed-off-by: Luiz Capitulino lcapitul...@redhat.com
  ---
   scripts/qapi-types.py |6 +-
   1 files changed, 5 insertions(+), 1 deletions(-)
  
  diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
  index 8df4b72..4a2ddc4 100644
  --- a/scripts/qapi-types.py
  +++ b/scripts/qapi-types.py
  @@ -178,7 +178,11 @@ if __name__ == '__main__':
   prefix = a
   elif o in (-o, --output-dir):
   output_dir = a
  -
  +
  +if not output_dir:
   +sys.stdout.write("output directory was not specified\n")
  +sys.exit(1)
  +   
 
 We should probably just set output_dir to os.getcwd() here.

Fair enough.

 
   c_file = os.path.join(output_dir, prefix + c_file)
   h_file = os.path.join(output_dir, prefix + h_file)
   
  -- 
  1.7.7.rc3
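
Michael's suggestion amounts to a one-line fallback instead of the error exit. A sketch of how qapi-types.py could resolve the option (the helper name is illustrative, not the actual patch):

```python
import os


def resolve_output_dir(output_dir):
    """Fall back to the current working directory when --output-dir
    was not given, instead of failing later in os.makedirs('')."""
    return output_dir if output_dir else os.getcwd()
```

The option-parsing loop would then use `output_dir = resolve_output_dir(output_dir)` before building `c_file`/`h_file` with `os.path.join`.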
  
 




Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Michael S. Tsirkin
On Tue, Oct 18, 2011 at 02:08:59PM +0200, Jan Kiszka wrote:
 On 2011-10-18 13:58, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
  On 2011-10-17 17:48, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
  This optimization was only required to keep KVM route usage low. Now
  that we solve that problem via lazy updates, we can drop the field. We
  still need interfaces to clear pending vectors, though (and we have to
  make use of them more broadly - but that's unrelated to this patch).
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
  Lazy updates should be an implementation detail.
  IMO resource tracking of vectors makes sense
  as an API. Making devices deal with pending
  vectors as a concept, IMO, does not.
 
  There is really no use for tracking the vector lifecycle once we have
  lazy updates (except for static routes). It's a way too invasive
  concept, and it's not needed for anything but KVM.
  
  I think it's needed. The PCI spec states that when the device
  does not need an interrupt anymore, it should clear the pending
  bit.
 
 That should be done explicitly if it is required outside existing
 clearing points. We already have that service, it's called
 msix_clear_vector.

We do? I don't seem to see it upstream...

 That alone does not justify msix_vector_use and all
 the state and logic behind it IMHO.

To me it looks like an abstraction that solves both
this problem and the resource allocation problem.
Resources are actually limited BTW, this is not just
a KVM thing. qemu.git currently lets guests decide
what to do with them, but it might turn out to
 be beneficial to warn the management application
that it is shooting itself in the foot.

  The use/unuse is IMO a decent API for this,
  because it uses a familiar resource tracking concept.
  Exposing this knowledge of msix to devices seems
  like a worse API.
  
 
  If you want an example, check
  http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
  compare it to the changes done to hpet in this series.
 
  Jan
 
  
  This seems to be a general argument that lazy updates are good?
  I have no real problem with them, besides the fact that
  we need an API to reserve space in the routing
  table so that device setup can fail upfront.
 
 That's not possible, even with used vectors, as devices change their
 vector usage depending on how the guest configures the devices. If you
 (pre-)allocate all possible vectors, you may run out of resources
 earlier than actually needed.

This really depends, but please do look at how with virtio
we report resource shortage to the guest and let it fall back to
level interrupts. You seem to remove that capability.

I actually would not mind preallocating everything upfront which is much
easier.  But with your patch we get a silent failure or a drastic
slowdown which is much more painful IMO.

 That's also why we do those data == 0
 checks to skip used but unconfigured vectors.
 
 Jan

These checks work more or less by luck BTW. It's
a hack which I hope lazy allocation will replace.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] RUN_STATE_POSTMIGRATE can be transited from RUN_STATE_PAUSED

2011-10-18 Thread Luiz Capitulino
On Tue, 18 Oct 2011 12:47:27 +0800
Wen Congyang we...@cn.fujitsu.com wrote:

 The user can run the stop command before migration, so the guest's run
 state can be transited from RUN_STATE_PAUSED to RUN_STATE_POSTMIGRATE.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

I've already submitted a pull request with the exact same patch included.

Thanks.

 
 ---
  vl.c |1 +
  1 files changed, 1 insertions(+), 0 deletions(-)
 
 diff --git a/vl.c b/vl.c
 index 2dce3ae..efae19c 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -341,6 +341,7 @@ static const RunStateTransition 
 runstate_transitions_def[] = {
  { RUN_STATE_IO_ERROR, RUN_STATE_RUNNING },
  
  { RUN_STATE_PAUSED, RUN_STATE_RUNNING },
 +{ RUN_STATE_PAUSED, RUN_STATE_POSTMIGRATE },
  
  { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
  




Re: [Qemu-devel] [RFC][PATCH 11/45] msi: Factor out delivery hook

2011-10-18 Thread Michael S. Tsirkin
On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
 On 2011-10-18 14:05, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
  On 2011-10-17 15:43, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
  diff --git a/hw/msi.c b/hw/msi.c
  index 3c7ebc3..9055155 100644
  --- a/hw/msi.c
  +++ b/hw/msi.c
  @@ -40,6 +40,14 @@
   /* Flag for interrupt controller to declare MSI/MSI-X support */
   bool msi_supported;
   
  +static void msi_unsupported(MSIMessage *msg)
  +{
  +/* If we get here, the board failed to register a delivery handler. 
  */
  +abort();
  +}
  +
  +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
  +
 
  How about we set this to NULL, and check it instead of the bool
  flag?
 
 
  Yeah. I will introduce
 
  bool msi_supported(void)
  {
  return msi_deliver != msi_unsupported;
  }
 
  OK?
 
  Jan
 
  
  Looks a bit weird ...
  NULL is a pretty standard value for an invalid pointer, isn't it?
 
 Saves us the runtime check and is equally expressive and readable IMHO.
 
 Jan

Do we need to check?
NULL dereference leads to a crash just as surely...

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Jan Kiszka
On 2011-10-18 14:33, Michael S. Tsirkin wrote:
 On Tue, Oct 18, 2011 at 02:08:59PM +0200, Jan Kiszka wrote:
 On 2011-10-18 13:58, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
 On 2011-10-17 17:48, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
 This optimization was only required to keep KVM route usage low. Now
 that we solve that problem via lazy updates, we can drop the field. We
 still need interfaces to clear pending vectors, though (and we have to
 make use of them more broadly - but that's unrelated to this patch).

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 Lazy updates should be an implementation detail.
 IMO resource tracking of vectors makes sense
 as an API. Making devices deal with pending
 vectors as a concept, IMO, does not.

 There is really no use for tracking the vector lifecycle once we have
 lazy updates (except for static routes). It's a way too invasive
 concept, and it's not needed for anything but KVM.

 I think it's needed. The PCI spec states that when the device
 does not need an interrupt anymore, it should clear the pending
 bit.

 That should be done explicitly if it is required outside existing
 clearing points. We already have that service, it's called
 msix_clear_vector.
 
 We do? I don't seem to see it upstream...

True. From the device's POV, MSI-X (and also MSI!) vectors are actually
level-triggered. So we should communicate the level to the MSI core and
not just the edge. Needs more fixing

 
 That alone does not justify msix_vector_use and all
 the state and logic behind it IMHO.
 
 To me it looks like an abstraction that solves both
 this problem and the resource allocation problem.
 Resources are actually limited BTW, this is not just
 a KVM thing. qemu.git currently lets guests decide
 what to do with them, but it might turn out to
 be benefitial to warn the management application
 that it is shooting itself in the foot.
 
 The use/unuse is IMO a decent API for this,
 because it uses a familiar resource tracking concept.
 Exposing this knowledge of msix to devices seems
 like a worse API.


 If you want an example, check
 http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
 compare it to the changes done to hpet in this series.

 Jan


 This seems to be a general argument that lazy updates are good?
 I have no real problem with them, besides the fact that
 we need an API to reserve space in the routing
 table so that device setup can fail upfront.

 That's not possible, even with used vectors, as devices change their
 vector usage depending on how the guest configures the devices. If you
 (pre-)allocate all possible vectors, you may run out of resources
 earlier than needed actually.
 
 This really depends, but please do look at how with virtio
 we report resource shortage to the guest and let it fall back to
 level interrupts. You seem to remove that capability.

To my understanding, virtio will be the exception as no other device
will have a chance to react to a resource shortage while sending(!) an MSI
message.

 
 I actually would not mind preallocating everything upfront which is much
 easier.  But with your patch we get a silent failure or a drastic
 slowdown which is much more painful IMO.

Again: have we already seen that limit? And where does it come from if not
from KVM?

 
 That's also why we do those data == 0
 checks to skip used but unconfigured vectors.

 Jan
 
 These checks work more or less by luck BTW. It's
 a hack which I hope lazy allocation will replace.

The check is still valid (for x86) when we have to use static routes
(device assignment, vhost). For lazy updates, it's obsolete, that's true.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes

2011-10-18 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 09:00:12PM +0200, Jan Kiszka wrote:
 On 2011-10-17 14:16, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
  Also invoke the mask notifier if the global MSI-X mask is modified. For
  this purpose, we push the notifier call from the per-vector mask update
  to the central msix_handle_mask_update.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
  
  This is a bugfix, isn't it?
  If yes it should be separated and put on -stable.
 
 Yep, will pull this to the front.

I'll apply this to qemu.git, no need to mix bugfixes
with features ...

  
  ---
   hw/msix.c |   16 +---
   1 files changed, 9 insertions(+), 7 deletions(-)
 
  diff --git a/hw/msix.c b/hw/msix.c
  index 739b56f..247b255 100644
  --- a/hw/msix.c
  +++ b/hw/msix.c
  @@ -221,7 +221,15 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
   
   static void msix_handle_mask_update(PCIDevice *dev, int vector)
   {
  -if (!msix_is_masked(dev, vector)  msix_is_pending(dev, vector)) {
  +bool masked = msix_is_masked(dev, vector);
  +int ret;
  +
  +if (dev-msix_mask_notifier) {
  +ret = dev-msix_mask_notifier(dev, vector,
  +  msix_is_masked(dev, vector));
  
  Use 'masked' value here as well?
 
 Yes.
 
 Jan
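
With the review comment applied, the hunk would compute the mask state once and reuse it for both the notifier and the pending check. A standalone sketch of the corrected shape, where the device layout and notifier signature are simplified stand-ins for QEMU's real PCIDevice:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MSIX_VECTORS 4

typedef struct PCIDevice PCIDevice;
struct PCIDevice {
    bool masked[MSIX_VECTORS];
    bool pending[MSIX_VECTORS];
    int (*msix_mask_notifier)(PCIDevice *dev, int vector, bool masked);
};

static bool msix_is_masked(PCIDevice *dev, int vector)
{
    return dev->masked[vector];
}

static void msix_handle_mask_update(PCIDevice *dev, int vector)
{
    bool masked = msix_is_masked(dev, vector);

    if (dev->msix_mask_notifier) {
        /* reuse the cached 'masked' instead of a second msix_is_masked() */
        dev->msix_mask_notifier(dev, vector, masked);
    }
    if (!masked && dev->pending[vector]) {
        dev->pending[vector] = false;   /* real code fires msix_notify() here */
    }
}
```

The behavior is unchanged; the point is simply that the mask state is read once, so the notifier and the pending-vector logic can never disagree within one update.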
 





Re: [Qemu-devel] [RFC][PATCH 11/45] msi: Factor out delivery hook

2011-10-18 Thread Jan Kiszka
On 2011-10-18 14:38, Michael S. Tsirkin wrote:
 On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
 On 2011-10-18 14:05, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
 On 2011-10-17 15:43, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
 diff --git a/hw/msi.c b/hw/msi.c
 index 3c7ebc3..9055155 100644
 --- a/hw/msi.c
 +++ b/hw/msi.c
 @@ -40,6 +40,14 @@
  /* Flag for interrupt controller to declare MSI/MSI-X support */
  bool msi_supported;
  
 +static void msi_unsupported(MSIMessage *msg)
 +{
 +/* If we get here, the board failed to register a delivery handler. 
 */
 +abort();
 +}
 +
 +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
 +

 How about we set this to NULL, and check it instead of the bool
 flag?


 Yeah. I will introduce

 bool msi_supported(void)
 {
 return msi_deliver != msi_unsupported;
 }

 OK?

 Jan


 Looks a bit weird ...
 NULL is a pretty standard value for an invalid pointer, isn't it?

  Saves us the runtime check and is equally expressive and readable IMHO.

 Jan
 
 Do we need to check?
 NULL dereference leads to a crash just as surely...

There is no NULL state of msi_deliver. A) If all goes wrong, it would
execute msi_unsupported (which aborts), and B) in the absence of bugs,
msi_supported() is supposed to protect us from ever executing
msi_deliver() while it still points to msi_unsupported.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 11/45] msi: Factor out delivery hook

2011-10-18 Thread malc
On Tue, 18 Oct 2011, Michael S. Tsirkin wrote:

 On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
  On 2011-10-18 14:05, Michael S. Tsirkin wrote:
   On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
   On 2011-10-17 15:43, Michael S. Tsirkin wrote:
   On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
   diff --git a/hw/msi.c b/hw/msi.c
   index 3c7ebc3..9055155 100644
   --- a/hw/msi.c
   +++ b/hw/msi.c
   @@ -40,6 +40,14 @@
/* Flag for interrupt controller to declare MSI/MSI-X support */
bool msi_supported;

   +static void msi_unsupported(MSIMessage *msg)
   +{
   +/* If we get here, the board failed to register a delivery 
   handler. */
   +abort();
   +}
   +
   +void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
   +
  
   How about we set this to NULL, and check it instead of the bool
   flag?
  
  
   Yeah. I will introduce
  
   bool msi_supported(void)
   {
   return msi_deliver != msi_unsupported;
   }
  
   OK?
  
   Jan
  
   
   Looks a bit weird ...
   NULL is a pretty standard value for an invalid pointer, isn't it?
  
  Saves us the runtime check and is equally expressive and readable IMHO.
  
  Jan
 
 Do we need to check?
 NULL dereference leads to a crash just as surely...
 

Not universally (not on AIX for instance (read)).

-- 
mailto:av1...@comtv.ru



Re: [Qemu-devel] [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes

2011-10-18 Thread Jan Kiszka
On 2011-10-18 14:40, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 09:00:12PM +0200, Jan Kiszka wrote:
 On 2011-10-17 14:16, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
 Also invoke the mask notifier if the global MSI-X mask is modified. For
 this purpose, we push the notifier call from the per-vector mask update
 to the central msix_handle_mask_update.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 This is a bugfix, isn't it?
 If yes it should be separated and put on -stable.

 Yep, will pull this to the front.
 
 I'll apply this to qemu.git, no need to mix bugfixes
 with features ...

This doesn't apply to qemu.git (there are no notifiers upstream).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Michael S. Tsirkin
On Tue, Oct 18, 2011 at 02:38:36PM +0200, Jan Kiszka wrote:
 On 2011-10-18 14:33, Michael S. Tsirkin wrote:
  On Tue, Oct 18, 2011 at 02:08:59PM +0200, Jan Kiszka wrote:
  On 2011-10-18 13:58, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 09:28:12PM +0200, Jan Kiszka wrote:
  On 2011-10-17 17:48, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
  This optimization was only required to keep KVM route usage low. Now
  that we solve that problem via lazy updates, we can drop the field. We
  still need interfaces to clear pending vectors, though (and we have to
  make use of them more broadly - but that's unrelated to this patch).
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
  Lazy updates should be an implementation detail.
  IMO resource tracking of vectors makes sense
  as an API. Making devices deal with pending
  vectors as a concept, IMO, does not.
 
  There is really no use for tracking the vector lifecycle once we have
  lazy updates (except for static routes). It's a way too invasive
  concept, and it's not needed for anything but KVM.
 
  I think it's needed. The PCI spec states that when the device
  does not need an interrupt anymore, it should clear the pending
  bit.
 
  That should be done explicitly if it is required outside existing
  clearing points. We already have that service, it's called
  msix_clear_vector.
  
  We do? I don't seem to see it upstream...
 
 True. From the device's POV, MSI-X (and also MSI!) vectors are actually
 level-triggered.

This definitely takes adjusting to.

 So we should communicate the level to the MSI core and
 not just the edge. Needs more fixing

  
  That alone does not justify msix_vector_use and all
  the state and logic behind it IMHO.
  
  To me it looks like an abstraction that solves both
  this problem and the resource allocation problem.
  Resources are actually limited BTW, this is not just
  a KVM thing. qemu.git currently lets guests decide
  what to do with them, but it might turn out to
  be beneficial to warn the management application
  that it is shooting itself in the foot.
  
  The use/unuse is IMO a decent API for this,
  because it uses a familiar resource tracking concept.
  Exposing this knowledge of msix to devices seems
  like a worse API.
 
 
  If you want an example, check
  http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
  compare it to the changes done to hpet in this series.
 
  Jan
 
 
  This seems to be a general argument that lazy updates are good?
  I have no real problem with them, besides the fact that
  we need an API to reserve space in the routing
  table so that device setup can fail upfront.
 
  That's not possible, even with used vectors, as devices change their
  vector usage depending on how the guest configures the devices. If you
  (pre-)allocate all possible vectors, you may run out of resources
  earlier than needed actually.
  
  This really depends, but please do look at how with virtio
  we report resource shortage to guest and let it fall back to
  level interrupts. You seem to remove that capability.
 
 To my understanding, virtio will be the exception as no other device
 will have a chance to react on resource shortage while sending(!) an MSI
 message.

Hmm, are you familiar with that spec? This is not what virtio does:
resource shortage is detected during setup.
This is exactly the problem with lazy registration as you don't
allocate until it's too late.

  
  I actually would not mind preallocating everything upfront which is much
  easier.  But with your patch we get a silent failure or a drastic
  slowdown which is much more painful IMO.
 
 Again: did we already see that limit? And where does it come from if not
 from KVM?

It's a hardware limitation of Intel APICs: the interrupt vector is encoded
in an 8-bit field of the MSI message, so you can have at most 256 of these.

  
  That's also why we do those data == 0
  checks to skip used but unconfigured vectors.
 
  Jan
  
  These checks work more or less by luck BTW. It's
  a hack which I hope lazy allocation will replace.
 
 The check is still valid (for x86) when we have to use static routes
 (device assignment, vhost).

It's not valid at all - we are just lucky that linux and
windows guests seem to zero out the vector when it's not in use.
They do not have to do that.

 For lazy updates, it's obsolete, that's true.
 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 12:42, Christoph Hellwig wrote:
 On Tue, Oct 18, 2011 at 12:41:10PM +0200, Paolo Bonzini wrote:
 On 10/18/2011 10:39 AM, Jan Kiszka wrote:

  Yeah, I see. Could also be solved via gdb scripts, but crash is already
  there.
 [ BTW, crash is for the dead. But having those features in form of gdb
 scripts would also allow live system analysis. Looks like it's worth
 obsoleting crash in the long run. ]

 Crash can also do live system analysis. :)
 
 crash in fact is gdb - with a lot of additional commands.

Is there a trick to make it work against a gdbserver?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 11/45] msi: Factor out delivery hook

2011-10-18 Thread Michael S. Tsirkin
On Tue, Oct 18, 2011 at 04:44:47PM +0400, malc wrote:
 On Tue, 18 Oct 2011, Michael S. Tsirkin wrote:
 
  On Tue, Oct 18, 2011 at 02:23:29PM +0200, Jan Kiszka wrote:
   On 2011-10-18 14:05, Michael S. Tsirkin wrote:
On Mon, Oct 17, 2011 at 09:15:47PM +0200, Jan Kiszka wrote:
On 2011-10-17 15:43, Michael S. Tsirkin wrote:
On Mon, Oct 17, 2011 at 11:27:45AM +0200, Jan Kiszka wrote:
diff --git a/hw/msi.c b/hw/msi.c
index 3c7ebc3..9055155 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -40,6 +40,14 @@
 /* Flag for interrupt controller to declare MSI/MSI-X support */
 bool msi_supported;
 
+static void msi_unsupported(MSIMessage *msg)
+{
+/* If we get here, the board failed to register a delivery handler. */
+abort();
+}
+
+void (*msi_deliver)(MSIMessage *msg) = msi_unsupported;
+
   
How about we set this to NULL, and check it instead of the bool
flag?
   
   
Yeah. I will introduce
   
bool msi_supported(void)
{
return msi_deliver != msi_unsupported;
}
   
OK?
   
Jan
   

Looks a bit weird ...
NULL is a pretty standard value for an invalid pointer, isn't it?
   
   Save us the runtime check and is equally expressive and readable IMHO.
   
   Jan
  
  Do we need to check?
  NULL dereference leads to a crash just as surely...
  
 
 Not universally (not on AIX for instance (read)).

This is a NULL function call though :)
Anyway, this was just nitpicking. Do it any way you like.

 -- 
 mailto:av1...@comtv.ru



Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Jan Kiszka
On 2011-10-17 17:48, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:28:02AM +0200, Jan Kiszka wrote:
 This optimization was only required to keep KVM route usage low. Now
 that we solve that problem via lazy updates, we can drop the field. We
 still need interfaces to clear pending vectors, though (and we have to
 make use of them more broadly - but that's unrelated to this patch).

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 Lazy updates should be an implementation detail.
 IMO resource tracking of vectors makes sense
 as an API. Making devices deal with pending
 vectors as a concept, IMO, does not.

There is really no use for tracking the vector lifecycle once we have
lazy updates (except for static routes). It's a way too invasive
concept, and it's not needed for anything but KVM.

If you want an example, check
http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/70915 and
compare it to the changes done to hpet in this series.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes

2011-10-18 Thread Michael S. Tsirkin
On Tue, Oct 18, 2011 at 02:45:58PM +0200, Jan Kiszka wrote:
 On 2011-10-18 14:40, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 09:00:12PM +0200, Jan Kiszka wrote:
  On 2011-10-17 14:16, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
  Also invoke the mask notifier if the global MSI-X mask is modified. For
  this purpose, we push the notifier call from the per-vector mask update
  to the central msix_handle_mask_update.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
  This is a bugfix, isn't it?
  If yes it should be separated and put on -stable.
 
  Yep, will pull this to the front.
  
  I'll apply this to qemu.git, no need to mix bugfixes
  with features ...
 
 This doesn't apply to qemu.git (there are no notifiers upstream).
 
 Jan

Right, thanks for the reminder. Anyway, I'll take care
of this, don't worry :)

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Jan Kiszka
On 2011-10-18 14:48, Michael S. Tsirkin wrote:
 To my understanding, virtio will be the exception as no other device
 will have a chance to react on resource shortage while sending(!) an MSI
 message.
 
 Hmm, are you familiar with that spec?

Not by heart.

 This is not what virtio does,
 resource shortage is detected during setup.
 This is exactly the problem with lazy registration as you don't
 allocate until it's too late.

When is that setup phase? Does it actually come after every change to an
MSI vector? I doubt so. Thus virtio can only estimate the guest usage as
well (a guest may or may not actually write a non-null data into a
vector and unmask it).

 

 I actually would not mind preallocating everything upfront which is much
 easier.  But with your patch we get a silent failure or a drastic
 slowdown which is much more painful IMO.

 Again: did we already see that limit? And where does it come from if not
 from KVM?
 
 It's a hardware limitation of Intel APICs: the interrupt vector is encoded
 in an 8-bit field of the MSI message, so you can have at most 256 of these.

There should be no such limitation with pseudo GSIs we use for MSI
injection. They end up as MSI messages again, so actually 256 (-reserved
vectors) * number-of-cpus (on x86).

 

 That's also why we do those data == 0
 checks to skip used but unconfigured vectors.

 Jan

 These checks work more or less by luck BTW. It's
 a hack which I hope lazy allocation will replace.

 The check is still valid (for x86) when we have to use static routes
 (device assignment, vhost).
 
 It's not valid at all - we are just lucky that linux and
 windows guests seem to zero out the vector when it's not in use.
 They do not have to do that.

It is valid as it is just an optimization. If an unused vector has a
non-null data field, we just redundantly register a route where we do
not actually have to. But we do need to be prepared for potentially
arriving messages on that virtual GSI, either via irqfd or kvm device
assignment.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] GPLv3 troubles

2011-10-18 Thread Anthony Liguori

On 10/18/2011 03:01 AM, Markus Armbruster wrote:

Avi Kivitya...@redhat.com  writes:


On 10/17/2011 07:46 PM, Stefan Weil wrote:


So let's start. For any of my contributions, I agree to GPL v2 or later.
Later generations should have the possibility to replace GPL v2 by
something which matches future requirements.


I expect Red Hat contributions can be relicensed to v2+ as well.


Plenty of precedence for that.



Okay, let's get serious about it.  I set up the following wiki page for 
coordination:


http://wiki.qemu.org/Relicensing

Please get the appropriate approval at Red Hat, and follow the instructions on 
the wiki.  I'm in the process of getting approval at IBM.  Between RH, IBM, and 
Blue, there's a pretty large chunk of files that can be relicensed immediately. 
 It's probably best to tackle this incrementally.


Regards,

Anthony Liguori



[Qemu-devel] In soft freeze, hard freeze begins on November 1st

2011-10-18 Thread Anthony Liguori

Hi,

Just a friendly reminder that we are now in the soft feature freeze. 
Maintainers should exercise discretion about committing any new feature that 
would be extremely disruptive at this point.


The remaining schedule follows below.  November 1st will be the hard freeze of 
master.


Please note that unlike previous releases, we will not be forking off the 1.1 
development branch until after the 1.0 release.  The goal is to make sure that 
we all focus on making 1.0 our highest quality release to date.


| 2011-11-01
| Freeze master
|-
| 2011-11-07
| Tag qemu-1.0-rc1
|-
| 2011-11-14
| Tag qemu-1.0-rc2
|-
| 2011-11-21
| Tag qemu-1.0-rc3
|-
| 2011-11-28
| Tag qemu-1.0-rc4
|-
| 2011-12-01
| Tag qemu-1.0

Regards,

Anthony Liguori



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Christoph Hellwig
On Tue, Oct 18, 2011 at 02:47:39PM +0200, Jan Kiszka wrote:
  Crash can also do live system analysis. :)
  
  crash in fact is gdb - with a lot of additional commands.
 
 Is there a trick to make it work against a gdbserver?

No idea.  I just noticed that crash was built around a gdb source
tree and additional files when I had to hack it for a new kernel
version a while ago.

The optimum would be if all the crash changes were merged back
into upstream gdb, but I'm not sure the gdb community would like
to see all that very Linux-kernel-specific code (especially given that it's
very version-dependent).




Re: [Qemu-devel] [RFC][PATCH 22/45] qemu-kvm: msix: Fire mask notifier on global mask changes

2011-10-18 Thread Jan Kiszka
On 2011-10-17 14:16, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:27:56AM +0200, Jan Kiszka wrote:
 Also invoke the mask notifier if the global MSI-X mask is modified. For
 this purpose, we push the notifier call from the per-vector mask update
 to the central msix_handle_mask_update.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 This is a bugfix, isn't it?
 If yes it should be separated and put on -stable.

Yep, will pull this to the front.

 
 ---
  hw/msix.c |   16 +---
  1 files changed, 9 insertions(+), 7 deletions(-)

 diff --git a/hw/msix.c b/hw/msix.c
 index 739b56f..247b255 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -221,7 +221,15 @@ static bool msix_is_masked(PCIDevice *dev, int vector)
  
  static void msix_handle_mask_update(PCIDevice *dev, int vector)
  {
 -if (!msix_is_masked(dev, vector) && msix_is_pending(dev, vector)) {
 +bool masked = msix_is_masked(dev, vector);
 +int ret;
 +
 +if (dev->msix_mask_notifier) {
 +ret = dev->msix_mask_notifier(dev, vector,
 +  msix_is_masked(dev, vector));
 
 Use 'masked' value here as well?

Yes.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [RFC][PATCH 42/45] msix: Introduce msix_init_simple

2011-10-18 Thread Jan Kiszka
On 2011-10-17 16:28, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 01:27:31PM +0200, Jan Kiszka wrote:
 On 2011-10-17 13:22, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 11:28:16AM +0200, Jan Kiszka wrote:
 Devices models are usually not interested in specifying MSI-X
 configuration details beyond the number of vectors to provide and the
 BAR number to use. Layout of an exclusively used BAR and its
 registration can also be handled centrally.

 This is the purpose of msix_init_simple. It provides handy services to
 the existing users. Future users like device assignment may require more
 detailed setup specification. For them we will (re-)introduce msix_init
 with the full list of configuration options (in contrast to the current
 code).

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

 Well, this seems a bit of a code churn then, doesn't it?
 We are also discussing using memory BAR for virtio-pci for other
 stuff besides MSI-X, so the last user of the _simple variant
 will be ivshmem then?

 We will surely see more MSI-X users over the time. Not sure if they all
 mix their MSIX-X BARs with other stuff. But e.g. the e1000 variant I
 have here does not. So there should be users in the future.

 Jan
 
 Question is, how hard is to pass in the BAR and the offset?

That is trivial. But have a look at the final simple implementation. It
also manages the container memory region for table and PBA and
registers/unregisters that container as a BAR. So there is measurable
added value.

Jan



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 15:08, Christoph Hellwig wrote:
 On Tue, Oct 18, 2011 at 02:47:39PM +0200, Jan Kiszka wrote:
 Crash can also do live system analysis. :)

 crash in fact is gdb - with a lot of additional commands.

 Is there a trick to make it work against a gdbserver?
 
 No idea.  I just noticed that crash was built around a gdb source
 tree and additional files when I had to hack it for a new kernel
 version a while ago.
 
 The optimum would be if all the crash changes were merged back
 into upstream gdb, but I'm not sure the gdb community would like
 to see all that very linux kernel specific code (especially given that it's
 very version dependent)

That's why I'm advocating gdb scripts, maybe maintained inside the
kernel tree.

But I need to look closer as well to see if vanilla gdb is already able
to provide all registers used by crash so far (surely not all required
for perfect debugging, but that's a long-known issue).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support

2011-10-18 Thread Zhi Yong Wu
On Tue, Oct 18, 2011 at 5:56 PM, Kevin Wolf kw...@redhat.com wrote:
 Am 18.10.2011 11:29, schrieb Zhi Yong Wu:
 +void qemu_del_block_queue(BlockQueue *queue)
 +{
 +    BlockQueueAIOCB *request, *next;
 +
 +    QTAILQ_FOREACH_SAFE(request, &queue->requests, entry, next) {
 +        QTAILQ_REMOVE(&queue->requests, request, entry);
 +        qemu_aio_release(request);
 +    }
 +
 +    g_free(queue);
 +}

 Can we be sure that no AIO requests are in flight that still use the now
 released AIOCB?
 Yeah, since the qemu core code runs serially, I think that when
 qemu_del_block_queue() runs, no requests are in flight. Right?

 Patch 2 has this code:

 +void bdrv_io_limits_disable(BlockDriverState *bs)
 +{
 +    bs->io_limits_enabled = false;
 +
 +    if (bs->block_queue) {
 +        qemu_block_queue_flush(bs->block_queue);
 +        qemu_del_block_queue(bs->block_queue);
 +        bs->block_queue = NULL;
 +    }

 Does this mean that you can't disable I/O limits while the VM is running?
 No, you can, even while the VM is running.

 Okay, in general qemu_block_queue_flush() empties the queue so that
 there are no requests left that qemu_del_block_queue() could drop from
 the queue. So in the common case it doesn't even enter the FOREACH loop.
 I think that we should adopt !QTAILQ_EMPTY(&queue->requests), not
 QTAILQ_FOREACH_SAFE in qemu_del_block_queue(),
 right?

 I think QTAILQ_FOREACH_SAFE is fine.


 I think problems start when requests have failed or exceeded the limit
 again, then you have requests queued even after
 qemu_block_queue_flush(). You must be aware of this, otherwise the code
 in qemu_del_block_queue() wouldn't exist.

 But you can't release the ACBs without having called their callback,
 otherwise the caller would still assume that its ACB pointer is valid.
 Maybe calling the callback before releasing the ACB would be enough.
 Good, thanks.

 +            }
 +        }
 +    }
 +
 +    queue->req_failed = true;
 +    queue->flushing   = false;
 +}
 +
 +bool qemu_block_queue_has_pending(BlockQueue *queue)
 +{
 +    return !queue->flushing && !QTAILQ_EMPTY(&queue->requests);
 +}

 Why doesn't the queue have pending requests in the middle of a flush
 operation? (That is, the flush hasn't completed yet)
 It is possible for the queue to have pending requests. if yes, how about?

 Sorry, can't parse this.

 I don't understand why the !queue-flushing part is correct.

 What about this?
 When bdrv_aio_readv/writev handles a request, it checks whether the
 block queue is non-NULL and not currently being flushed; if so, it
 assumes the request is a new one from the upper layer, so it does not
 check whether the runtime I/O rate has exceeded the limits, but
 immediately inserts it into the block queue.

 Hm, I think I understand what you're saying, but only after looking at
 patch 3. This is not really implementing a has_pending(), but
 has_pending_and_caller_wasnt_called_during_flush(). I think it would be
Correct.
 better to handle the queue-flushing condition in the caller where its
 use is more obvious.
OK, I will do it this way.

 Kevin




-- 
Regards,

Zhi Yong Wu



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Richard W.M. Jones
On Tue, Oct 18, 2011 at 12:47:23PM +0200, Jan Kiszka wrote:
 On 2011-10-18 12:41, Paolo Bonzini wrote:
  On 10/18/2011 10:39 AM, Jan Kiszka wrote:
 
   Yeah, I see. Could also be solved via gdb scripts, but crash is already
   there.
  [ BTW, crash is for the dead. But having those features in form of gdb
  scripts would also allow live system analysis. Looks like it's worth
  obsoleting crash on the long run. ]
  
  Crash can also do live system analysis. :)
 
 Does it work via remote interface? And it's still the wrong way around:
 You have to append gdb to the majority of commands when debugging
 instead of issuing additional analysis commands from the gdb shell.
 That's understandable given the limited scripting power of old gdbs. But
 that's now history thanks to its python interface.

Before this conversation heads any further into the realm of
absurdity, let's get back to the key requirements for production
systems:

(A) No development tools can be installed.

(B) The core file must be removed from the production system quickly
and analyzed on another machine (indeed, often by experts from
another company entirely).

Anything that involves live debugging using GDB is simply not
acceptable for production systems, whether you think that's right or
not.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top



Re: [Qemu-devel] KVM call agenda for October 18

2011-10-18 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Hi

 Please send in any agenda items you are interested in covering.

 Thanks, Juan.

No agenda - no call

Happy hacking, Juan.



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 15:30, Richard W.M. Jones wrote:
 On Tue, Oct 18, 2011 at 12:47:23PM +0200, Jan Kiszka wrote:
 On 2011-10-18 12:41, Paolo Bonzini wrote:
 On 10/18/2011 10:39 AM, Jan Kiszka wrote:

  Yeah, I see. Could also be solved via gdb scripts, but crash is already
  there.
 [ BTW, crash is for the dead. But having those features in form of gdb
 scripts would also allow live system analysis. Looks like it's worth
 obsoleting crash on the long run. ]

 Crash can also do live system analysis. :)

 Does it work via remote interface? And it's still the wrong way around:
 You have to append gdb to the majority of command when doing debugging
 instead of issuing additional analysis commands from the gdb shell.
 That's understandable given the limited scripting power of old gdbs. But
 that's now history thanks to its python interface.
 
 Before this conversation heads any further into the realm of
 absurdity, let's get back to the key requirements for production
 systems:
 
 (A) No development tools can be installed.
 
 (B) The core file must be removed from the production system quickly
  and analyzed on another machine (indeed, often by experts from
 another company entirely).
 
 Anything that involves live debugging using GDB is simply not
 acceptable for production systems, whether you think that's right or
 not.

My memory is usually that bad, but not in this case. :)

This discussion should also work out a bigger picture than just your
(valid) scenario. We should look at meeting those requirements but also
see what would be a useful foundation for future extensions like having
post-crash analysis and live debugging at a similar feature level with
very similar or (much better) identical tools.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Michael S. Tsirkin
On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
 On 2011-10-18 14:48, Michael S. Tsirkin wrote:
  To my understanding, virtio will be the exception as no other device
  will have a chance to react on resource shortage while sending(!) an MSI
  message.
  
  Hmm, are you familiar with that spec?
 
 Not by heart.
 
  This is not what virtio does,
  resource shortage is detected during setup.
  This is exactly the problem with lazy registration as you don't
  allocate until it's too late.
 
 When is that setup phase? Does it actually come after every change to an
 MSI vector? I doubt so.

No. During setup, the driver requests vectors from the OS, and then tells
the device which vector each VQ should use.  It then checks that the
assignment was successful. If not, it retries with fewer vectors.

Other devices can do this during initialization, and signal
resource availability to the guest using the MSI-X vector number field.

 Thus virtio can only estimate the guest usage as
 well

At some level, this is fundamental: some guest operations
have no failure mode. So we must preallocate
some resources to make sure they won't fail.

 (a guest may or may not actually write a non-null data into a
 vector and unmask it).

Please, forget the non-NULL thing. virtio driver knows exactly
how many vectors we use and communicates this info to the device.
This is not uncommon at all.

  
 
  I actually would not mind preallocating everything upfront which is much
  easier.  But with your patch we get a silent failure or a drastic
  slowdown which is much more painful IMO.
 
  Again: did we already see that limit? And where does it come from if not
  from KVM?
  
  It's a hardware limitation of Intel APICs: the interrupt vector is encoded
  in an 8-bit field of the MSI message, so you can have at most 256 of these.
 
 There should be no such limitation with pseudo GSIs we use for MSI
 injection. They end up as MSI messages again, so actually 256 (-reserved
 vectors) * number-of-cpus (on x86).

This limits which CPUs can get the interrupt though.
Linux seems to have a global pool as it wants to be able to freely
balance vectors between CPUs. Or, consider a guest with a single CPU :)

Anyway, why argue - there is a limitation, and it's not coming from KVM,
right?

  
 
  That's also why we do those data == 0
  checks to skip used but unconfigured vectors.
 
  Jan
 
  These checks work more or less by luck BTW. It's
  a hack which I hope lazy allocation will replace.
 
  The check is still valid (for x86) when we have to use static routes
  (device assignment, vhost).
  
  It's not valid at all - we are just lucky that linux and
  windows guests seem to zero out the vector when it's not in use.
  They do not have to do that.
 
 It is valid as it is just an optimization. If an unused vector has a
 non-null data field, we just redundantly register a route where we do
 not actually have to.

Well, the only reason we even have this code is because
it was claimed that some devices declare support for a huge number
of vectors which then go unused. So if the guest does not
do this we'll run out of vectors ...

 But we do need to be prepared

And ATM, we aren't, and probably can't be without kernel
changes, right?

 for potentially
 arriving messages on that virtual GSI, either via irqfd or kvm device
 assignment.
 
 Jan

Why irqfd?  Device assignment is ATM the only place where we use these
ugly hacks.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



Re: [Qemu-devel] hw/nand hw/onenand and read-only

2011-10-18 Thread Markus Armbruster
juha.riihim...@nokia.com writes:

 On 18.10.11 11:23 , ext Markus Armbruster arm...@redhat.com wrote:

These guys have been converted to qdev relatively recently.

I wonder what happens when they use a drive defined with -drive
if=none,readonly.

If that's not a valid configuration, we better reject read-only drives,
like ide_init_drive() does.

 I'm not an expert with QEMU command line options; how do you pass such a
 drive to a NAND device? With if=mtd I get the following which I guess is
 expected result:

 $ qemu-system-arm -M beagle -drive if=mtd,file=nand.img,readonly
 qemu: readonly flag not supported for drive with this interface

Yes, that way works:

$ qemu-system-arm -drive if=mtd,file=tmp.qcow2,readonly
qemu-system-arm: -drive if=mtd,file=tmp.qcow2,readonly: readonly not 
supported by this bus type

But try this way:

$ qemu-system-arm -drive if=none,file=tmp.qcow2,readonly,id=foo -device 
nand,drive=foo
Kernel image must be specified

Grr, force it:

$ qemu-system-arm -drive if=none,file=tmp.qcow2,readonly,id=foo -device 
nand,drive=foo -kernel /dev/null
qemu: hardware error: nand_device_init: Unsupported NAND block size.
[...]

Note that I didn't bother supplying a drive with sensible size.  If I
did, I guess I'd get nand coupled to a read-only drive.  Easy enough for
you to try :)



Re: [Qemu-devel] [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers

2011-10-18 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 09:08:58PM +0200, Jan Kiszka wrote:
 On 2011-10-17 14:39, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 01:45:04PM +0200, Jan Kiszka wrote:
  On 2011-10-17 13:40, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 11:27:57AM +0200, Jan Kiszka wrote:
  MSI config notifiers are supposed to be triggered on every relevant
  configuration change of MSI vectors or if MSI is enabled/disabled.
 
  Two notifiers are established, one for vector changes and one for general
  enabling. The former notifier additionally passes the currently active
  MSI message.
  This will allow to update potential in-kernel IRQ routes on
  changes. The latter notifier is optional and will only be used by a
  subset of clients.
 
  These notifiers are currently only available for MSI-X but will be
  extended to legacy MSI as well.
 
  Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
  Passing message, always, does not seem to make sense: message is only
  valid if it is unmasked.
 
  If we go from unmasked to masked, the consumer could just ignore the
  message.
  
  Why don't we let the consumer get the message if it needs it?
 
 Because most consumer will need and because I want to keep the API simple.

The API seems to get more complex, as you are passing
in fields which are only valid sometimes.

  
  Further, IIRC the spec requires any changes to be done while
  message is masked. So mask notifier makes more sense to me:
  it does the same thing using one notifier that you do
  using two notifiers.
 
  That's in fact a possible optimization (only invoke the callback on mask
  transitions).
  
  Further, it is one that is already implemented.
  So I would prefer not to add work by removing it :)
 
 Generalization to cover MSI requires some changes. Unneeded behavioral
 changes back and forth should and will of course be avoided. I will
 rework this.
 
  
  Not sure if that applies to MSI as well, probably not.
  
  Probably not. However, if per vector masking is
  supported, and while vector is masked, the address/
  data values might not make any sense.
  
  So I think even msi users needs to know about masked state.
 
 Yes, and they get this information via the config notifier.
 
  
  To
  have common types, I would prefer to stay with vector config notifiers
  as name then.
 
  Jan
  
  So we pass in nonsense values and ask all users to know about MSIX rules.
  Ugh.
  
  I do realize msi might change the vector without masking.
  We can either artificially call mask before value change
  and unmask after, or use 3 notifiers: mask,unmask,config.
  Add a comment that config is invoked when configuration
  for an unmasked vector is changed, and that
  it can only happen for msi, not msix.
 
 I see no need in complicating the API like this. MSI-X still needs the
 config information on unmask, so let's just consistently pass it via the
 unified config notifier instead of forcing the consumers to create yet
 two more handlers. I really do not see the benefit for the consumer.
 
 Jan
 

The benefit is a clearer API, where all parameters you get are valid,
so you do not need to go read the spec to see what is OK to use.
Generally, encoding events in flags is more error
prone than using different notifiers for different events.

E.g. _unmask and _mask make
it obvious that they are called on mask and on unmask
respectively.
OTOH _config_change(bool mask) is unclear: is 'mask' the new
state or the old state?

It might be just my taste, but I usually prefer multiple
functions doing one thing each rather than a single
function doing multiple things. It shouldn't be too hard ...

-- 
MST



Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Jan Kiszka
On 2011-10-18 15:37, Michael S. Tsirkin wrote:
 On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
 On 2011-10-18 14:48, Michael S. Tsirkin wrote:
 To my understanding, virtio will be the exception as no other device
 will have a chance to react on resource shortage while sending(!) an MSI
 message.

 Hmm, are you familiar with that spec?

 Not by heart.

 This is not what virtio does,
 resource shortage is detected during setup.
 This is exactly the problem with lazy registration as you don't
 allocate until it's too late.

 When is that setup phase? Does it actually come after every change to an
 MSI vector? I doubt so.
 
 No. During setup, driver requests vectors from the OS, and then tells
 the device which vector should each VQ use.  It then checks that the
 assignment was successful. If not, it retries with less vectors.
 
 Other devices can do this during initialization, and signal
 resource availability to guest using msix vector number field.
 
 Thus virtio can only estimate the guest usage as
 well
 
 At some level, this is fundamental: some guest operations
 have no failure mode. So we must preallocate
 some resources to make sure they won't fail.

We can still track the expected maximum number of active vectors at core
level, collect them from the KVM layer, and warn if we expect conflicts.
Anxious MSI users could then refrain from using this feature, others
might be fine with risking a slow-down on conflicts.

 
 (a guest may or may not actually write a non-null data into a
 vector and unmask it).
 
 Please, forget the non-NULL thing. virtio driver knows exactly
 how many vectors we use and communicates this info to the device.
 This is not uncommon at all.
 


 I actually would not mind preallocating everything upfront which is much
 easier.  But with your patch we get a silent failure or a drastic
 slowdown which is much more painful IMO.

 Again: did we already see that limit? And where does it come from if not
 from KVM?

 It's a hardware limitation of intel APICs. interrupt vector is encoded
 in an 8 bit field in msi address. So you can have at most 256 of these.

 There should be no such limitation with pseudo GSIs we use for MSI
 injection. They end up as MSI messages again, so actually 256 (-reserved
 vectors) * number-of-cpus (on x86).
 
 This limits which CPUs can get the interrupt though.
 Linux seems to have a global pool as it wants to be able to freely
 balance vectors between CPUs. Or, consider a guest with a single CPU :)
 
 Anyway, why argue - there is a limitation, and it's not coming from KVM,
 right?

No, the limit we hit with MSI message routing is first of all KVM GSIs,
and there it is only pseudo GSIs that do not go to any interrupt controller
with limited pins. That could easily be lifted in the kernel if we run
into shortages in practice.

 


 That's also why we do those data == 0
 checks to skip used but unconfigured vectors.

 Jan

 These checks work more or less by luck BTW. It's
 a hack which I hope lazy allocation will replace.

 The check is still valid (for x86) when we have to use static routes
 (device assignment, vhost).

 It's not valid at all - we are just lucky that linux and
 windows guests seem to zero out the vector when it's not in use.
 They do not have to do that.

 It is valid as it is just an optimization. If an unused vector has a
 non-null data field, we just redundantly register a route where we do
 not actually have to.
 
 Well, the only reason we even have this code is because
 it was claimed that some devices declare support for a huge number
 of vectors which then go unused. So if the guest does not
 do this we'll run out of vectors ...
 
 But we do need to be prepared
 
 And ATM, we aren't, and probably can't be without kernel
 changes, right?
 
 for potentially
 arriving messages on that virtual GSI, either via irqfd or kvm device
 assignment.

 Jan
 
 Why irqfd?  Device assignment is ATM the only place where we use these
 ugly hacks.

vfio will use irqfds. And virtio is partly out of the picture only because
we know much more about virtio internals (specifically: it will not
advertise more vectors than guests will want to use).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC 3/6] block: wait for overlapping requests

2011-10-18 Thread Marcelo Tosatti
On Mon, Oct 17, 2011 at 04:47:29PM +0100, Stefan Hajnoczi wrote:
 When copy-on-read is enabled it is necessary to wait for overlapping
 requests before issuing new requests.  This prevents races between the
 copy-on-read and a write request.
 
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  block.c |   39 +++
  1 files changed, 39 insertions(+), 0 deletions(-)
 
 diff --git a/block.c b/block.c
 index e624ac3..cc3202c 100644
 --- a/block.c
 +++ b/block.c
 @@ -1001,6 +1001,7 @@ struct BdrvTrackedRequest {
  int nb_sectors;
  bool is_write;
  QLIST_ENTRY(BdrvTrackedRequest) list;
 +CoQueue wait_queue; /* coroutines blocked on this request */
  };
  
  /**
 @@ -1014,6 +1015,12 @@ static void tracked_request_remove(BdrvTrackedRequest 
 *req)
  {
  if (req) {
  QLIST_REMOVE(req, list);
 +
 +/* Wake up all coroutines blocked on this request */
 +while (qemu_co_queue_next(&req->wait_queue)) {
 +/* Do nothing */
 +}
 +
  g_free(req);
  }
  }
 @@ -1038,12 +1045,36 @@ static BdrvTrackedRequest 
 *tracked_request_add(BlockDriverState *bs,
  req->sector_num = sector_num;
  req->nb_sectors = nb_sectors;
  req->is_write = is_write;
 +qemu_co_queue_init(&req->wait_queue);
  
  QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
  
  return req;
  }
  
 +static bool tracked_request_overlaps(BdrvTrackedRequest *req,
 + int64_t sector_num, int nb_sectors) {
 +return false; /* not yet implemented */
 +}
 +
 +static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
 +int64_t sector_num, int nb_sectors)
 +{
 +BdrvTrackedRequest *req;
 +bool retry;
 +
 +do {
 +retry = false;
 +QLIST_FOREACH(req, &bs->tracked_requests, list) {
 +if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
 +qemu_co_queue_wait(&req->wait_queue);
 +retry = true;

What prevents overlapping requests (by the waiter's criteria) from being
inserted into the queue while there are waiters again?

That is, why is it not possible for a waiter to wait indefinitely?




Re: [Qemu-devel] [RFC][PATCH 23/45] qemu-kvm: Rework MSI-X mask notifier to generic MSI config notifiers

2011-10-18 Thread Jan Kiszka
On 2011-10-18 15:46, Michael S. Tsirkin wrote:
 To
 have common types, I would prefer to stay with vector config notifiers
 as name then.

 Jan

 So we pass in nonsense values and ask all users to know about MSIX rules.
 Ugh.

 I do realize msi might change the vector without masking.
 We can either artificially call mask before value change
 and unmask after, or use 3 notifiers: mask,unmask,config.
 Add a comment that config is invoked when configuration
 for an unmasked vector is changed, and that
 it can only happen for msi, not msix.

 I see no need in complicating the API like this. MSI-X still needs the
 config information on unmask, so let's just consistently pass it via the
 unified config notifier instead of forcing the consumers to create yet
 two more handlers. I really do not see the benefit for the consumer.

 Jan

 
 The benefit is a clearer API, where all parameters you get are valid,
 so you do not need to go read the spec to see what is OK to use.
 Generally, encoding events in flags is more error
 prone than using different notifiers for different events.
 
 E.g. _unmask and _mask make
 it obvious that they are called on mask and on unmask
 respectively.
 OTOH _config_change(bool mask) is unclear: is 'mask' the new
 state or the old state?
 
 It might be just my taste, but I usually prefer multiple
 functions doing one thing each rather than a single
 function doing multiple things. It shouldn't be too hard ...

The impact on the user side (device models) will be larger, while the
work in those different handlers will be largely the same (check my
series, e.g. the virtio handler). But it's still just a guess of mine as
well.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Wen Congyang

On 10/18/2011 06:28 PM, Jan Kiszka wrote:

On 2011-10-18 11:43, Wen Congyang wrote:

At 10/18/2011 04:36 PM, Jan Kiszka Write:

On 2011-10-18 10:34, Richard W.M. Jones wrote:

Yeah, I see. Could also be solved via gdb scripts, but crash is already
there.

But let's see if the formats actually differ. In the end, crash is just
parsing the same information that also gdb sees.


I think the format can be similar to diskdump/kdump/netdump:

dump_header: 1 block
sub header:  n blocks (n is stored in dump_header)
bitmap:  m blocks (2m is stored in dump_header)
dumpable bitmap: m blocks
memory data (we can tell whether a page is stored in the core from the
bitmap and dumpable bitmap)

The format of dump header (it's like kdump/diskdump):
struct disk_dump_header {
    char            signature[SIG_LEN];   /* = QEMU */
    int             header_version;       /* Dump header version */
    struct new_utsname utsname;           /* copy of system_utsname */
    struct timeval  timestamp;            /* Time stamp */
    unsigned int    status;
    int             block_size;           /* Size of a block in bytes */
    int             sub_hdr_size;         /* Size of arch-dependent
                                             header in blocks */
    unsigned int    bitmap_blocks;        /* Size of memory bitmap in
                                             blocks */
    unsigned int    max_mapnr;            /* = max_mapnr */
    unsigned int    total_ram_blocks;     /* Number of blocks that should
                                             be written */
    unsigned int    device_blocks;        /* Number of total blocks in
                                             the dump device */
    unsigned int    written_blocks;       /* Number of written blocks */
    unsigned int    current_cpu;          /* CPU# which handles the dump */
    int             nr_cpus;              /* Number of CPUs */
};

The sub header can contain all registers' values for each vcpu, and other
information, for example:
struct qemu_sub_header {
    unsigned long   start_pfn;
    unsigned long   end_pfn;
    off_t           offset_note;
    unsigned long   size_note;
};


So is this a standard format or only similar to something? Which tools
will understand it out-of-the-box? If it's not standard, why?


Only similar to something, and we can add a little code to crash to
support such a format.


If you have a better format, please tell me.

Thanks
Wen Congyang



Jan








Re: [Qemu-devel] [Question] dump memory when host pci device is used by guest

2011-10-18 Thread Jan Kiszka
On 2011-10-18 15:51, Wen Congyang wrote:
 On 10/18/2011 06:28 PM, Jan Kiszka wrote:
 On 2011-10-18 11:43, Wen Congyang wrote:
 At 10/18/2011 04:36 PM, Jan Kiszka Write:
 On 2011-10-18 10:34, Richard W.M. Jones wrote:

 Yeah, I see. Could also be solved via gdb scripts, but crash is already
 there.

 But let's see if the formats actually differ. In the end, crash is just
 parsing the same information that also gdb sees.

 I think the format can be similar to diskdump/kdump/netdump:

 dump_header: 1 block
 sub header:  n blocks (n is stored in dump_header)
 bitmap:  m blocks (2m is stored in dump_header)
 dumpable bitmap: m blocks
 memory data (we can tell whether a page is stored in the core from the
 bitmap and dumpable bitmap)

 The format of dump header (it's like kdump/diskdump):
 struct disk_dump_header {
     char            signature[SIG_LEN];   /* = QEMU */
     int             header_version;       /* Dump header version */
     struct new_utsname utsname;           /* copy of system_utsname */
     struct timeval  timestamp;            /* Time stamp */
     unsigned int    status;
     int             block_size;           /* Size of a block in bytes */
     int             sub_hdr_size;         /* Size of arch-dependent
                                              header in blocks */
     unsigned int    bitmap_blocks;        /* Size of memory bitmap in
                                              blocks */
     unsigned int    max_mapnr;            /* = max_mapnr */
     unsigned int    total_ram_blocks;     /* Number of blocks that should
                                              be written */
     unsigned int    device_blocks;        /* Number of total blocks in
                                              the dump device */
     unsigned int    written_blocks;       /* Number of written blocks */
     unsigned int    current_cpu;          /* CPU# which handles the dump */
     int             nr_cpus;              /* Number of CPUs */
 };

 The sub header can contain all registers' values for each vcpu, and other
 information, for example:
 struct qemu_sub_header {
     unsigned long   start_pfn;
     unsigned long   end_pfn;
     off_t           offset_note;
     unsigned long   size_note;
 };

 So is this a standard format or only similar to something? Which tools
 will understand it out-of-the-box? If it's not standard, why?
 
 Only similar to something, and we can add a little code to crash to
 support such a format.
 
 If you have a better format, please tell me.

The format crash already processes? What do you need in addition? That
should be discussed first, not the format details.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [RFC][PATCH 28/45] qemu-kvm: msix: Drop tracking of used vectors

2011-10-18 Thread Michael S. Tsirkin
On Tue, Oct 18, 2011 at 03:46:06PM +0200, Jan Kiszka wrote:
 On 2011-10-18 15:37, Michael S. Tsirkin wrote:
  On Tue, Oct 18, 2011 at 03:00:29PM +0200, Jan Kiszka wrote:
  On 2011-10-18 14:48, Michael S. Tsirkin wrote:
  To my understanding, virtio will be the exception as no other device
  will have a chance to react on resource shortage while sending(!) an MSI
  message.
 
  Hmm, are you familiar with that spec?
 
  Not by heart.
 
  This is not what virtio does,
  resource shortage is detected during setup.
  This is exactly the problem with lazy registration as you don't
  allocate until it's too late.
 
  When is that setup phase? Does it actually come after every change to an
  MSI vector? I doubt so.
  
  No. During setup, driver requests vectors from the OS, and then tells
  the device which vector should each VQ use.  It then checks that the
  assignment was successful. If not, it retries with less vectors.
  
  Other devices can do this during initialization, and signal
  resource availability to guest using msix vector number field.
  
  Thus virtio can only estimate the guest usage as
  well
  
  At some level, this is fundamental: some guest operations
  have no failure mode. So we must preallocate
  some resources to make sure they won't fail.
 
 We can still track the expected maximum number of active vectors at core
 level, collect them from the KVM layer, and warn if we expect conflicts.
 Anxious MSI users could then refrain from using this feature, others
 might be fine with risking a slow-down on conflicts.

It seems like a nice feature until you have to debug it in the field :).
If you really think it's worthwhile, let's add a 'force' flag so that
advanced users at least can declare that they know what they are doing.

  
  (a guest may or may not actually write a non-null data into a
  vector and unmask it).
  
  Please, forget the non-NULL thing. virtio driver knows exactly
  how many vectors we use and communicates this info to the device.
  This is not uncommon at all.
  
 
 
  I actually would not mind preallocating everything upfront which is much
  easier.  But with your patch we get a silent failure or a drastic
  slowdown which is much more painful IMO.
 
  Again: did we already see that limit? And where does it come from if not
  from KVM?
 
  It's a hardware limitation of intel APICs. interrupt vector is encoded
  in an 8 bit field in msi address. So you can have at most 256 of these.
 
  There should be no such limitation with pseudo GSIs we use for MSI
  injection. They end up as MSI messages again, so actually 256 (-reserved
  vectors) * number-of-cpus (on x86).
  
  This limits which CPUs can get the interrupt though.
  Linux seems to have a global pool as it wants to be able to freely
  balance vectors between CPUs. Or, consider a guest with a single CPU :)
  
  Anyway, why argue - there is a limitation, and it's not coming from KVM,
  right?
 
 No, the limit we hit with MSI message routing is first of all KVM GSIs,
 and there it is only pseudo GSIs that do not go to any interrupt controller
 with limited pins.

I see KVM_MAX_IRQ_ROUTES is 1024.
This is > 256, so KVM does not seem to be the problem.

 That could easily be lifted in the kernel if we run
 into shortages in practice.

What I was saying is that resources are limited even without kvm.

  
 
 
  That's also why we do those data == 0
  checks to skip used but unconfigured vectors.
 
  Jan
 
  These checks work more or less by luck BTW. It's
  a hack which I hope lazy allocation will replace.
 
  The check is still valid (for x86) when we have to use static routes
  (device assignment, vhost).
 
  It's not valid at all - we are just lucky that linux and
  windows guests seem to zero out the vector when it's not in use.
  They do not have to do that.
 
  It is valid as it is just an optimization. If an unused vector has a
  non-null data field, we just redundantly register a route where we do
  not actually have to.
  
  Well, the only reason we even have this code is because
  it was claimed that some devices declare support for a huge number
  of vectors which then go unused. So if the guest does not
  do this we'll run out of vectors ...
  
  But we do need to be prepared
  
  And ATM, we aren't, and probably can't be without kernel
  changes, right?
  
  for potentially
  arriving messages on that virtual GSI, either via irqfd or kvm device
  assignment.
 
  Jan
  
  Why irqfd?  Device assignment is ATM the only place where we use these
  ugly hacks.
 
 vfio will use irqfds. And virtio is partly out of the picture only because
 we know much more about virtio internals (specifically: it will not
 advertise more vectors than guests will want to use).
 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux



Re: [Qemu-devel] hw/nand hw/onenand and read-only

2011-10-18 Thread Peter Maydell
On 18 October 2011 14:38, Markus Armbruster arm...@redhat.com wrote:
 Yes, that way works:

    $ qemu-system-arm -drive if=mtd,file=tmp.qcow2,readonly
    qemu-system-arm: -drive if=mtd,file=tmp.qcow2,readonly: readonly not 
 supported by this bus type

 But try this way:

    $ qemu-system-arm -drive if=none,file=tmp.qcow2,readonly,id=foo -device 
 nand,drive=foo
    Kernel image must be specified

 Grr, force it:

    $ qemu-system-arm -drive if=none,file=tmp.qcow2,readonly,id=foo -device 
 nand,drive=foo -kernel /dev/null
    qemu: hardware error: nand_device_init: Unsupported NAND block size.
    [...]

 Note that I didn't bother supplying a drive with sensible size.  If I
 did, I guess I'd get nand coupled to a read-only drive.  Easy enough for
 you to try :)

Isn't this a block layer problem? It seems to be blockdev.c that
emits the "readonly not supported" message for MTD devices, so presumably
it ought to do so however the user sets up the device...

-- PMM



Re: [Qemu-devel] [RFC 5/6] block: core copy-on-read logic

2011-10-18 Thread Marcelo Tosatti
On Mon, Oct 17, 2011 at 04:47:31PM +0100, Stefan Hajnoczi wrote:
 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  block.c  |   69 
 ++
  trace-events |1 +
  2 files changed, 70 insertions(+), 0 deletions(-)
 
 diff --git a/block.c b/block.c
 index 0c22741..2aec6b4 100644
 --- a/block.c
 +++ b/block.c
 @@ -1409,6 +1409,55 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t 
 offset,
  return 0;
  }
  
 +static int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
 +int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 +{
 +void *bounce_buffer;
 +struct iovec iov;
 +QEMUIOVector bounce_qiov;
 +int64_t cluster_sector_num;
 +int cluster_nb_sectors;
 +size_t skip_bytes;
 +int ret;
 +
 +/* Cover entire cluster so no additional backing file I/O is required 
 when
 + * allocating cluster in the image file.
 + */
 +round_to_clusters(bs, sector_num, nb_sectors,
 +  &cluster_sector_num, &cluster_nb_sectors);
 +
 +trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors,
 +cluster_sector_num, cluster_nb_sectors);
 +
 +iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
 +iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
 +qemu_iovec_init_external(&bounce_qiov, &iov, 1);
 +
 +ret = bs->drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
 + &bounce_qiov);
 +if (ret < 0) {
 +goto err;
 +}
 +
 +ret = bs->drv->bdrv_co_writev(bs, cluster_sector_num, cluster_nb_sectors,
 +  &bounce_qiov);
 +if (ret < 0) {
 +/* It might be okay to ignore write errors for guest requests.  If 
 this
 + * is a deliberate copy-on-read then we don't want to ignore the 
 error.
 + * Simply report it in all cases.
 + */
 +goto err;
 +}
 +
 +skip_bytes = (sector_num - cluster_sector_num) * BDRV_SECTOR_SIZE;
 +qemu_iovec_from_buffer(qiov, bounce_buffer + skip_bytes,
 +   nb_sectors * BDRV_SECTOR_SIZE);
 +
 +err:
 +qemu_vfree(bounce_buffer);
 +return ret;
 +}
 +
  /*
   * Handle a read request in coroutine context
   */
 @@ -1431,7 +1480,27 @@ static int coroutine_fn 
 bdrv_co_do_readv(BlockDriverState *bs,
  }
  
  req = tracked_request_add(bs, sector_num, nb_sectors, false);
 +
 +if (bs->copy_on_read) {
 +int pnum;
 +
 +/* TODO it is not safe to call bdrv_is_allocated() in coroutine 
 context
 + * because it's a synchronous interface.  We probably want a
 + * bdrv_co_is_allocated(). */
 +ret = bdrv_is_allocated(bs, sector_num, nb_sectors, &pnum);
 +if (ret < 0) {
 +goto out;
 +}

This lacks shared base image support, BTW (as in copy should be
performed only if cluster not in destination chain). Could be added
later.




Re: [Qemu-devel] [PATCH 1/2] Move graphic-related coalesced MMIO flushes to affected device models

2011-10-18 Thread Jan Kiszka
On 2011-10-18 16:00, Avi Kivity wrote:
 On 09/30/2011 01:31 PM, Jan Kiszka wrote:
 This is conceptually cleaner and will allow us to drop the nographic
 timer. Moreover, it will be mandatory to fully exploit future per-device
 coalesced MMIO rings.

 
 Appears to break winxp installation - the guest enters an infinite
 bitblt loop.  Trying to find out why.

Hmm, maybe there are side effects in certain modes that actually
disallow coalescing.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 3/7] integratorcp: convert control to memory API

2011-10-18 Thread Avi Kivity
On 10/18/2011 01:44 PM, Peter Maydell wrote:
 2011/10/17 Benoît Canet benoit.ca...@gmail.com:
  -static void icp_control_init(uint32_t base)
  +static void icp_control_init(target_phys_addr_t base)
   {
  -int iomemtype;
  +MemoryRegion *io;
 
  -iomemtype = cpu_register_io_memory(icp_control_readfn,
  -   icp_control_writefn, NULL,
  -   DEVICE_NATIVE_ENDIAN);
  -cpu_register_physical_memory(base, 0x0080, iomemtype);
  +io = (MemoryRegion *)g_malloc0(sizeof(MemoryRegion));
  +memory_region_init_io(io, &icp_control_ops, NULL,
  +  "control", 0x0080);
  +memory_region_add_subregion(get_system_memory(), base, io);
  /* ??? Save/restore.  */
   }

 I didn't spot this the first time round, but this is wrong.
 We shouldn't be g_malloc0()ing the MemoryRegion -- it should
 be an element in some suitable device struct.

 I think the right thing to do here is probably first to do the
 (fairly trivial) conversion of the icp_control code to be a
 sysbus device, then do the memory region conversion on top of that.

It's not any less wrong than the original code, which also leaked
memory.  I'll merge it and let any further conversion take place on top.

(though g_malloc0() is better replaced by g_new()).

-- 
error compiling committee.c: too many arguments to function



